NVIDIA DevBox with Ubuntu 16.04 and 4.4.0-137-generic kernel randomly reboots and automatically shuts down...



























I've recently started using an NVIDIA DevBox that has an ASUS BIOS, with the kernel and Ubuntu versions mentioned above. For some reason the machine can't be left on overnight the way other laptops and desktops usually can: you leave them on, they lock themselves after a couple of minutes and/or go to sleep, and the next day, once you move the mouse or type something on the keyboard, the computer wakes up with all your programs running just as you left them the previous day.



For some strange reason, this doesn't happen with this machine. A previous user, who hasn't touched the machine in about a year, may have changed some power-saving configuration, but everything looks fine when I check the power options (I have suspend set to 1 hour and lock set to 1 hour). The funny thing I've noticed is that if I come back after lunch and the machine is locked/suspended, it resumes the session without any problems, but if I leave it overnight, I arrive the next day to find that the machine has turned itself off. The building is locked, so nobody can physically press the power button overnight. I've also checked the other user's shell history (we both have admin privileges, and he doesn't use the computer) for remotely issued shutdowns, and nothing shows up there either.



I've read in a couple of places that it could be a heating issue caused by a poor or broken power supply, but how can I check whether this is the case? I have the psensor app, but it only seems to show temperatures in real time without saving them to a file, so I can't check afterwards what the temperature of any of the graphics cards (there are 4) or the motherboard was.
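
A minimal sketch of one way to get such a temperature history, assuming nvidia-smi (shipped with the NVIDIA driver) and sensors (from the lm-sensors package) are available. The log path and the function names are hypothetical, and this is only an illustration, not a tested diagnostic tool:

```shell
#!/bin/sh
# Sketch: keep a temperature history on disk so the last readings
# survive an unexpected power-off.
# Assumptions: LOG is a placeholder path; nvidia-smi and sensors are installed.
LOG="${LOG:-$HOME/temps.log}"

# Prefix every line read from stdin with a timestamp and append it to $LOG.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$line"
    done >> "$LOG"
}

# Take one sample: per-GPU temperature and power draw, then the
# motherboard/CPU readings reported by lm-sensors.
log_temps_once() {
    nvidia-smi --query-gpu=index,temperature.gpu,power.draw \
               --format=csv,noheader | stamp_lines
    sensors | stamp_lines
    sync   # flush to disk in case the machine dies right after the sample
}
```

Calling log_temps_once once a minute, for example with `while true; do log_temps_once; sleep 60; done`, would leave the final readings before a shutdown at the end of the log. A temperature climbing just before the last entry would point towards heat, whereas normal readings followed by silence would be more consistent with a sudden power loss.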



What is another way to diagnose the automatic shutdown of the machine?
How can I know whether it's a heating issue or a faulty power supply? Or potentially a kernel issue? The machine has no really intensive programs installed for now (it's almost new) except for the NVIDIA drivers, which I'm quite experienced at installing, so maybe I should consider a fresh Ubuntu install? Though that is pretty much pointless if there is a hardware issue.
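
One rough check that might help separate a clean, software-initiated shutdown from a sudden power loss is to compare the boot and shutdown records in wtmp. The sketch below assumes the util-linux `last` command and that a clean shutdown writes a "shutdown" record; the function name is hypothetical and the comparison is only a heuristic:

```shell
#!/bin/sh
# Sketch: count "reboot" (system boot) and "shutdown" (clean shutdown)
# pseudo-user records in `last -x` style output read from stdin.
# Noticeably more boots than clean shutdowns suggests the machine lost
# power or crashed rather than being shut down by software.
count_boot_events() {
    awk '$1 == "reboot"   { r++ }
         $1 == "shutdown" { s++ }
         END { printf "reboots=%d shutdowns=%d\n", r + 0, s + 0 }'
}
```

It could be run as `last -x | count_boot_events`. If the overnight power-offs turn out to be clean shutdowns, the end of /var/log/syslog from just before the next boot is the place to look for whatever requested them. If they are not, hardware (power supply, heat) becomes the more likely suspect.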



Other details:



The NVIDIA drivers are correctly installed.
The driver broke and the machine responded pretty badly when I forced the following command. The machine stayed on for 2 consecutive days (which should be a breeze for these machines), but then, after 2 consecutive random reboots in the middle of the night, it had a hard time staying on for more than 5 minutes:



$ unset autologoff


I later had to reinstall the drivers correctly (and set the autolog option back on), and the system went back to its current state, where it "needs" to shut itself off if it's not doing anything for more than 24 hours (not doing anything meaning it receives no human input, though background processes may still be running).




  • Motherboard: ASUS EATX DDR4 LGA 2011-3 Motherboards X99-E WS/USB 3.1

  • CPU: Intel Xeon E5-2690 v4 2.6 GHz 14-Core LGA 2011 Processor 135 W

  • Cooler: Corsair Hydro Series H80i v2 Extreme Performance Liquid CPU Cooler, Black

  • Power Supply: EVGA SuperNOVA 1600 P2 80+ PLATINUM, 1600W, ECO Mode, Fully Modular, NVIDIA SLI and Crossfire Ready, 10 Year Warranty, Power Supply 220-P2-1600-X1

  • Graphics Card: 4 Titan X Pascal.


I added pci=noaer to the boot parameters after finding out that the machine was giving me this error: https://askubuntu.com/questions/771899/pcie-bus-error-severity-corrected



Output of:



$ cat /proc/cmdline


is



BOOT_IMAGE=/boot/vmlinuz-4.4.0-137-generic.efi.signed root=UUID=569dd2ad-c5a6-4ae4-a167-f849b8f6ae9e ro quiet splash pci=noaer vt.handoff=7









Tags: power-management, reboot, pci















asked 2 hours ago by Arturo
