This week I was asked by a client of mine to check into why a VM kept mysteriously rebooting.
Here are some of my notes about this case. It actually turned into a nice little blog post.
I have obviously redacted the customer information for privacy reasons.
#1 –> The app server had a scheduled task to reboot every 2nd Thursday at 1:00 AM.
Note: You have since disabled this scheduled task, and it hasn’t been in effect since 04/20/2017.
#2 –> Windows updates are coming down via a Cloud Patching Solution on 05/06/2017 –> The updates were installed on 05/04/2017, and it looks like the Cloud Patching Solution rebooted the system on 05/06/2017 at 5:17 AM.
(I kept the name of the Cloud Patching Solution anonymous for privacy reasons)
We can see that it was NT Authority\System that shut down the system due to operating system upgrades (patches).
Event ID 1074 can be used to trace planned shutdown events.
So, in my opinion, the Cloud Patching Solution policy has been applied to this server, and that is the cause of the expected reboots.
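If you want to pull the Event ID 1074 entries without clicking through Event Viewer, here is a minimal sketch in Python that wraps the built-in `wevtutil` command. The function names `build_event_query` and `query_events` are my own for illustration, and it assumes you run it on the Windows box itself with rights to read the System log.

```python
import subprocess


def build_event_query(event_id, log="System", count=10):
    """Build a wevtutil command that returns the newest `count` events with this ID."""
    xpath = f"*[System[(EventID={int(event_id)})]]"
    return [
        "wevtutil", "qe", log,
        f"/q:{xpath}",        # XPath filter on the event ID
        f"/c:{int(count)}",   # cap the number of events returned
        "/rd:true",           # reverse direction: newest events first
        "/f:text",            # human-readable text instead of XML
    ]


def query_events(event_id, log="System", count=10):
    """Run the query on a Windows host and return wevtutil's text output."""
    result = subprocess.run(
        build_event_query(event_id, log, count),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Example (Windows only): print(query_events(1074))
```

In the output, the user and reason fields are the interesting part; that is where something like NT Authority\System and the patch-related reason show up.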
Also, the system time isn’t a direct correlation to uptime in Windows.
Inside Windows, you can pull the System Info from a CMD prompt.
Now, if we check Hyper-V Manager, it is actually lying to us.
Based on this, it appears that this box has only been up and running for a grand total of 29 minutes. This is not the case; I would trust the System Info. If the VM is live migrated around the Hyper-V cluster, this counter in Hyper-V is reset.
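To sanity-check the uptime from inside the guest rather than from Hyper-V Manager, something like the following sketch could be used. It assumes Python is available on the VM and reads the Win32 `GetTickCount64` counter, which tracks time since the guest OS started; the helper names `format_uptime` and `guest_uptime_seconds` are hypothetical.

```python
import ctypes
import sys
from datetime import timedelta


def format_uptime(seconds):
    """Render a number of seconds as 'Nd HH:MM:SS'."""
    delta = timedelta(seconds=int(seconds))
    hours, rem = divmod(delta.seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{delta.days}d {hours:02d}:{minutes:02d}:{secs:02d}"


def guest_uptime_seconds():
    """Seconds since the guest OS booted, via the Win32 GetTickCount64 call."""
    if sys.platform != "win32":
        raise OSError("GetTickCount64 is only available on Windows")
    tick = ctypes.windll.kernel32.GetTickCount64
    tick.restype = ctypes.c_uint64  # the API returns a 64-bit millisecond count
    return tick() / 1000.0


# Example (Windows only): print(format_uptime(guest_uptime_seconds()))
```

If the figure printed here is measured in days while Hyper-V Manager shows minutes, that points at the Hyper-V counter having been reset rather than at a real reboot.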
On Node A you can see that I have some of my build VMs that aren’t part of the cluster on the same storage, and they are showing 3, 4, 5+ days of uptime.
Recommendation: Remove these servers from the Cloud Patching Solution and let us patch them manually for the time being. This will tell us quite clearly what the issue is.
FYI –> When a system shuts down unexpectedly, it throws an Event ID 1076 in the System log. What we look for in 1076 is whether there is a BugCheck string. This indicates that the unexpected shutdown was due to something like a blue screen.
As you can see, I have filtered the event log for unexpected shutdowns (1076). The last time this happened was on April 14th, and I believe this was when the network was still flapping badly and we were still cleaning things up.
Since we have stabilized things, we haven’t seen these.
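The BugCheck check on a 1076 event can also be scripted. This is only a sketch: it assumes you already have the event's message text (for example, copied from Event Viewer or exported as text) and that the field appears on its own line as "Bugcheck String: ...". The function name `bugcheck_string` is hypothetical.

```python
import re

# Matches a "Bugcheck String:" field that has a non-empty value on the same line.
_BUGCHECK = re.compile(r"bug\s*check string:\s*(\S.*)", re.IGNORECASE)


def bugcheck_string(message):
    """Return the Bugcheck String value from an event message, or None if absent or empty."""
    for line in message.splitlines():
        match = _BUGCHECK.search(line)
        if match:
            return match.group(1).strip()
    return None
```

A non-empty result suggests the unexpected shutdown came from a stop error (blue screen); `None` means the event carried no bugcheck information, so the cause lies elsewhere.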
Once again, this indicates that a system job is the culprit here.
So, back to the Cloud Patching Solution again.
Remove the server from the Cloud Patching Solution, and then we will see if it keeps happening. I think that once it is removed, we will find it is much more stable.
Hope you enjoy this post.