Hey Checkyourlogs fans,
Over the past few weeks I had the pleasure of working with a customer that had recently deployed Windows Server 2019 and Storage Spaces Direct. With any early deployment, we expect to hit some bumps in the road, and we found a good one this week.
Microsoft has identified a bug in the SDDC Management Resource inside Failover Clustering. Basically, calls from Windows Admin Center cause this resource to time out, which terminates the RHS (Resource Hosting Subsystem) process. That in turn causes running Highly Available Virtual Machines in the Cluster to crash and restart on other nodes. It is a hard outage for the Virtual Machines and, as you can imagine, it causes many problems.
To be clear, the SDDC Management Resource is what Windows Admin Center uses to work with Storage Spaces Direct.
You can see what is happening in the output from Get-ClusterLog -UseLocalTime, run from one of the Storage Spaces Direct nodes. Further down in the log, you can see the Cluster Roles (Virtual Machines) crashing, moving around, and eventually restarting.
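If you want to pull the same log on your own cluster, something like the following should get you started. Treat it as a sketch: the C:\Temp\ClusterLogs folder and the 30-minute window are my assumptions for illustration, and the search simply looks for lines that mention the resource by name.

```powershell
# Assumption: C:\Temp\ClusterLogs is just a scratch folder, use whatever you like.
$dest = 'C:\Temp\ClusterLogs'
New-Item -Path $dest -ItemType Directory -Force | Out-Null

# Generate the cluster log for each node, in local time, covering the last 30 minutes.
Get-ClusterLog -UseLocalTime -TimeSpan 30 -Destination $dest

# Show the first lines in the collected logs that mention the SDDC Management resource.
Get-ChildItem -Path $dest -Filter '*.log' |
    Select-String -Pattern 'SDDC Management' -SimpleMatch |
    Select-Object -First 50
```

Narrowing the time span keeps the log files small enough to search quickly; widen it if the crash happened earlier.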
This is a different issue from the one discovered previously, where setting the SDDC Management Resource for Windows Admin Center to run in a separate monitor would fix the problem. You would run:
(Get-ClusterResource -Name "SDDC Management").SeparateMonitor = 1
In the past, this fixed the issue.
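If you are not sure whether that older tweak was ever applied on your cluster, you can read the property back. This is only a quick check, and it assumes the resource still has its default name of "SDDC Management".

```powershell
# Read the current SeparateMonitor setting for the SDDC Management resource.
(Get-ClusterResource -Name "SDDC Management").SeparateMonitor
```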
Microsoft has now confirmed that a fix is coming next week or so, with an ETA of January 20th to 21st. Until then, they recommend stopping the SDDC Management Resource. In essence, this kills your Hyper-Converged Storage Spaces Direct management via Windows Admin Center until the hotfix is applied and the SDDC Management Resource is restarted.
Get-ClusterResource "SDDC Management"
Get-ClusterResource "SDDC Management" | Stop-ClusterResource
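To confirm the resource really is offline, and to bring it back once the hotfix is installed, a sketch along these lines should do. It assumes the default resource name, and that you only run the last line after the fix has been applied to every node.

```powershell
# Check the current state of the resource and which node owns it.
Get-ClusterResource -Name "SDDC Management" |
    Select-Object Name, State, OwnerNode

# Only after the hotfix is installed on all nodes: bring the resource back online.
Get-ClusterResource -Name "SDDC Management" | Start-ClusterResource
```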
So for now, folks, it is best to stop using Windows Admin Center with Storage Spaces Direct on Windows Server 2019 until next week. It hurts me to have to say this, but it is the only workaround out there for this issue right now.
It is unclear at this time whether the issue also impacts the SDDC Management Resource on Windows Server 2016.
I hope this helps save you some pain with your Storage Spaces Direct Clusters.