Hey Checkyourlogs Fans,
Tonight @SifuSun (Cary Sun) and I were doing some routine maintenance on a 2-node Storage Spaces Direct cluster that was having issues with checkpoints locking up during Veeam backups. We went through the normal process: putting each node into maintenance mode, patching it, and waiting for the storage jobs to complete before moving on to the next node.
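For anyone who prefers to script that routine, it might look roughly like the sketch below. The node name `Node1` is a placeholder, the patching itself is left as a comment, and the 30-second polling interval is an arbitrary choice rather than anything we measured.

```powershell
# Hypothetical node name; replace with the node being patched.
$node = 'Node1'

# Pause the node and drain its roles to the other node.
Suspend-ClusterNode -Name $node -Drain -Wait

# ... install the updates and reboot the node here ...

# Bring the node back into the cluster and fail roles back to it.
Resume-ClusterNode -Name $node -Failback Immediate

# Keep waiting while any storage job (repair, rebalance) is still running.
while (Get-StorageJob | Where-Object JobState -eq 'Running') {
    Start-Sleep -Seconds 30
}
```

Only once the loop exits would you move on to the next node.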
We ran into an interesting scenario: after the first node was complete and we had patched the second node, look at what happened to our cluster.
The screenshot above shows what we noticed first: the Cluster IP going offline. Even moving it to the good node didn’t help.
Then we checked the Cluster Networks, and look at what we saw.
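The same information is available from PowerShell. A minimal sketch, assuming it is run on one of the cluster nodes with the FailoverClusters module available:

```powershell
# Show every cluster network with its state, metric, and role.
Get-ClusterNetwork | Format-Table Name, State, Metric, Role

# Show only the networks that are not healthy (e.g. Partitioned or Unavailable).
Get-ClusterNetwork | Where-Object State -ne 'Up'
```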
It was so weird because everything was working fine before we installed the March 2019 Cumulative Update. I have seen this happen before with VMs: when they have completed their updates and are waiting for a reboot, the network stack can get upset.
So I decided to complete the updates and reboot Node 2 to see what would happen. As it turned out, our cluster immediately stopped showing as partitioned, and once Node 2 was online everything was back to normal. You can see in the screenshot below that the network now shows as unavailable and not partitioned.
Then we tried to bring the Cluster Core Resources back online.
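If you would rather do this from PowerShell than from Failover Cluster Manager, a minimal sketch follows. It assumes the core resources live in the default group named `Cluster Group`; adjust the name if your cluster differs.

```powershell
# List the cluster core resources and their current state.
Get-ClusterGroup -Name 'Cluster Group' | Get-ClusterResource

# Bring the core resource group (and the resources in it) online.
Start-ClusterGroup -Name 'Cluster Group'
```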
I like easy fixes like this. I’m not sure of the cause; all I know is that continuing the maintenance until it was complete seemed to resolve it. Definitely a weird one for us tonight.
Hope this helps you out,
Dave and Cary
Exactly, the server reboot fixed the issue and all looked fine
I’ve had this issue occur randomly as well on Server 2019.
The solution then is to remove and re-add the VM network adapter on the node you think is causing the problem.
Name             State        Metric  Role
----             -----        ------  ----
Cluster21        Up            30081  Cluster
LAN38            Up            70241  None
LiveMigration20  Up            30080  Cluster
Mgmt00           Partitioned   70240  ClusterAndClient
Remove-VMNetworkAdapter -ManagementOS -Name Mgmt00
and then add it again the same way you did when you set up the server.
PS: Make sure you’re not connected to the server via that vNIC.
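Put together, the remove and re-add might look roughly like the sketch below. The switch name `SETswitch` is hypothetical, so check what the adapter is actually bound to first. Any VLAN or IP settings from the original build will likely need to be reapplied afterwards, which this sketch does not cover.

```powershell
# Run this from a console session or another network, not over the Mgmt00 vNIC.
$vnic   = 'Mgmt00'
$switch = 'SETswitch'   # hypothetical switch name; use your own

# Record the current settings before removing anything.
Get-VMNetworkAdapter -ManagementOS -Name $vnic | Format-List Name, SwitchName, MacAddress
Get-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName $vnic

# Remove the host vNIC, then create it again on the same virtual switch.
Remove-VMNetworkAdapter -ManagementOS -Name $vnic
Add-VMNetworkAdapter -ManagementOS -Name $vnic -SwitchName $switch
```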
I had a similar case today. After stopping the Kaspersky agent on both nodes, the Mgmt network went back to green. For me, it’s one more case where an antivirus program is to blame.