Hey Checkyourlogs Fans,
Tonight @SifuSun (Cary Sun) and I were doing some routine maintenance of a 2-node Storage Spaces Direct Cluster that was having issues with checkpoints locking up during Veeam Backups. We went through the normal process putting the nodes into maintenance mode, patching, and waiting for storage jobs to complete before moving on to the next node.
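For anyone who prefers the command line, here is a rough PowerShell sketch of that per-node flow. It is not a transcript of what we ran that night; the node name is a made-up placeholder, the patching step is left as a comment, and the 30-second polling interval is just an assumption.

```powershell
# Hypothetical node name; substitute your own.
$node = "S2D-NODE1"

# Drain roles off the node and pause it (maintenance mode).
Suspend-ClusterNode -Name $node -Drain -Wait

# ... install updates and reboot the node here ...

# Bring the node back into the cluster and fail roles back.
Resume-ClusterNode -Name $node -Failback Immediate

# Wait for storage repair/resync jobs to finish before touching the next node.
while (Get-StorageJob | Where-Object JobState -ne 'Completed') {
    Start-Sleep -Seconds 30
}

# Confirm the virtual disks look healthy before moving on.
Get-VirtualDisk | Format-Table FriendlyName, OperationalStatus, HealthStatus
```

Only move on to the next node once the storage jobs have cleared and the virtual disks report healthy.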
We hit an interesting scenario: the first node completed without any trouble, but after we patched our second node, look at what happened to our cluster.
The first thing we noticed, shown in the screenshot above, was the Cluster IP going offline. Even moving it to the good node didn't help.
Then we checked the Cluster Networks and look at what we saw.
It was so weird because everything was working fine before installing the March 2019 Cumulative Update. I have seen this happen before with VMs: when they finish installing updates and are waiting for a reboot, the network stack can get upset.
So, I decided to complete the updates and reboot Node 2 to see what would happen. As it turns out, the cluster immediately stopped showing as partitioned, and once Node 2 was back online everything was back to normal.
You can see in the screenshot below that the network shows unavailable now and not partitioned.
Then we tried to bring the Cluster Core Resources back online.
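If you want to do the same checks from PowerShell instead of Failover Cluster Manager, something like the sketch below should work. It assumes the core resources live in the default group named "Cluster Group"; adjust the name if yours has been renamed.

```powershell
# Show each cluster network and its state (for example Up, Partitioned, Unavailable).
Get-ClusterNetwork | Format-Table Name, State, Role

# List the Cluster Core Resources and where they currently live.
Get-ClusterGroup -Name "Cluster Group" | Get-ClusterResource |
    Format-Table Name, State, OwnerNode

# Bring the core resources (including the Cluster IP) back online.
Start-ClusterGroup -Name "Cluster Group"
```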
I like easy fixes like this. I'm not sure of the cause, only that continuing the maintenance until it was complete seemed to resolve the problem. Definitely a weird one for us tonight.
Hope this helps you out,
Dave and Cary