Just a few weeks ago, to my shame, I stumbled across an interesting feature in VMware vSphere while trying some things with vSAN. To be honest, and to make it clear before we dive into this topic: I screwed up the first time I tested this feature, because I didn’t know about it and didn’t proceed the way it expects you to. In the end, I had to reinstall all my vSAN nodes and build a clean new environment. That was needed anyway because of the most recent homelab rebuild, so it was somehow a win-win for me and the lab.
So don’t screw up! No, just kidding. You may know the feature better than I do. And I can tell you, vSAN is stronger and more resilient than you may think.
In this blog post, I’d like to show you how to shut down a vSAN cluster, and how to start it again. The feature is hidden in plain view: right-click the vSAN cluster and you’re good to go. Or not?
In this blog post, I’m assuming that the vCenter is NOT running on the vSAN cluster. I may update this blog post, or create another one, with vCenter running on the cluster. Without searching the internet and checking the VMware docs, I don’t know by heart if this is even possible. Anyway. So how do you shut down the vSAN cluster?
The most important thing, before you shut down the vSAN cluster:
Make sure that you shut down all VMs on this cluster (if they are not critical and downtime is acceptable), or evacuate them to another cluster / ESXi host(s). And by evacuating the VMs we’re talking about a full evacuation of a VM. The whole VM (memory and disk) has to be moved to a different cluster/host.
When this is done, there should be no VM left running on the vSAN cluster. You may double-check it.
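If you'd rather double-check with a script than by eye, here is a minimal sketch of that check in Python. The `VmInfo` records and the sample inventory are made up for illustration; in a real environment you would have to fill them from your own inventory source (for example the `runtime.powerState` of each VM if you use pyVmomi). System VMs such as vCLS are skipped here because the shutdown process powers them off on its own.

```python
from dataclasses import dataclass


@dataclass
class VmInfo:
    # Hypothetical record type, not a vSphere object.
    name: str
    power_state: str            # "poweredOn" or "poweredOff"
    is_system_vm: bool = False  # e.g. vCLS VMs, handled by the shutdown process


def blocking_vms(vms):
    """Return the names of powered-on, non-system VMs still on the cluster."""
    return [
        vm.name
        for vm in vms
        if vm.power_state == "poweredOn" and not vm.is_system_vm
    ]


# Made-up sample inventory for illustration only.
inventory = [
    VmInfo("web01", "poweredOff"),
    VmInfo("db01", "poweredOn"),
    VmInfo("vCLS-1", "poweredOn", is_system_vm=True),
]

print(blocking_vms(inventory))
```

An empty list means there is nothing left that you have to shut down or evacuate yourself.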
A word of troubleshooting first…
Networking / IPv6
I stumbled across this when writing this blog post. It might also have been the reason why I had to reinstall all hosts and recreate the environment, because I just didn’t know. If you’re operating a vSAN cluster with ESXi 8.0 Update 1 hosts, AND if you disabled IPv6, you will end up with this problem when shutting down the vSAN cluster:
There is currently no fix, as per the VMware knowledge base article 92656, only a workaround. You have to enable IPv6, reboot the hosts, and continue shutting down the vSAN cluster (if you already initiated the process). And yes, that’s possible and no problem. Put one host into maintenance mode, and restart it. One after another.
How do we shut down the vSAN cluster?
There are two ways to initiate the shutdown of a vSAN cluster, if we’re using this “Shutdown cluster” feature. I’m not talking about anything else, like putting all hosts into maintenance mode manually, without any kind of data evacuation, etc.
Going one or the other way
Through “Hosts & Clusters” view
The first way is in the Hosts & Clusters view in vCenter. Right-click your vSAN cluster, look at the bottom of the context menu, and choose “vSAN > Shutdown cluster”.
Through vSAN Services
The second way is by clicking your vSAN cluster, then clicking the tab “Configure > vSAN > Services”. In the top-right corner, you’ll find the option to shut down the cluster.
What happens now?
No matter where you start the cluster shutdown, the process shows up with an assistant, or at least a kind of assistant. Some pre-checks are done before you can proceed to the next step. If all VMs have been powered off (a clean shutdown of the VMs is always recommended!), then all looks good and green:
If the pre-check detects running / powered-on VMs, it will show a warning:
Oh, and I’m glad you’re asking. No, the “Next” button is greyed out, so you can’t proceed. You have to shut down the running VMs first or evacuate them to another cluster so that the vSAN cluster is either empty or has only powered-off VMs on it.
When you’re all set, you can proceed to the next step, confirming the shutdown. The process will automatically…:
- …power off all system VMs on that cluster (e.g., the vCLS VMs that are used for the vSphere Cluster Services)
- …turn off High Availability on this cluster
- …disable the cluster member updates
- …pause any state changes of vSAN objects
- …place all hosts into maintenance mode WITHOUT data migration
- …power off each host at the end
Please be aware that powering off all system VMs may have an impact on services that are externally used, like when you’re using the vSAN file service!
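To keep the order of these steps in mind, here is a small Python sketch that writes them down as data. It is only a model of the list above, not how vCenter implements it. The "undo" labels and the idea that the restart runs them in reverse order are my assumptions, and powering the hosts back on is something you do by hand.

```python
# Each entry: (shutdown action, assumed counterpart during restart).
STEPS = [
    ("power off system VMs (e.g. vCLS)", "power on system VMs"),
    ("turn off High Availability", "turn on High Availability"),
    ("disable cluster member updates", "enable cluster member updates"),
    ("pause vSAN object state changes", "resume vSAN object state changes"),
    ("enter maintenance mode without data migration", "exit maintenance mode"),
    ("power off hosts", "power on hosts (manual)"),
]


def shutdown_plan():
    """Steps in the order the shutdown process lists them."""
    return [action for action, _ in STEPS]


def restart_plan():
    """Assumed restart order: the counterparts, last step first."""
    return [undo for _, undo in reversed(STEPS)]


for number, step in enumerate(shutdown_plan(), start=1):
    print(f"{number}. {step}")
```

Calling `restart_plan()` gives you the same list the other way around, starting with the manual power-on of the hosts.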
You may type in a reason for the cluster shutdown, or you can choose something from the list. Then, click the red “SHUTDOWN”. The shutdown process will now start.
There is no way back, no cancellation. If you click “SHUTDOWN”, it will do exactly this:
This process will take a while, depending on the cluster size, server resources, etc.
Remember that all hosts will be shut down too, not only the vSAN cluster services, but the (virtual/nested) physical hosts as well. Now it’s quiet in the server room. Or at least a little quieter. Possibly.
How do we start the vSAN cluster again?
You start with powering on all (virtual/nested) physical hosts. Wait until they are online again and connected in vCenter. They should come back online, still in maintenance mode.
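If you want to script the "wait until they are back" part, the condition is simple: every host is connected and still in maintenance mode. Here is a minimal Python sketch of that condition. The dictionaries and host names are made-up sample data; with pyVmomi you could fill them from each host's `runtime.connectionState` and `runtime.inMaintenanceMode`, but that part is left out here.

```python
def hosts_ready_for_restart(hosts):
    """Return (ready, not_ready_names) for a list of host status dicts.

    A host counts as ready when it is connected in vCenter and still
    in maintenance mode, which is the state expected after power-on.
    """
    not_ready = [
        host["name"]
        for host in hosts
        if host["connection_state"] != "connected"
        or not host["in_maintenance_mode"]
    ]
    return (len(not_ready) == 0, not_ready)


# Made-up sample data: one host has not reconnected yet.
sample_hosts = [
    {"name": "esxi01", "connection_state": "connected", "in_maintenance_mode": True},
    {"name": "esxi02", "connection_state": "notResponding", "in_maintenance_mode": True},
    {"name": "esxi03", "connection_state": "connected", "in_maintenance_mode": True},
]

ready, waiting_for = hosts_ready_for_restart(sample_hosts)
print(ready, waiting_for)
```

You would call this in a loop with a pause in between until it reports that all hosts are ready, and only then go for the restart.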
As promised, the hosts are back online, but still in maintenance mode:
What’s next now? We’re going to restart the vSAN cluster. If you didn’t navigate away from the previous view (Configure > vSAN > Services), you should see the VMware service robot doing some stuff behind a rack. And yes, you’re provided with a “RESTART” button. Click it:
If all is good (I’m not sure what exactly is tested here, as it just goes by too quickly for me), you can click the green “RESTART” button to bring your vSAN cluster back online:
The process will now basically revert all the things it did before:
This process will take a while, depending on the cluster size, server resources, etc. If all goes well, vCenter will show you a nice green message that the cluster has been successfully restarted. Awesome, isn’t it?