This blog post (I call posts like these “quick & dirty posts”) will show you how to remove an ESXi host permanently from your vSAN cluster. Yes. Permanently. Forever.
Usually, you’re adding capacity to a cluster, which means adding more hosts or disks. However, there are legitimate reasons to remove an ESXi host from a vSAN cluster. Maybe you’re in the middle of a hardware renewal: the new hardware is already installed and running in production, and now, server by server, you’re removing the old hardware because you’re on track with the workload migration. The same goes for extending a cluster with nodes that have more “meat on the bone”, more compute power and storage capacity, nodes that run more energy-efficiently than the old ones. You see, only two reasons, but there might be many more.
But let’s dive into this topic now.
How to remove an ESXi host from a vSAN cluster?
Start by making sure that the cluster and its disk groups have enough free capacity to cope with one host being removed. If the cluster is fine, let’s move on and remove the host.
Place the host into maintenance mode
Right-click the host, choose “Maintenance Mode”, then “Enter Maintenance Mode”.
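If you prefer the command line, the same step can be sketched with esxcli directly on the host (assuming SSH access). Since the host leaves the cluster for good, a full data evacuation is the safe choice, rather than just ensuring object accessibility. This is a minimal sketch, shown as a guarded dry run:

```shell
# Sketch only: run on the ESXi host you want to remove (via SSH).
# "evacuateAllData" migrates all vSAN components off this host first,
# which is what you want before removing the host permanently.
# Other --vsanmode values: ensureObjectAccessibility, noAction.
CMD="esxcli system maintenanceMode set --enable true --vsanmode evacuateAllData"

if command -v esxcli >/dev/null 2>&1; then
    $CMD                      # real run, only on an actual ESXi host
else
    echo "Would run: $CMD"    # dry run everywhere else
fi
```

Depending on how much data lives on the host, the evacuation can take a while; the vSphere Client shows the resync progress under the cluster’s vSAN monitoring view.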
This blog post (I call posts like these “quick & dirty posts”) will show you how to add an ESXi host to your vSAN cluster. You may need additional compute power and/or storage capacity. Or you want to implement another storage policy to leverage storage efficiency and more failures to tolerate (RAID 5 / RAID 6 erasure coding). Maybe you want to create a stretched cluster, which needs the same number of vSAN nodes on each site plus a vSAN witness. So many reasons to add another vSAN node.
But let’s dive into this topic now.
How to add a new host to a vSAN cluster?
In my vSAN cluster, based on the Express Storage Architecture (ESA), I currently have six hosts. I want to add another host to the cluster for additional compute and storage capacity. How do you do this?
Right-click your cluster and choose “Add Hosts…”. So far, nothing special.
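Before the new host joins, it’s worth checking that it is ready for vSAN traffic. A quick sketch, assuming SSH access to the new host; `vmk1` is a placeholder for whichever vmkernel NIC carries vSAN traffic in your setup:

```shell
# Sketch: pre-checks on the NEW host before adding it to the vSAN cluster.
# "vmk1" is a placeholder for your vSAN vmkernel interface.
CHECK="esxcli vsan network list"            # lists vmknics tagged for vSAN
TAG="esxcli vsan network ip add -i vmk1"    # tags vmk1 for vSAN traffic

if command -v esxcli >/dev/null 2>&1; then
    $CHECK
else
    echo "Would run: $CHECK"
    echo "Then, if no interface is listed: $TAG"
fi
```

If `esxcli vsan network list` comes back empty, the host has no vSAN-tagged vmkernel interface yet, and the cluster wizard will flag it during the health checks anyway.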
Just a few weeks ago, to my shame, I stumbled across an interesting feature in VMware vSphere while trying some things with vSAN. To be honest, and to make it clear before we dive into this topic: I screwed up when I tested this feature the first time, because I didn’t know about it and didn’t proceed as the feature requires. In the end I had to reinstall all my vSAN nodes and build a new, clean environment. That was somehow needed anyway because of the most recent homelab rebuild, so in a way it was a win-win for me and the lab.
So don’t screw up! No, just kidding. You may know the feature better than I do. And I can tell you, vSAN is stronger and more resilient than you may think.
In this blog post, I’d like to show you how to shut down a vSAN cluster, and how to start it again. The feature is hiding in plain sight: right-click the vSAN cluster and you’re good to go. Or not?
In this blog post, I’m assuming that vCenter is NOT running on the vSAN cluster. I may update this blog post, or create another one, covering vCenter running on the cluster itself. Without searching the internet or checking the VMware docs, I don’t know offhand whether that is even possible. Anyway, how do you shut down the vSAN cluster?
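For context: on recent vSphere releases the cluster shutdown wizard behind that right-click takes care of the preparation for you. On older releases, the documented manual procedure included setting an advanced option on every host so that cluster membership updates are ignored while the nodes go down. A hedged sketch of that manual step, as a guarded dry run:

```shell
# Sketch of the OLD manual vSAN shutdown preparation; the cluster shutdown
# wizard handles this on current releases. Run on EVERY host after all VMs
# are powered off, then enter maintenance mode with "No data migration".
SET="esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates"
UNSET="esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListUpdates"  # revert after restart

if command -v esxcfg-advcfg >/dev/null 2>&1; then
    $SET
else
    echo "Would run: $SET"
    echo "And after the cluster is back up: $UNSET"
fi
```

The important part is symmetry: whatever you set before the shutdown has to be reverted on every host once the cluster is back up.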
I may have skipped some homelab generation upgrades in my documentation here. However, I have updated the page as far as I could, and with this blog post I’d also like to give you a brief update on my current homelab setup.
Last year, my wife and I moved into our own house. Yes, I married my love and we built a house. I didn’t shout about it because it’s something personal and I don’t have to rub it in everyone’s face. But yes, I’m a married house owner now and a loving father. Oh, I forgot to mention that my wife gave birth to a beautiful son this year. So many things happened! But anyway, back to the topic.
You may have seen some images I posted on Twitter last year, about the huge IT rack I got my hands on, and the first “production” deployment in my new homelab rack. This “production” deployment was an actual beer fridge that was small enough to fit into that rack. If you don’t believe me, please go ahead and check the pictures here. The beer fridge is still there, but the huge and heavy IT rack is gone. It has been replaced by a desktop-size rack from StarTech.com, which is enough to provide a nice mount for my SuperMicro servers and networking equipment.
But the main topic in this blog post is the recently acquired hardware. I bought a refurbished HP Z8 G4 workstation!
When I check my blog, I can see the last post from February 2022. That’s a long time ago already! Time to write something, isn’t it?
Back in the days when I was working as a Systems Engineer for an IT service provider, it was much easier to write blog posts. Now, as a “customer”, I don’t find the time or the ideas, or maybe I’m just forgetting blog post ideas; I’m not sure why. At least that’s my thought. I’m always struggling with whether I should blog about this or that: is it worth writing about, or are there already gazillions of blog posts covering the exact same thing?
Today’s blog post is exactly such a topic, I assume, one that has already been written about a few times at least. But it was a problem we hit during an ongoing vSphere upgrade project just recently, and I was able to help our operations team move on with their work. So why not write a blog post about it?
What happened?
As mentioned, we’re currently working on a global vSphere upgrade project. We’ve got many ESXi hosts and clusters all around the world; so far nothing special. And even when easy-to-understand guides are available internally (I wrote these myself and triple-checked them), one or the other point on a checklist may be forgotten, or you just don’t think of it in the heat of the moment. One of those points is “Check that the current credentials are working”. Thanks to the following troubleshooting guidance, there was no showstopper and only a few minutes of delay for the upgrade of one ESXi host.
The root password for one of the ESXi hosts didn’t work. No chance to log in through the Web UI or SSH. So what to do?
There are only two officially supported ways to reset the root password of an ESXi host: you can reinstall the host from scratch, or use host profiles. Well, reinstallation would be an option as we’re upgrading vSphere anyway, but it would require additional time to restore the ESXi configuration. Using a host profile works too, but requires an Enterprise Plus license.
Because we have some spare licenses left for Enterprise Plus (not yet needed for hosts, but already planned to use), we decided to go the way with the host profile. And it wasn’t rocket science!
How can you do it?
The actual troubleshooting chapter is divided into two parts. The first part is changing the current license of an ESXi host, the second part is all about the host profile.
If you don’t have an Enterprise Plus license, you’ll have to plan on reinstalling the ESXi server from scratch.
Change the host license
Log in to the vCenter WebClient (https://yourvcenter.domain.com/ui)
In vCenter, go to Home and then choose Administration and then Licenses
Click the Assets tab and then the HOSTS button
In the Asset column, you can click the filter icon and search for the ESXi host where you want to assign a different license
Select the host, then click Assign License just above the list
Choose the Enterprise Plus license, and click OK
The host will now have an Enterprise Plus license, and you can continue with the steps below.
Remember to switch the license back to the one that was assigned to the ESXi host before.
Extract, change, and apply the host profile
Log in to the vCenter WebClient (https://yourvcenter.domain.com/ui)
In vCenter, go to Home and then choose Policies and Profiles, and click Host Profiles
Click → Extract Host Profile
In the Extract Host Profile wizard → select the host you want to update the password for, then click Next
Name the Host Profile and click Next and then Finish to complete the capture of the host profile template
The new host profile should appear on the Host Profile Objects Field
Right-click the new Host Profile and choose → Edit Host Profile
In the Edit Host Profile wizard, uncheck all boxes
Then, using the search filter, search for → root
Highlight and then select the check box for → User Configuration / root
Make sure to select only this item when searching for root
A configurable window will display the root user configuration
At the Password subsection, choose → Fixed password configuration
Here you have to fill in the new password and confirm it before proceeding
Double-check that all other non-applicable boxes have no check marks and proceed to Finish
Once the task completes, right-click the new host profile and choose → Attach/Detach Hosts and Clusters → then select the host in the wizard
Right-click the host profile again, and select Remediate
Remove/detach the host profile from the host
At this point the host’s root password should be successfully updated
Please be careful: it is recommended to do this while the host is in maintenance mode. If it is part of a cluster, great; you can move all VMs away from that host with DRS (automatically or manually). If it is a standalone host, make sure to shut down the VMs first, just in case the host reboots. While writing this up, the affected host did not reboot, but there is a checkbox in the remediation settings that could cause the host to reboot.
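Once remediation is done, a quick way to confirm the reset actually worked is to log in over SSH with the new password. A minimal sketch; the hostname is a placeholder, and the block only echoes the command unless you explicitly opt in to a live run:

```shell
# Sketch: verify the new root password works. "esxi01.lab.local" is a
# placeholder hostname; the remote command simply prints the ESXi version
# if the login succeeds. Set RUN_LIVE=1 to actually connect.
HOST="esxi01.lab.local"
CMD="esxcli system version get"

if [ "${RUN_LIVE:-0}" = "1" ] && command -v ssh >/dev/null 2>&1; then
    ssh "root@$HOST" "$CMD"   # prompts for the NEW root password
else
    echo "Would run: ssh root@$HOST '$CMD'"
fi
```

This assumes the SSH service is enabled on the host; if you only use the Host Client, simply logging in there proves the same thing.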