VMware – vSAN Deploy and Manage course – Day 3

Today was the last day of our VMware vSAN Deploy and Manage course. Nevertheless, we gave it everything once more. We took a deep dive into designing vSAN solutions, discussed the key topics behind design decisions and played around with some what-if scenarios. But as on every day, we kicked off with a review of what we covered yesterday, and once again made clear for everyone what vSAN really is.

Day 3

Daily review

What is vSAN:

  • Software Defined Storage
  • Hyper Converged Infrastructure
  • Network Storage Topology
  • Hypervisor integrated
    • That means less latency
    • no dependencies on VMs
    • support
    • distributed
  • local disks presenting one datastore per cluster

Use cases:

  • VDI (licensing, offload of IOPS, scalable)
  • Test / Dev environments (projects, easy, growth)
  • Branch Office / Remote Office (same solution, backup)

Install vSAN:

  • Simple, with GUI, all from the web client (with just a few clicks)
  • install vSphere
  • create a Cluster
  • set a VMkernel for vSAN
  • disable HA
  • claim disks
    • create disk groups
    • claim them as cache / capacity tier
  • enable vSAN

What’s in the default vSAN policy:

  • FTT = 1
  • Stripes = 1
  • No reservation (neither cache nor capacity)
  • Thin provisioning
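
To get a feeling for what these defaults mean, here is a minimal sketch in Python. It is not a VMware API; the class and method names are made up for illustration. It assumes the policy uses RAID-1 mirroring, where FTT = n means n + 1 copies of the data and at least 2n + 1 hosts, and it ignores the small witness components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorPolicy:
    """Hypothetical model of a mirroring (RAID-1) storage policy."""

    ftt: int = 1       # failures to tolerate (default from the list above)
    stripes: int = 1   # stripes per object (default from the list above)
    thin: bool = True  # thin provisioning, no reservation

    def replicas(self) -> int:
        # Assumption: RAID-1 mirroring keeps FTT + 1 full copies.
        return self.ftt + 1

    def min_hosts(self) -> int:
        # Assumption: mirroring needs 2 * FTT + 1 hosts (copies plus witness).
        return 2 * self.ftt + 1

    def raw_capacity_gb(self, written_gb: float) -> float:
        # With thin provisioning only written data consumes raw capacity.
        return written_gb * self.replicas()


default = MirrorPolicy()
print("copies:", default.replicas())
print("minimum hosts:", default.min_hosts())
print("raw GB for 100 GB written:", default.raw_capacity_gb(100.0))
```

So with the default policy every written gigabyte lands twice on the vSAN datastore, which is worth remembering when sizing the capacity tier.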

What is a Fault Domain:

  • an area that may fail
  • plan how to recover from the impact on operations
    • Rack awareness
    • Site awareness

Availability:

In vSAN there are two states of compliance…

  • Compliant
  • Non-compliant
    • Absent => wait for 60 minutes, then rebuild
    • Degraded => rebuild immediately
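
The difference between absent and degraded can be sketched in a few lines of Python. This is only an illustration of the rule above, not how vSAN is implemented; the enum and function names are hypothetical, and the 60 minutes are taken from the list above.

```python
from enum import Enum
from typing import Iterable, Optional


class ComponentState(Enum):
    ACTIVE = "active"
    ABSENT = "absent"      # vSAN cannot tell whether the component comes back
    DEGRADED = "degraded"  # vSAN knows the component will not come back


# Delay before an absent component is rebuilt (60 minutes, as stated above).
REPAIR_DELAY_MINUTES = 60


def minutes_until_rebuild(state: ComponentState) -> Optional[int]:
    """Return the wait time before a rebuild starts, or None if no rebuild is needed."""
    if state is ComponentState.DEGRADED:
        return 0
    if state is ComponentState.ABSENT:
        return REPAIR_DELAY_MINUTES
    return None


def is_compliant(states: Iterable[ComponentState]) -> bool:
    """An object is treated as compliant only if all its components are active."""
    return all(state is ComponentState.ACTIVE for state in states)


for state in ComponentState:
    print(state.value, "->", minutes_until_rebuild(state))
```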

What-if failures:

  • Cache disk fails => lose disk group => latency increases
  • Capacity disk fails => degraded => rebuild => VM back online
  • Controller => host issue => HA response
  • Host outage (complete loss of host) => HA response => VM response

Module 7 Lesson 2 – Troubleshooting

We had already covered some of these topics yesterday. Today was partly repetition, with a quick overview of troubleshooting and some of the tools we discovered yesterday. There are so many troubleshooting tools available, either built-in or community driven, that the list could be much longer. But this list gives you at least some of the best-known ones.

  • vCenter (you don’t say…)
  • vROPS
  • esxtop
  • Wireshark (yes indeed; capture packets on ESXi and analyze them with Wireshark => pcap)
  • vSAN Observer (based on Ruby)
  • RVC (Ruby vSphere Console)
  • Health Service Plugin
  • vCheck
  • PowerCLI scripts (combined with Onyx)

A cool tool indeed is vCheck. It’s based on PowerShell scripts and runs against your vSphere infrastructure (there are scripts for other stuff too). You schedule the scripts and receive notifications about changes and issues (before they become a real problem). So when you arrive at your office you already know what’s going on (or what’s not). Also worth mentioning is vSAN Observer. It’s already there, just start it and access the built-in web server to get an overview of what’s going on in your vSAN environment.

Module 8 – Stretched Cluster

After doing some work in the labs we talked about design. Having a stretched cluster is also a question of design: how do you create a solution that covers rack outages or even a complete site outage? You can do that with a stretched cluster. And the failover happens automatically (which is probably not the best solution in every failover situation…).

When planning a stretched cluster you have to think about resources. You need to reserve 50% spare capacity on both sites (whether that means two racks or two sites) in HA admission control. Keep in mind that one site / rack has to keep the other one’s workload online, in addition to the stuff that is already running on it.
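
The 50% rule is easy to check with a bit of arithmetic. The following Python sketch is only an illustration; the function names and the numbers are made up, and it assumes both sites have the same capacity (in whatever unit you size with, for example GHz or GB of RAM).

```python
def max_load_per_site(site_capacity: float, reserved_fraction: float = 0.5) -> float:
    """Load a site may carry when a fraction of it is reserved for failover."""
    return site_capacity * (1.0 - reserved_fraction)


def site_can_absorb_failover(site_capacity: float,
                             own_load: float,
                             other_site_load: float) -> bool:
    """True if one site can run its own load plus the load of the failed site."""
    return own_load + other_site_load <= site_capacity


# Hypothetical example: two sites with 100 units of capacity each.
capacity = 100.0
print("allowed load per site:", max_load_per_site(capacity))
print("50 + 50 fits:", site_can_absorb_failover(capacity, 50.0, 50.0))
print("60 + 50 fits:", site_can_absorb_failover(capacity, 60.0, 50.0))
```

As soon as one site runs above half of its capacity, the surviving site can no longer carry everything after a failover.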

You don’t have to use SRM (Site Recovery Manager) for a failover. vSAN does that for you automatically. If you use SRM, you need a recovery plan for each and every VM. That’s a lot of planning, plus regular checks for new or changed VMs. Not to mention the costs: you need SRM licenses and a second vCenter license.

Now to the vSAN witness. A witness is a separate ESXi box. It can be a physical server running ESXi, which needs to be licensed. This physical server can’t be a member of a cluster, but it can run some VMs. Or you can use the witness appliance, a special ESXi packaged as a virtual appliance that runs on an ESXi server. This appliance cannot run VMs.

You can have a ROBO vSAN cluster in your remote office / branch office that consists of only two ESXi hosts. If you do so, you need a witness host / appliance at your main office site. You always need a witness somewhere to keep quorum in case of an HA event. And remember the 5 heartbeats: in case of an outage, after 5 missed heartbeats your host is considered gone and a failover happens.
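
The heartbeat rule and the need for a witness can be sketched like this in Python. It is a toy model and not vSAN code; the class and function names are hypothetical. It assumes that the 5 missed heartbeats have to be consecutive and that quorum means more than half of all votes.

```python
MISSED_HEARTBEAT_LIMIT = 5  # from the course: 5 missed heartbeats => host is gone


class HeartbeatMonitor:
    """Counts consecutive missed heartbeats for one host (illustrative only)."""

    def __init__(self, limit: int = MISSED_HEARTBEAT_LIMIT) -> None:
        self.limit = limit
        self.missed = 0

    def record(self, received: bool) -> bool:
        """Record one heartbeat interval; return True once the host counts as failed."""
        if received:
            self.missed = 0  # assumption: any received heartbeat resets the counter
        else:
            self.missed += 1
        return self.missed >= self.limit


def has_quorum(votes_available: int, votes_total: int) -> bool:
    """Assumption: quorum requires strictly more than 50% of the votes."""
    return votes_available * 2 > votes_total


monitor = HeartbeatMonitor()
failed = [monitor.record(received=False) for _ in range(5)]
print("failed after each missed heartbeat:", failed)

# Two-host ROBO cluster: without a witness, losing one host leaves 1 of 2 votes.
print("2 hosts, 1 down, no witness:", has_quorum(1, 2))
# With a witness there are 3 votes, and 2 of 3 still form a majority.
print("2 hosts + witness, 1 host down:", has_quorum(2, 3))
```

That is why a two-host cluster cannot decide on its own which side is still alive and needs the witness as a tie-breaker.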

Module 10 – Designing a vSAN deployment

That’s not a random list of IT buzzwords, folks. You have to consider these key points when you’re designing a vSAN solution (probably any other scalable solution too).

Design qualities:

  • Availability
  • Manageability
  • Performance
  • Recoverability
  • Security

Design areas:

  • Management
  • Virtual machines
  • Compute
  • Network
  • Storage

Let me give you some more things to consider. While designing a vSAN solution you will have to find answers to these questions. Some answers you will get from your customers when talking with them about a solution for their specific needs. Other answers you will find while you design the solution. And you will find some more questions too…

Requirements (must have / be / do)

  • “RPO of 15 minutes”
  • “RTO of 5 minutes”
  • Location of data / data center

Constraints (design decisions)

  • “Must work with existing network hardware”
  • “Must work at this site”

Assumptions

  • “We have enough bandwidth”

Risks

  • “If the bandwidth is not enough => risk of not meeting the SLA”

Once you have covered the topics above (the bullet points are just ideas, there is a lot more to cover), you proceed with the design.

Conception

Logical

  • “Keep data at this location”
  • “We want two sites, one for failover”
  • Should also be vendor independent

Physical

  • Here you can come in with the vendor and create the solution
  • vSAN here, stretched cluster there
  • Network links from here to here
  • Backup then with this
  • and so on…

And if you are looking for benchmark tools for your newly created solution, here you go:

  • HCIBench (https://labs.vmware.com/flings/hcibench)
    • simplify and accelerate customer POC performance testing
    • not only a benchmark tool designed for Virtual SAN
    • evaluate the performance of all kinds of Hyper-Converged Infrastructure Storage in vSphere
  • HammerDB (http://www.hammerdb.com/)
    • open source database load testing and benchmarking tool
    • Oracle Database, Microsoft SQL Server, IBM DB2, TimesTen, MySQL, MariaDB, PostgreSQL
    • Postgres Plus Advanced Server, Greenplum, Redis
    • Amazon Aurora and Redshift and Trafodion SQL on Hadoop
