„Starting drivers, please wait“ – An HPE adventure story

Yes, I know the title of this blog post doesn’t sound very good. But if you read on, you’ll see that it fits very well. Recently, just a few days before I wrote this post, one of my customers called me because of a vCenter issue. He has two HPE ProLiant DL380 Gen9 servers and an MSA 2040 SAS Dual Controller storage system.

The customer told me that some VMs weren’t running anymore and that both hosts were no longer available in vCenter. I took a quick look via remote support and saw the same: both hosts were gone, most of the VMs were still running, but some were not. We first tried to access the hosts through SSH, without success. Accessing the DCUI worked only partially: we were able to log on, but the DCUI stopped responding after the successful login. That’s kind of strange; I had never seen that before. The hosts did respond to ping, so there was at least a little light at the end of the tunnel.

We then decided to restart one host. We had no SSH (PuTTY) access to the hosts, we couldn’t manage them from vCenter, and we couldn’t use the DCUI. What else could go wrong?

So we restarted a host. And this is the beginning of this story…

NetApp – Change a disk in a storage shelf (helpful tips)

This post is a special one. It’s my first post about a storage-specific topic. But like many of my other posts, it is also the result of solving a specific problem I recently had at a customer.

I’m not very familiar with storage, especially with systems from NetApp. I know what they have in their portfolio (at least some of it). But if you ask me, for example, how to set up such storage devices correctly, you’ll catch me on the wrong foot. Anyway, once a system is set up, it’s mostly daily business, which also includes troubleshooting from time to time. And I recently had a small troubleshooting issue: I received mails from a storage controller at a customer saying that something was wrong with the file system because a disk was broken. Well, just great to receive such mails in the middle of the night from Saturday to Sunday…

I had to call NetApp support because I wasn’t sure whether NetApp AutoSupport had worked correctly. Before the call I tested it via OnCommand System Manager, and AutoSupport worked fine. So I called NetApp support to ask whether a support case already existed for the affected system. It didn’t. About 5 minutes later I had a newly created support case, a case number, and the confirmation that the replacement disk would be shipped the same day.

The disk arrived the next day and I had to replace it. But this time the NetApp FAS mocked me: it didn’t show which disk was defective and therefore had to be replaced. An orange LED should have lit up, but it didn’t.

Let me show you how I solved this LED problem and how I did the whole replacement process. I know, the NetApp experts among us will probably cry. But I’m not a storage pro; I did it step by step with some help from my good old friend Google 😉

1) Check Auto-Assign

Auto-Assign is a good feature, I think. If you have unowned disks on a stack, loop, or shelf, you can configure Data ONTAP to automatically assign disk ownership at the stack or shelf level. So let’s check whether it’s enabled.

  1. Open PuTTY (or any SSH client you like) and connect to the affected storage controller.
  2. Log in as root.
  3. Enter „options disk“ to check whether Auto-Assign is ON or OFF.
  4. In the output, look for the disk.auto_assign line and its value (on or off).
  5. If Auto-Assign is OFF, you can enable it with „options disk.auto_assign on“.
  6. If you like, you can verify that Auto-Assign is now on by entering „options disk“ again.
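
If you check several controllers, this check can be scripted. The following Python sketch parses the text printed by „options disk“ and decides whether „options disk.auto_assign on“ still has to be run. It assumes that each output line starts with the option name followed by its value, separated by whitespace; the sample output in the code is illustrative rather than copied from a real system, and the function names are hypothetical.

```python
from typing import Dict, Optional


def parse_options(output: str) -> Dict[str, str]:
    """Map each option name to its value, assuming 'name value [...]' lines."""
    options = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            options[parts[0]] = parts[1]
    return options


def auto_assign_fix(output: str) -> Optional[str]:
    """Return the command that enables Auto-Assign, or None if it is already on."""
    if parse_options(output).get("disk.auto_assign") == "on":
        return None
    return "options disk.auto_assign on"


# Illustrative output only; the exact layout on your system may differ.
sample = """\
disk.auto_assign             off
disk.maint_center.enable     on
"""

print(auto_assign_fix(sample))
```

The sketch only prints the command; you would still enter it in your SSH session.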

2) Light up the LED of the broken disk

  1. I assume you’re still connected to your storage controller.
  2. Next we enable diagnostics mode, which we need to activate the LED.
  3. Enter „priv set diag“ to enable diagnostics mode.
  4. Now we need to find out which disk is defective. Enter „aggr status -f“, which lists the failed disks by name.
  5. Now let’s light up the LED! Enter „led_on Disk_name“ (for example „led_on 0a.00.23“) to turn the light on.
  6. You should now see the orange LED light up. Now you know which disk is broken and needs to be replaced.
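
If you have to do this more often, the command sequence of this section can be kept in a small script. The following Python sketch only builds the commands for a given disk name and checks that the name looks like the ones in this post. The regular expression for disk names and the function name are assumptions, and the final „priv set admin“ (switching back to the normal admin privilege level) is an addition that the steps above don’t list.

```python
import re
from typing import List

# Assumption: disk names look like the ones in this post ("0a.00.23",
# "2c.01.13"): adapter, then shelf ID and bay (or just a bay), separated by dots.
DISK_NAME = re.compile(r"^\d+[a-z](\.\d+){1,2}$")


def led_commands(disk: str) -> List[str]:
    """Build the 7-Mode commands that light up the LED of a failed disk."""
    if not DISK_NAME.match(disk):
        raise ValueError(f"unexpected disk name: {disk!r}")
    return [
        "priv set diag",   # diagnostics mode, needed for led_on
        "aggr status -f",  # list failed disks to confirm the name
        f"led_on {disk}",  # turn on the orange LED of that disk
        "priv set admin",  # back to the normal privilege level (not in the steps above)
    ]


for command in led_commands("0a.00.23"):
    print(command)
```

The commands are only printed here; you would still type them into the SSH session on the controller.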

3) Replace the broken disk and assign it as spare

  1. I assume you’re still connected to your storage controller. One last thing remains to be done.
  2. After you have replaced the disk, check whether the controller recognizes it and whether it is already owned on a stack, loop, or shelf.
  3. Enter „disk show -n“ to list the disks that are not owned yet.
  4. You should see something similar to this:

         DISK          OWNER           POOL    SERIAL NUMBER          HOME
         --------      -------------   -----   -------------          ------
         2c.01.13      Not Owned       NONE    3QQ2xxxxxxxxxxxxBQT4

  5. You now know that there is a disk waiting for duty.
  6. Enter „disk assign 2c.01.13“ (replace 2c.01.13 with the name of your disk) to assign it.
  7. In my case the new disk was assigned as a spare, because the existing spare had automatically taken over as an active disk in place of the broken one.
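
If several disks are replaced at once, the check and the assignment can be scripted. The following Python sketch parses „disk show -n“ output like the one shown in step 4 and builds one „disk assign“ command per unowned disk. It assumes the column layout shown in this post, with „Not Owned“ in the OWNER column; the function names are hypothetical, and the commands are only printed, not sent to the controller.

```python
from typing import List

# Sample taken from the "disk show -n" output in this post (serial number masked).
SAMPLE = """\
DISK          OWNER           POOL    SERIAL NUMBER          HOME
--------      -------------   -----   -------------          ------
2c.01.13      Not Owned       NONE    3QQ2xxxxxxxxxxxxBQT4
"""


def unowned_disks(output: str) -> List[str]:
    """Return the names of disks whose OWNER column reads 'Not Owned'."""
    disks = []
    for line in output.splitlines():
        fields = line.split()
        # The header, the dashed separator and blank lines don't match here.
        if len(fields) >= 3 and fields[1:3] == ["Not", "Owned"]:
            disks.append(fields[0])
    return disks


def assign_commands(output: str) -> List[str]:
    """Build one 'disk assign' command per unowned disk."""
    return [f"disk assign {disk}" for disk in unowned_disks(output)]


print(assign_commands(SAMPLE))  # ['disk assign 2c.01.13']
```

Splitting on whitespace keeps the sketch independent of the exact column widths, at the cost of relying on the literal words „Not Owned“.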

That’s it. It took just a few minutes and everything was fine again.