When HA kicks (in) and you’re not quite ready…


vSphere HA functionality is definitely one of the top contributors to the success of VMware virtualization.
But have you ever been in a situation where one of your vSphere hosts failed (say, during the night), vSphere HA did its job perfectly and restarted all 50 affected VMs, and then you came in to work (say, “a little late”) only to find a whole gang of application owners, process controllers, incident managers etc. gathered at your desk, demanding an impact analysis and an improvement plan before lunch?

I think we’ve all been there 😉

At the very least you need to know which VMs vSphere HA restarted, just to fend all these people off.
Unfortunately, finding these VMs in a DRS-enabled cluster a few hours after the failover is (surprise!) not an easy task: DRS has probably rebalanced the cluster in the meantime, so you can't simply look at where the VMs are running now.

Especially when your infrastructure is on the smaller side and you don’t have any fancy tools like vCOPs (pardon, vRealize Operations, of course) to help you.

This was exactly my problem when a disaster like that happened to me for the first time.

The vSphere Client is not much help: by the time you get to work, the default Events view is probably flooded with everything that happened after the failover, brute-force checking every VM in the cluster is… tedious, and the official KB article is not quite what you need either.
I mean… “Reviewing FDM logs on master and slave hosts”? Any volunteers?

Luckily, we can retrieve the list of affected VMs with a simple PowerCLI one-liner that might look like this:

get-cluster -name AcceptanceCluster |
get-vm |
get-vievent -start (get-date).addhours(-3) -finish (get-date) -type warning -maxsamples ([int]::MaxValue) |
where-object {$_.FullFormattedMessage -like "*vSphere HA restarted*"} |
sort-object -property ObjectName |
select ObjectName, CreatedTime

Okay, that’s six lines, not one, but I split it like this just for “readability”. Delete all the “CR + LF”s after the pipe symbols and there you go: a canonical one-liner!

To be honest, I’m not very fond of one-liners 🙁 I know they are often faster than foreach loops (PowerShell streams objects through the pipeline, so later steps start working while earlier ones are still producing output), but most of the time I have difficulty reading (understanding) them, so I try to avoid one-liners whenever I don’t need the data right away.

But hey, I’ve got these guys breathing down my neck here and now, right?

In the example above I’ve assumed you’re already connected to your vCenter (via Connect-VIServer) and the failover happened in a cluster called “AcceptanceCluster”, so I just grab all the VMs from there and retrieve the warning events logged during the last 3 hours (take into account how late you really were at work and adjust this time window accordingly 😉). Finally, I keep only the events whose description contains the dreaded “vSphere HA restarted” phrase.
In this case the “ObjectName” property is simply the name of the restarted VM; together with the exact timestamp (“CreatedTime”), that is all the information I need, but you can easily extend the pipeline to retrieve any other data useful to you.
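If, like me, you'd rather not live with one-liners forever, the filtering step can be pulled out into a small reusable function. This is only a sketch under a few assumptions: `Select-HaRestartEvent` is a hypothetical helper name (it is not part of PowerCLI), and `$sampleEvents` contains made-up stand-in objects carrying just the three properties the pipeline uses (`ObjectName`, `CreatedTime`, `FullFormattedMessage`), so it runs without a vCenter connection. It matches with `-like`, which is case-insensitive and simply returns `$false` for an event with no message text, whereas calling `.Contains()` on an empty message raises an error.

```powershell
# Hypothetical helper (not part of PowerCLI): passes through only the events
# whose FullFormattedMessage matches the given wildcard pattern.
function Select-HaRestartEvent {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline = $true)]
        [object] $InputObject,

        # Wildcard pattern; -like is case-insensitive and returns $false for $null messages.
        [string] $Pattern = '*vSphere HA restarted*'
    )
    process {
        if ($InputObject.FullFormattedMessage -like $Pattern) {
            $InputObject
        }
    }
}

# Made-up stand-ins for Get-VIEvent output, carrying only the properties used here.
$sampleEvents = @(
    [pscustomobject]@{ ObjectName = 'web02'; CreatedTime = (Get-Date).AddHours(-2); FullFormattedMessage = 'vSphere HA restarted virtual machine web02' }
    [pscustomobject]@{ ObjectName = 'db01';  CreatedTime = (Get-Date).AddHours(-1); FullFormattedMessage = 'Some unrelated warning' }
    [pscustomobject]@{ ObjectName = 'app01'; CreatedTime = (Get-Date).AddHours(-2); FullFormattedMessage = 'vSphere HA restarted virtual machine app01' }
    [pscustomobject]@{ ObjectName = 'tmp01'; CreatedTime = (Get-Date).AddHours(-1); FullFormattedMessage = $null }
)

# Same shape as the pipeline in the post: filter, sort by VM name, keep name and time.
$restarted = $sampleEvents |
    Select-HaRestartEvent |
    Sort-Object -Property ObjectName |
    Select-Object ObjectName, CreatedTime

$restarted | Format-Table -AutoSize
```

Against a real vCenter you would pipe the Get-VIEvent output into Select-HaRestartEvent instead of `$sampleEvents`; keeping the filter in a named function makes the pipeline read a bit more like plain English.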

I hope you find this post useful. As always, feel free to share it and/or provide feedback!


Sebastian Baryło

Successfully jumping to conclusions since 2001.
