Invalid Configuration for Device '0' Error after Server Crash

One of my customers recently had a major issue: due to a fault in their cooling system, their server room overheated and many devices shut themselves down to avoid damage, including one of their SANs.  After the cooling issue was resolved, the environment largely came back up without serious incident, but there was one lingering problem.

Several VMs in the environment could not connect to their distributed virtual switch.  Each VM was associated with its port group correctly, but the "Connected" checkbox was cleared.  When we tried enabling that option, the task failed with an error that read: Invalid Configuration for Device '0'

Some quick googling revealed a VMware KB article about this particular issue, with several different workarounds suggested.  They ranged from easy (move the VM to a different port group and back) to arduous and time-consuming (remove the NIC from the VM, then add a new NIC and reassign the VM's network identity to it).  We found that the easy options were generally not working; the option that the customer had settled on was removing and re-adding the NIC.

The middle of that KB article mentions another option: use the CLI on the ESXi host to force a VMX reload.  This customer has quite a few ESXi hosts, though, so this option also struck me as more labor-intensive than it needed to be.  Once again, I hit Google and found a way to do that same VM reload from PowerCLI, which connects to vCenter and lets you target every affected VM from a single interface.

That article discusses finding machines that were in a disconnected state; that wasn't the symptom that we'd experienced.  Instead, we had a list of VMs that were not pinging even though they were powered on, so I reduced the author's PowerCLI line to the following (for a server named "Server-Name"):

(Get-View -ViewType VirtualMachine | ?{$_.Name -like "Server-Name"}).Reload()

While doing so, I played around with it a little bit and learned something interesting.  This command works because the VM object that is returned by the Get-View command has a method called .reload() which forces vCenter to reload the VM's vmx file.  The VM object that is returned by the normal Get-VM command does not have this method, so if you're ever working with a similar situation, make sure that you use the "Get-View -viewtype VirtualMachine" technique.
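
Since we were working from a list of affected VMs, the one-liner above can be wrapped in a small function and run once for the whole list.  What follows is only a sketch, not the exact commands we ran: the function name Invoke-VmxReload and its -Name parameter are made up for illustration, and it assumes a PowerCLI session that is already connected to vCenter with Connect-VIServer.

```powershell
# Hypothetical helper (not part of PowerCLI): reload the .vmx file of each named VM.
# Assumes an existing vCenter connection created with Connect-VIServer.
function Invoke-VmxReload {
    param(
        [Parameter(Mandatory)]
        [string[]]$Name
    )

    # Fetch the VM view objects once, then match by name locally,
    # the same way the one-liner does.
    $allViews = Get-View -ViewType VirtualMachine

    foreach ($vmName in $Name) {
        $matched = @($allViews | Where-Object { $_.Name -like $vmName })

        if ($matched.Count -eq 0) {
            Write-Warning "No VM matching '$vmName' was found."
            continue
        }

        foreach ($view in $matched) {
            $view.Reload()   # ask vCenter to re-read this VM's .vmx file
            $view.Name       # emit the name so you can see what was reloaded
        }
    }
}
```

Calling it would look something like Invoke-VmxReload -Name 'Server-Name', 'Other-Server' (the second name is just a placeholder).  Because the match uses -like, wildcards such as 'Web-*' should also work, so it is worth double-checking the pattern before pointing it at a production vCenter.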

Once we ran that command on each VM, we were able to reconnect their network adapters successfully and services were restored.

