How Does vCenter Process and Relay External Commands?
I’ve been working with Unidesk lately, as well as with the vSphere PowerCLI. One of my customers has also been having an intermittent issue where they cannot browse their datastores (it just sticks at “Searching Datastore…” until they lose patience and close the window). All of this has gotten me thinking about the relationship between vCenter and the ESXi hosts. When we send a command to vCenter, such as creating a new VM, we all know that vCenter is simply relaying that command to one of its ESXi hosts, which does the actual work. Same thing with file copies through the Datastore Browser, and in fact even reading the datastores through that same tool.
How do we, as administrators, know which ESXi Host vCenter is using as its “workhorse”? In a clustered environment, it’s really very unclear. It turns out that, if that workhorse ESXi host is having an issue, it may prevent a lot of those basic vCenter functions from working (such as the aforementioned “searching datastore…” hang issue). Well, it looks like there are actually a few ways to tell what’s going on behind the scenes, and knowing this can make troubleshooting some issues a whole lot easier.
First, the difficult but definitive method. There are a lot of great logs on the vCenter server (I just outed myself as a huge geek, didn’t I?). You can find them on the vCenter server itself, under C:\ProgramData\VMware\VMware VirtualCenter\Logs (remember that ProgramData is a hidden folder). I don’t really know how to read those logs; in fact, in this regard I’m basically that guy who knows just enough Spanish to order his favorite dish at a Mexican restaurant but then tells everyone about how he is nearly fluent. That said, if you look at the latest vpxd-###.log file, even a very basic understanding can help you to glean some useful information.
If you search through that log file, you’ll probably find a lot of instances of “ClientAdapterBase::InvokeOnSoap”. Those are vCenter making SOAP calls to an ESXi server to process something that you told it to do (if you ever see me in person, I have a funny story to share about the first time I helped a customer with that interface). Immediately following that InvokeOnSoap bit will be a section in parentheses, which lists what that call did and what server it was sent to. If you go through that file, you’ll find that the overwhelming majority of those SOAP calls are directed to a single ESXi host, and that’s your workhorse.
Why is it only an overwhelming majority and not all of them? Well, there are some things that the workhorse ESXi host cannot do. For example, if you try to browse the local storage on an ESXi host, only that ESXi host can process that request, and so vCenter will use it instead of the workhorse. For shared storage though (assuming that the workhorse has access to that shared storage), it will default to that workhorse host. This is counter-intuitive: you may have reached your storage in the vSphere client by selecting another ESXi host, but when you select “browse datastore” the request is not proxied through that selected host; vCenter uses its workhorse instead.
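If you'd rather not eyeball the log, the tally described above can be roughed out in a few lines of Python. Treat this as a sketch under assumptions, not a real vpxd parser: I'm assuming only what was said above (a line contains ClientAdapterBase::InvokeOnSoap, followed by a parenthesized section that mentions the target server somewhere). The function name count_soap_targets, the list of host names you pass in, and the sample line layout used to try it out are all made up for illustration. Your actual log lines may look different.

```python
import re
from collections import Counter

# Marker text mentioned above; everything after it on the line is searched.
MARKER = "ClientAdapterBase::InvokeOnSoap"


def count_soap_targets(log_lines, known_hosts):
    """Tally InvokeOnSoap lines per host (hypothetical helper).

    log_lines   -- any iterable of strings (an open file works)
    known_hosts -- host names you expect to see; supplied by you, because
                   the exact layout of the parenthesized section is not
                   assumed here
    """
    counts = Counter()
    for line in log_lines:
        idx = line.find(MARKER)
        if idx == -1:
            continue
        # Grab the first parenthesized section after the marker.
        match = re.search(r"\(([^)]*)\)", line[idx + len(MARKER):])
        if not match:
            continue
        detail = match.group(1).lower()
        for host in known_hosts:
            # Guard the name on both sides so "esx1" does not match "esx10".
            pattern = (r"(?<![a-z0-9-])" + re.escape(host.lower())
                       + r"(?![a-z0-9-])")
            if re.search(pattern, detail):
                counts[host] += 1
    return counts
```

To use it, open the latest vpxd log with open(path, errors="replace"), pass the file object in along with your ESXi host names, and look at counts.most_common(). The host at the top of that list is the likely workhorse.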
So, with the definitive method discussed, let’s talk about the fast, easy “hand wavey” method. For a given action, there exists a set of ESXi hosts that could complete that action. If the action is “browse datastore” on a shared volume that is visible to every host in your infrastructure, then the set is every host in your infrastructure. If the action is “create a VM” in a cluster, then the set of potential hosts is limited to only the hosts in that cluster. Anything involving local storage (or really, any object that is only accessible to a single ESXi host) will have a set of hosts that is limited to that one host.
Once you’ve figured out your set of potential ESXi hosts, let’s think about them like a computer would. You’ve got 8 potential avenues to take in order to do your work; each one is identically qualified to do said work. How do you decide? Alphabetically, of course! So, the quick and dirty method for figuring out which ESXi host is going to relay a given command is to just figure out which ESXi host, amongst those that could potentially do the action, is alphabetically strongest.
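The hand-wavey method boils down to two steps: narrow the hosts to the ones that could do the job, then pick by name. Here is a minimal sketch of that reasoning. It models my rule of thumb, not vCenter's actual selection code, and it assumes that "alphabetically strongest" means the name that sorts first, ignoring case. The inventory layout and the function names candidate_hosts and likely_relay_host are hypothetical.

```python
def candidate_hosts(inventory, cluster=None, datastore=None):
    """Return the hosts that could carry out an action.

    inventory -- dict mapping host name to a dict with a "cluster" string
                 and a "datastores" set (a made-up layout for this sketch)
    cluster   -- restrict to hosts in this cluster, if given
    datastore -- restrict to hosts that can see this datastore, if given
    """
    result = []
    for name, info in inventory.items():
        if cluster is not None and info["cluster"] != cluster:
            continue
        if datastore is not None and datastore not in info["datastores"]:
            continue
        result.append(name)
    return result


def likely_relay_host(candidates):
    """Guess the relay host: the candidate whose name sorts first.

    Assumption: "alphabetically strongest" is read as "sorts first",
    compared case-insensitively. Returns None for an empty candidate set.
    """
    if not candidates:
        return None
    return min(candidates, key=str.lower)


# Example inventory (entirely fictional names).
inventory = {
    "esx03": {"cluster": "B", "datastores": {"shared-lun", "esx03-local"}},
    "ESX01": {"cluster": "A", "datastores": {"shared-lun", "esx01-local"}},
    "esx02": {"cluster": "A", "datastores": {"shared-lun", "esx02-local"}},
}

shared = likely_relay_host(candidate_hosts(inventory, datastore="shared-lun"))
local = likely_relay_host(candidate_hosts(inventory, datastore="esx03-local"))
in_b = likely_relay_host(candidate_hosts(inventory, cluster="B"))
```

With this made-up inventory, browsing the shared datastore points at ESX01, while browsing the local datastore on esx03 or creating a VM in cluster B can only land on esx03, because that is the only host in the candidate set.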
This has ramifications all over the place, ranging from simple actions like browsing datastores to processing PowerCLI commands to creating desktops in Unidesk. Hopefully, this knowledge will help if anyone is involved in a troubleshooting situation where they're seeing these sorts of symptoms.