Recovering from Missing VMDK Pointer Files


One of my customers recently ran into an unusual issue. It was unusual enough that I thought it would make an interesting write-up, just in case anyone sees something similar in their own environment. We haven’t been able to track down the root cause, but what ended up happening is that all of the pointer VMDK files on a specific LUN were deleted. The -flat.vmdk files were still there, but none of the <VM>.vmdk files were present. This meant that no VM on that LUN could power on, as they were all throwing the error “File <unspecified filename> was not found”. That error message is slightly better than “An unspecified error occurred”, but only just.

This customer is still on ESX 4.0 and, as you’d guess, they’re pretty far behind on patches. In addition, one of the fibre ports on their switch was flapping, causing some intermittent storage access issues. Neither of those sounds to me like the cause of the issue we came across, but we didn’t need to know the root cause in order to get those machines up and running again… and the reason I’m writing this is to explain how I did it. But first, here’s a brief description of the relationship between a Virtual Machine and its hard drive files.

As we all know, a Virtual Machine’s virtual hardware is specified in that Virtual Machine’s .vmx file. That file is a simple text file: if you’re at an ESX Host’s console (either via SSH or locally), you can edit it with vi; otherwise, you can download it through the vSphere Client, edit it in Notepad, and re-upload it. Regardless of how you get at it, it specifies exactly which .vmdk files should be attached to the VM (as well as whether they are SCSI or IDE and where they fall on their respective chain). For example, a Windows XP VM with an IDE hard drive will have the following pair of lines:

ide0:0.present = "TRUE"
ide0:0.fileName = "WinXP.vmdk"

Those two lines tell the Virtual Machine to look for a file named “WinXP.vmdk” and to connect that to the VM through the virtual IDE controller as device 0. If you look in your VM’s directory (SSH is probably the best way to do this), you’ll notice two files for the VM’s hard drive: the <VM Name>.vmdk file and the <VM Name>-flat.vmdk file. If you use “ls -lh” (a long listing with file sizes in human-friendly units), you’ll notice that the -flat.vmdk file is quite large, whereas the .vmdk file is quite small. This is because the .vmdk file is a pointer file with metadata about the -flat.vmdk file. This pointer file is also a simple text file and may be manipulated by the same means as the .vmx file. The -flat.vmdk file, on the other hand, is the hard drive’s binary file in which all data for that drive is encoded. Don’t open that file in a text editor. Just don’t.
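If a VM has several disks, it can help to pull the referenced disk names straight out of the .vmx before poking around the directory. The following is only a rough sketch: the function name list_vmx_disks is my own invention, and it assumes the fileName lines look like the pair shown above.

```shell
# Hypothetical helper: print every .vmdk file name referenced by a .vmx file.
# Assumes lines of the form:  ide0:0.fileName = "WinXP.vmdk"
list_vmx_disks() {
    # Keep only the fileName lines that point at a .vmdk,
    # then strip everything except the text between the quotes.
    grep -i '\.fileName *= *"[^"]*\.vmdk"' "$1" |
        sed 's/^[^"]*"\([^"]*\)".*/\1/'
}

# Example (hypothetical path):
# list_vmx_disks /vmfs/volumes/<LUN>/<VM>/<VM>.vmx
```

Each name it prints is a pointer file that has to exist (and be lockable) before the VM will power on.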

When you power on a virtual machine, the system reads that VM’s .vmx file and locks all of the specified .vmdk files (unless you’re doing some advanced stuff that we’ll ignore for now). If the system can’t find one of those .vmdk files (or can’t get a lock on one of them), that power-on action fails. So, how did we correct this issue? As you can imagine, we simply recreated the .vmdk pointer files. This process involves a lot of modification at the command line; it’s moderately simple, but please be careful. Also, bear in mind that this is a record of how I solved this particular problem, so anything that you do in your environment, you do at your own risk!
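Before recreating anything, it’s worth knowing exactly which disks are affected. Here’s a minimal sketch that walks one level of VM directories under a datastore path and prints every -flat.vmdk that has no usable pointer file beside it. The function name is made up for this post, and it assumes the usual <VM>/<VM>-flat.vmdk layout.

```shell
# Hypothetical helper: list -flat.vmdk files whose pointer .vmdk is missing.
# $1 is the datastore path, e.g. /vmfs/volumes/<LUN>
find_orphaned_flats() {
    for flat in "$1"/*/*-flat.vmdk; do
        [ -e "$flat" ] || continue              # the glob matched nothing
        descriptor="${flat%-flat.vmdk}.vmdk"    # WinXP-flat.vmdk -> WinXP.vmdk
        # -s is false for a missing file and for an empty one,
        # so zero-byte placeholder files are reported as well.
        [ -s "$descriptor" ] || echo "$flat"
    done
}
```

It only reads the directory listing, so it is safe to run while you work out what needs fixing.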

Creating a .vmdk pointer file from scratch is hard. It’s got all sorts of numbers that you’d need to understand and be able to calculate (such as the size of the hard drive, in 512-byte blocks). On the other hand, modifying an existing .vmdk file is really easy. Remember how I asked you to do an “ls -lh” earlier to compare the sizes of the -flat.vmdk vs. the .vmdk? Well, do that again, and this time take note of the size of the -flat.vmdk. Now, add a new hard drive of that same size to your VM. If this process fails with an error saying that it “Cannot complete the operation because the file or folder [LUN] <VM>/<VM>.vmdk already exists”, just use the following command to create a dummy file and try again (replacing <LUN> and <VM> with appropriate values for your environment):

touch /vmfs/volumes/<LUN>/<VM>/<VM>.vmdk
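As an aside on that size in 512-byte blocks: it is just the byte size of the -flat.vmdk divided by 512, so you can work it out if you want to double-check the new disk. This is a small sketch with hypothetical function names; it assumes a GNU-style stat (with -c %s) is available on the console, which I haven’t confirmed for every ESX build.

```shell
# Hypothetical helper: size of a flat file in 512-byte blocks.
# Assumes GNU stat; "stat -c %s" prints the size in bytes.
flat_size_sectors() {
    bytes=$(stat -c %s "$1")
    echo $((bytes / 512))
}

# Hypothetical helper: convert a disk size in GB (1024^3 bytes) to 512-byte blocks.
gb_to_sectors() {
    echo $(( $1 * 1024 * 1024 * 1024 / 512 ))
}

# Example: a 40 GB disk
# gb_to_sectors 40    # prints 83886080
```

If the two numbers agree for your disk, the new hard drive you add will produce a pointer file with the right block count.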

Now, you should have a <VM>_1.vmdk file and its corresponding <VM>_1-flat.vmdk file (assuming that there was only one hard drive for the VM on this LUN originally).  This is a new, empty hard drive of the same size as your original hard drive.  Now, you copy the pointer file from this new hard drive, but give it the name of the old hard drive (if you’ve got that dummy file sitting around, go ahead and overwrite it):

cp <VM>_1.vmdk <VM>.vmdk

Then you edit the <VM>.vmdk file in your favorite text editor.  In that text editor, you’ll see a line like the following:

RW 83886080 VMFS "WinXP_1-flat.vmdk"

Unless you’ve got a 40 GB hard drive, your number will be different (that number is 40 GB in 512-byte blocks). The name in quotes is the binary flat file that this pointer file references. You can probably guess the next step: just change that line to point to the original hard drive file, leaving the rest of it the same:

RW 83886080 VMFS "WinXP-flat.vmdk"
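If you have a whole LUN’s worth of these to fix, making that one-line change by hand gets old fast. Here is a hedged sketch of the same edit done with sed; the function name repoint_descriptor is hypothetical, and it keeps a .bak copy rather than editing in place. Note that the dots in the old name are treated as regex wildcards, which is harmless here but worth knowing.

```shell
# Hypothetical helper: point a descriptor at a different flat file.
# $1 = pointer file, $2 = flat name currently referenced, $3 = flat name to reference
repoint_descriptor() {
    cp "$1" "$1.bak"                          # keep the unmodified copy
    # Replace only the quoted file name; the rest of the line is left alone.
    sed "s/\"$2\"/\"$3\"/" "$1.bak" > "$1"
}

# Example, run from the VM's directory:
# repoint_descriptor WinXP.vmdk WinXP_1-flat.vmdk WinXP-flat.vmdk
```

Open the result in a text editor afterwards and confirm that the extent line is the only thing that changed.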

Save your modified .vmdk file and remove (deleting from disk!) that extra blank hard drive from the VM. And that’s it; the VM should power on without any problems.
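If you’d like a sanity check before hitting the power button, something along these lines can confirm that the pointer file names a flat file that actually exists and that the block count matches its size. It’s only a sketch: check_descriptor is a made-up name, it assumes a single “RW <blocks> VMFS "<file>"” line like the one shown above, and it relies on GNU stat.

```shell
# Hypothetical helper: verify a pointer file against the flat file it names.
check_descriptor() {
    descriptor=$1
    dir=$(dirname "$descriptor")
    # Block count is the second field of the first line starting with RW.
    sectors=$(awk '$1 == "RW" { print $2; exit }' "$descriptor")
    # Flat file name is the quoted text at the end of that line.
    flat=$(sed -n 's/^RW .*"\([^"]*\)"$/\1/p' "$descriptor" | head -n 1)
    if [ -z "$flat" ] || [ ! -f "$dir/$flat" ]; then
        echo "missing flat file: $flat"
        return 1
    fi
    actual=$(( $(stat -c %s "$dir/$flat") / 512 ))
    if [ "$sectors" != "$actual" ]; then
        echo "size mismatch: descriptor says $sectors, flat file has $actual"
        return 1
    fi
    echo "ok: $flat ($sectors blocks)"
}
```

A mismatch usually means the temporary disk wasn’t created at exactly the same size as the original.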
