Recovering from Missing VMDK Pointer Files
One of my customers recently ran into an unusual issue, unusual enough that I thought it would
make an interesting write-up in case anyone sees something similar in their own environment.
We haven’t been able to track down the root cause, but what ended up happening is that all of the
pointer VMDK files on a specific LUN were deleted. The -flat.vmdk files were still there, but
none of the <VM>.vmdk files were present. This meant that no VM on that LUN could power on, as
they were all throwing the error “File <unspecified filename> was not found”. That error message
is slightly better than “An unspecified error occurred”, but only just barely.
This customer is still on ESX 4.0 and, as you’d guess, they’re pretty far behind on patches.
In addition, one of the fibre ports on their switch was flapping, causing some intermittent
storage access issues. Neither of those facts sounds to me like a cause of the issue we came
across, but we didn’t need to know the root cause in order to get those machines up and running
again, and the reason I’m writing this is to explain how I did it. But first, here’s a brief
description of the relationship between a Virtual Machine and its hard drive files.
As we all know, a Virtual Machine’s virtual hardware is specified in that Virtual Machine’s
.vmx file. That file is a simple text file. If you’re at an ESX host’s console (either via SSH
or locally), you can edit it with vi; otherwise, you can download it through the vSphere Client,
edit it in Notepad, and re-upload it. Regardless of how you get at it, it specifies exactly
which .vmdk files should be attached to the VM (as well as whether they are SCSI or IDE and
where they fall on their respective chain). For example, a Windows XP VM with an IDE hard drive
will have the following pair of lines:
ide0:0.present = "TRUE"
ide0:0.fileName = "WinXP.vmdk"
Those two lines tell the Virtual Machine to look for a file named “WinXP.vmdk” and to connect
it to the VM through the virtual IDE controller as device 0. If you look in your VM’s directory
(SSH is probably the best way to do this), you’ll notice two files for the VM’s hard drive: the
<VM Name>.vmdk file and the <VM Name>-flat.vmdk file. If you use “ls -lh” (a long listing with
file sizes in human-friendly units), you’ll notice that the -flat.vmdk file is quite large,
whereas the .vmdk file is quite small. This is because the .vmdk file is a pointer file with
metadata about the -flat.vmdk file. This pointer file is also a simple text file and may be
manipulated by the same means as the .vmx file. The -flat.vmdk file, on the other hand, is the
hard drive’s binary file in which all data for that drive is encoded. Don’t open that file in a
text editor. Just don’t.
When you power on a virtual machine, the system reads that
VM’s .vmx file and locks all of the specified .vmdk files (unless you’re doing
some advanced stuff that we’ll ignore for now).
If the system can’t find one of those .vmdk files (or can’t get a lock
on one of them), that power on action fails.
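Before repairing anything, it can help to see exactly which pointer files a given VM is missing. The following is a minimal sketch, assuming the .vmx file uses the plain `fileName = "..."` form shown above; the function name `missing_disks` is my own invention, not a VMware tool, and it only reads files.

```shell
# Hypothetical helper: print every .vmdk named in a .vmx file that is not on disk.
# Relative names are resolved against the directory that holds the .vmx file.
missing_disks() {
    vmx="$1"
    vm_dir=$(dirname "$vmx")
    # Keep only lines such as: ide0:0.fileName = "WinXP.vmdk"
    grep -i 'fileName *= *"[^"]*\.vmdk"' "$vmx" |
        sed 's/^[^"]*"\([^"]*\)".*/\1/' |
        while IFS= read -r disk; do
            case "$disk" in
                /*) path="$disk" ;;          # absolute path, e.g. on another LUN
                *)  path="$vm_dir/$disk" ;;  # relative to the VM's directory
            esac
            [ -e "$path" ] || echo "$path"
        done
}

# Example (the path is a placeholder for your environment):
# missing_disks /vmfs/volumes/<LUN>/<VM>/<VM>.vmx
```

No output means every referenced .vmdk pointer file is present; it says nothing about whether the system can lock them.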
So, how did we correct this issue? As you can imagine, we simply recreated the .vmdk pointer
files. This process involves a lot of work at the command line. It’s moderately simple, but
please be careful. Also, bear in mind that this is a record of how I solved this particular
problem; anything that you do in your environment, you do at your own risk!
Creating a .vmdk pointer file from scratch is hard. It’s got all sorts of numbers that you’d
need to understand and be able to calculate (such as the size of the hard drive in 512-byte
blocks). On the other hand, modifying an existing .vmdk file is really easy. Remember how I
asked you to do an “ls -lh” earlier to compare the sizes of the -flat.vmdk and the .vmdk? Well,
do that again, and this time take note of the size of the -flat.vmdk. Now, add a new hard drive
of that same size to your VM. If this process fails with an error saying that it “Cannot
complete the operation because the file or folder [LUN] <VM>/<VM>.vmdk already exists”, just
use the following command to create a dummy file and try again (replacing <LUN> and <VM> with
appropriate values for your environment):
touch /vmfs/volumes/<LUN>/<VM>/<VM>.vmdk
Now you should have a <VM>_1.vmdk file and its corresponding <VM>_1-flat.vmdk file (assuming
that there was originally only one hard drive for the VM on this LUN). This is a new, empty
hard drive of the same size as your original hard drive. Next, copy the pointer file from this
new hard drive, giving it the name of the old hard drive’s pointer file (if you’ve got that
dummy file sitting around, go ahead and overwrite it):
cp <VM>_1.vmdk <VM>.vmdk
Then edit the <VM>.vmdk file in your favorite text editor. In it, you’ll see a line like the
following:
RW 83886080 VMFS "WinXP_1-flat.vmdk"
Unless you’ve got a 40 GB hard drive, your number will be different (that number is 40 GB in
512-byte blocks). The name in quotes is the binary flat file that this pointer file references.
You can probably guess the next step: just change that line to point to the original hard drive
file, leaving the rest of it the same:
RW 83886080 VMFS "WinXP-flat.vmdk"
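If you have a whole LUN’s worth of VMs to repair, the same edit can be scripted instead of made by hand. Here is a minimal sketch, assuming the extent line has exactly the `RW <number> VMFS "<name>"` form shown above and that the flat file’s name contains no `/`, `&`, or backslash characters (they would confuse sed). The function name `repoint_descriptor` is hypothetical, something I made up for illustration.

```shell
# Hypothetical helper: print a copy of a descriptor with its extent line
# pointing at a different flat file. The original file is not modified.
repoint_descriptor() {
    descriptor="$1"   # e.g. WinXP_1.vmdk (the freshly created pointer file)
    flat_name="$2"    # e.g. WinXP-flat.vmdk (the surviving data file)
    # Replace only the quoted name on the RW line; keep the sector count as-is.
    sed 's/^\(RW [0-9][0-9]* VMFS \)"[^"]*"/\1"'"$flat_name"'"/' "$descriptor"
}

# Example, equivalent to the copy-and-edit steps described above:
# repoint_descriptor WinXP_1.vmdk WinXP-flat.vmdk > WinXP.vmdk
```

Writing to standard output rather than editing in place means the new pointer file stays untouched until you are happy with the result.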
Save your modified .vmdk file and remove (deleting from
disk!) that extra blank hard drive from the VM.
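Since “ls -lh” rounds its sizes, it may be worth checking that the number in the pointer file really matches the flat file before powering on. A rough sketch, assuming a single `RW` line in the pointer file and the 512-byte blocks described above; `descriptor_matches_flat` is a name I made up, not a VMware command.

```shell
# Hypothetical helper: succeed only if the descriptor's block count
# (the number after RW) times 512 equals the flat file's size in bytes.
descriptor_matches_flat() {
    descriptor="$1"
    flat="$2"
    # Second field of the first line that starts with RW.
    sectors=$(awk '$1 == "RW" { print $2; exit }' "$descriptor")
    # Fifth field of a numeric long listing is the size in bytes.
    bytes=$(ls -ln "$flat" | awk '{ print $5 }')
    [ -n "$sectors" ] && [ -n "$bytes" ] || return 1
    [ "$((sectors * 512))" -eq "$bytes" ]
}

# Example:
# descriptor_matches_flat WinXP.vmdk WinXP-flat.vmdk && echo "sizes agree"
```

For the 40 GB example, 83886080 blocks times 512 bytes is 42949672960 bytes, which is what the flat file’s exact size should be.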
And that’s it. The VM should power on without any problems.