Automatically Restarting VDI Desktops

One of the biggest challenges associated with VDI is supporting applications.  In the server world, we've pretty thoroughly moved to a "one application per server" mentality, so application conflicts don't really happen.  If there's a problem, it's immediately obvious which application is involved.  In general, it makes management much easier.

Desktops, of course, are in the far opposite situation.  We shove as many applications into a desktop as the user needs.  This means that, when there is an issue, it can be very difficult to diagnose which application is misbehaving.  I've recently been helping a customer who has been suffering from random disconnects in their VDI environment.  The View Administrator shows the desktops with a status of either "Invalid IP" or "Agent Unreachable" and the Agent logs on the desktop haven't been particularly helpful.

Eventually, we installed Liquidware Labs Stratusphere on the desktops, in order to get a better idea about what's going on before a crash.  It turns out that, just before the desktop crashes (and goes to Invalid IP or Agent Unreachable), the memory usage in the VM is 98%+ and the page file goes absolutely ballistic.  Clearly, there's a memory leak.  Interestingly, this didn't show up in the VM performance charts in vCenter because vCenter's "Active Memory" only counts pages that have been recently touched.  Since a memory leak is basically stale data not being cleared out of memory, vCenter has no visibility into the fact that that memory is still tagged as "in use" by the Guest OS, and so everything looks fine.
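
If you don't have a monitoring tool handy, the guest's own view of memory can be spot-checked over WMI.  The following is only a sketch and is not what Stratusphere does: both function names are made up for illustration, and it assumes that WMI is reachable on the desktop from wherever you run it.

```powershell
# Hypothetical helper: percentage of physical memory in use, given the
# total and free values (both in KB) reported by the guest OS.
function Get-MemoryUsedPercent {
    param(
        [double]$TotalKB,
        [double]$FreeKB
    )
    [math]::Round((($TotalKB - $FreeKB) / $TotalKB) * 100, 1)
}

# Hypothetical wrapper: asks the guest OS (not vCenter) how much memory is in use.
# Win32_OperatingSystem reports TotalVisibleMemorySize and FreePhysicalMemory in KB.
function Get-GuestMemoryUsedPercent {
    param([string]$ComputerName)
    $os = Get-WmiObject -Class Win32_OperatingSystem -ComputerName $ComputerName
    Get-MemoryUsedPercent -TotalKB $os.TotalVisibleMemorySize -FreeKB $os.FreePhysicalMemory
}

# Example (desktop name is a placeholder):
# Get-GuestMemoryUsedPercent -ComputerName "Pool1-042.child.contoso.com"
```

A desktop that sits near 100% here while vCenter's Active Memory chart stays low is consistent with the kind of leak described above.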

This desktop image has everything but the kitchen sink thrown into it, so we're still troubleshooting the finer details of the leak.  In the meantime, the customer needs some easy way to keep the desktops functional for the users... and so we're falling back on the classic memory leak alleviation technique: restart the thing.

It's trivial to write a script that will restart every VM in an environment; it's trivial to write a script that will restart every VM on a list.  I expect that it will take some time to actually resolve this memory leak issue, so I wanted a script that was slightly more advanced.  For that reason, I wrote this script with a bit of logic in it.

This script is built to take a wildcard pattern for the VM names (it is matched with PowerShell's -like operator, so it is not a true regular expression).  If you want to restart every desktop in a pool, just pass it <Pool Name>* as its "-v" parameter.  Importantly though, it doesn't just blunder through the list dumbly restarting every desktop.  It basically does one of three things, depending on the VM state.  If there is an active VDI session on the VM, the script does nothing to that desktop.  If the desktop is in a responsive state, it issues a soft restart.  If the desktop is unresponsive, it performs a hard power off and power on.  There's also a little bit of "dummy proofing" in the script, in that it is limited to only restarting VMs with a specified OS (Windows 7 by default).
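
The three-way decision can be sketched on its own, separate from any vCenter calls.  The function name and the returned labels below are made up for illustration; the real script makes the same choices inline using each VM's GuestHeartbeatStatus.

```powershell
# Hypothetical helper (not part of the script itself): maps a desktop's
# state to the action the script would take.
function Get-RestartAction {
    param(
        [bool]$HasActiveSession,
        [string]$HeartbeatStatus
    )
    # A desktop with an active VDI session is always left alone.
    if ($HasActiveSession) { return "Skip" }
    switch ($HeartbeatStatus) {
        "green" { return "SoftRestart" }   # guest is responsive
        "red"   { return "HardReset" }     # guest is unresponsive
        default { return "Skip" }          # any other status is not touched
    }
}

# Name matching uses -like, so * is a wildcard:
# "Pool1-042" -like "Pool1*"    is True
# "Pool2-001" -like "Pool1*"    is False
```

Note that a desktop whose heartbeat is neither green nor red is skipped, which matches how the script below behaves.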

In order for this to work, a bit of work must be done to prepare the environment.  First and foremost, you must install the vSphere PowerCLI on the View Connection Server on which you are going to schedule this task.  You'll also probably want to edit the $domainSuffix parameter (or just pass it an appropriate value) to be your domain (wherever the desktops reside), and the $serverName parameter to be your vCenter server.

Once this is in place, the script can be set as a Scheduled Task on the Connection Server.  It has to run on the Connection Server because it uses the View cmdlets that are installed along with the Connection Server.  When scheduling the task, remember that the program that you want to execute is "powershell -file <path to script> <-whatever options the script needs>" and you should ensure that it is running as an account with appropriate vCenter permissions (this will also allow it to run when no one is logged into the server).
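
As a rough example, the task could be created from an elevated PowerShell prompt with schtasks.exe.  Everything specific here is a placeholder that you would need to change: the script path, the pool name, the vCenter name, the schedule and the service account.

```powershell
# All values below are placeholders for illustration only.
# The script path contains no spaces, which avoids nested quoting in /TR.
$action = 'powershell.exe -File C:\Scripts\restart-DT.ps1 -v Pool1* -s vcenter01'

# Create a daily 3:00 AM task that runs as a service account.
# Because no password is supplied, schtasks prompts for it.
schtasks.exe /Create /TN "Restart VDI Desktops" /TR $action /SC DAILY /ST 03:00 /RU "CONTOSO\svc-vdirestart"
```

Running the task under a dedicated account rather than your own keeps it working when passwords change or when no one is logged in.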

As always, this script is provided for educational purposes and there are no guarantees.  While it worked for me in my situation, please ensure that you understand the script and make any necessary changes to it.  Beware unintended line breaks due to blog width.

Update: I've created a new and improved version of this Automatic Desktop Restart Script.

# Script to restart a set of VDI Desktops that may or may not be frozen.
# If the guest heartbeat status is green, the desktop will be issued a soft "restart".  If it is red, the desktop will be powered off and back on (a hard reset).
# Only touches VMs with a Guest OS as defined by the $guestOS parameter - by default, Windows 7.
# Built to be run from a generic PowerShell prompt on a View Connection Server that has the vSphere PowerCLI Installed.
# Set the $domainSuffix to your desktops' domain suffix - this is important for verifying that an existing VDI Session is not active on the desktop
# Author: Jason Coleman - virtuallyjason.blogspot.com
#
# Usage: restart-DT.ps1 -v <VM Name> -s <vCenter Server>

param
(
    [alias("s")]
    [string]$serverName = "vCenter",
    [alias("v")]
    [string]$vmName = "*",
    [string]$domainSuffix = "child.contoso.com",
    [string]$guestOS = "*Windows 7*"
)
#Load the VMware snapins
add-pssnapin VMware.VimAutomation.Core
add-pssnapin VMware.View.Broker
#Connect to the vCenter Server
Connect-VIServer $serverName | out-null
#Gets a list of all VMs, filtered by the entered name.
#get-view was used instead of get-VM in order to access GuestHeartbeatStatus and GuestOS

$vmList = get-view -ViewType "VirtualMachine" | where {$_.name -like $vmName}
#Gets a list of all remote sessions in the environment
$remoteSessions = Get-RemoteSession | select session,dnsname,state
foreach ($thisVM in $vmList)
{
    #Checks if the guest is a desktop OS
    if ($thisVM.config.guestfullname -like $guestOS)
    {
        #Checks if there is an active session for that VM
        if ($remoteSessions | where {$_.DNSName -eq "$($thisVM.name).$domainSuffix"})
        {
            Write-Output "$($thisVm.name) has an active session."
        }
        else
        {
            #If the HeartBeatStatus is green, use a soft restart
            if ($thisVM.GuestHeartbeatStatus -eq "green")
            {
                write-output "Soft Restart: $($thisVm.name)"
                get-vm -name $thisVM.name | Restart-VMGuest | out-null
            }
            #if the HeartBeatStatus is red, use a hard reset
            elseif ($thisVM.GuestHeartbeatStatus -eq "red")
            {
                write-output "Hard Reset: $($thisVm.name)"
                get-vm -name $thisVM.name | stop-VM -confirm:$false | out-null
                start-sleep 15
                get-vm -name $thisVM.name | start-VM | out-null
            }
        }
    }
}
Disconnect-VIServer $serverName -Confirm:$False
