Polling CPU Ready Metrics
A customer recently asked me how to easily check their environment for high CPU Ready metrics. Since I don't know of an easy way to look through the whole environment quickly in the GUI, I decided to pull out PowerCLI and see what I could put together.
For those who don't know, CPU Ready records the time (during a given interval) that a VM spends ready to run while no physical CPU is available to it. Basically, it reports on CPU contention. There will always be small amounts of CPU Ready - that just reflects the time that it takes a CPU to react to an OS's request to compute. In general, CPU Ready times averaging below 5% of your polling interval should be completely unnoticeable. Between 5% and 10%, delays become noticeable, and beyond 10% they are seriously problematic.
Those rules for interpreting CPU Ready are all based on a percentage of a time interval, but the metric that is actually recorded is the number of milliseconds spent in a Ready state. When looking at the Real Time graphs on an ESXi Server (with a 20 second interval), the 5% threshold is 1,000 milliseconds and the 10% threshold is 2,000 milliseconds. Easy as pie. It gets a little bit more awkward when you look back at archived data, which might have something like a 2 hour polling period. Quick, what is 5% of 2 hours, in milliseconds? Yeah, exactly. (It's 360,000.)
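To make the arithmetic above concrete, here is a minimal sketch of the conversion in both directions. The function names ConvertTo-ReadyPercent and Get-ReadyThresholdMs are made up for this illustration and are not part of the script or of PowerCLI; the only assumption is that the Ready value is in milliseconds and the interval is in seconds.

```powershell
# Hypothetical helper: turn a CPU Ready summation (milliseconds) into a
# percentage of the sampling interval (seconds).
function ConvertTo-ReadyPercent {
    param (
        [double]$ReadyMs,
        [int]$IntervalSecs
    )
    # percent = ms / (secs * 1000) * 100, which simplifies to ms / (secs * 10)
    return $ReadyMs / ($IntervalSecs * 10)
}

# Hypothetical inverse: how many milliseconds of Ready time equal a given
# percentage of the sampling interval.
function Get-ReadyThresholdMs {
    param (
        [double]$Percent,
        [int]$IntervalSecs
    )
    return ($Percent * $IntervalSecs * 1000) / 100
}

ConvertTo-ReadyPercent -ReadyMs 1000 -IntervalSecs 20    # 5 (percent)
Get-ReadyThresholdMs -Percent 5 -IntervalSecs 7200       # 360000 (ms in 5% of 2 hours)
```

The first call reproduces the real-time case (1,000 ms in a 20 second interval is 5%), and the second answers the 2 hour question.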
So, when I wrote this script, I built it to work from and report percentages. The script takes a few arguments, but if you forget to use them it should have sensible default behavior. At its most basic, if you just execute this script (after connecting your PowerCLI session to your vCenter server), it will examine all VMs in the environment and report any VM that had CPU Ready values above 5% in the past 7 days (it will report when each VM experienced the Ready time and what the percentage was, as well). The output should be formatted for easy use as a CSV - I don't actually know how to create PowerShell objects, so this seemed like the next best thing.
If you want to start using options, the three that it's really built to use are -VM, -Days and -Threshold. There are aliases for these options (-v, -d and -t), to reduce the amount of typing required to use them.
-VM allows you to specify a particular VM by inventory name (or, theoretically, a wildcard that will result in multiple VMs, although I haven't tested that). If unspecified, the script assumes that you want it to examine all VMs in the inventory.
-Days allows you to specify the number of days to examine - it defaults to the last 7 days if unspecified. Please note, your vCenter metric retention levels must be set so that CPU Ready is available for that period; otherwise the script will just report on the data that is present.
-Threshold allows you to adjust the CPU Ready threshold that you are interested in examining - it defaults to 5%, but you can specify the number (no percentage sign, please) above which you are interested in receiving a report (so, specifying 0 would make it report on all data, as every VM will always have some tiny amount of CPU Ready).
You may notice that the script also accepts a -Metric option - that is in place so that future versions of the script can analyze metrics besides CPU Ready. It's not ready for anything else yet, so don't use that option :P
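As a small illustration of how the aliases and defaults described above behave, the sketch below declares the same three parameters on a stand-in function. Show-ReadyOptions is a hypothetical name used only here; it just echoes what it received and does not talk to vCenter.

```powershell
# Hypothetical stand-in that mirrors the script's -VM, -Days and -Threshold
# parameters (with their -v, -d and -t aliases) and prints what it received.
function Show-ReadyOptions {
    [CmdletBinding()]
    param (
        [alias("v")]
        $VM = "",
        [alias("d")]
        [int]$Days = 7,
        [alias("t")]
        [int]$Threshold = 5
    )
    "VM='$VM' Days=$Days Threshold=$Threshold"
}

Show-ReadyOptions                            # VM='' Days=7 Threshold=5
Show-ReadyOptions -v "Server1" -d 2 -t 0     # VM='Server1' Days=2 Threshold=0
```

Calling it with no arguments shows the defaults (all VMs, 7 days, 5%), and the second call shows the short aliases overriding them.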
As always, this script is provided as-is with no guarantees and is for educational purposes. It has worked in my testing, but that's no guarantee that it will work in your environment. If you think of any improvements (or spot any bugs!), please let me know and I'll be happy to update the script and credit your contributions. I've tried to color-code this one - let me know if that makes it more human-readable or if it makes any difference at all. Also, please be wary of unintended line breaks due to blog formatting; the indenting and color coding should make it more obvious if that has occurred.
#By Jason Coleman - http://virtuallyjason.blogger.com
#This script identifies any VM that has CPU Ready metrics above a user-definable threshold within a user-definable period. It reports when each VM experienced the load. Only reports on Powered On VMs.
#Usage: Vm-Ready.ps1 -VM <Virtual Machine Name> -Days <# of days to attempt to read> -Threshold <% as an integer above which to report>
[CmdletBinding()]
param (
    [alias("v")]
    $VM = "",
    [alias("d")]
    [int]$Days = 7,
    [alias("t")]
    [int]$Threshold = 5,
    [alias("m")]
    [string]$metric = "cpu.ready.summation"
)
# No -VM given: examine every VM in the inventory.
if ($VM -eq ""){
    $VMs = Get-VM
}
else{
    $VMs = Get-VM -Name $VM
}
$start = (Get-Date).AddDays(-$Days)
foreach ($ThisVM in $VMs){
    # Only powered-on VMs are examined.
    if ($ThisVM.PowerState -eq "PoweredOn"){
        if ($metric -eq "cpu.ready.summation"){
            foreach ($Report in $(Get-Stat -Entity $ThisVM -Stat $metric -Start $start -ErrorAction "SilentlyContinue")){
                # Value is milliseconds of Ready time; dividing by (IntervalSecs * 10) gives a percentage.
                $ReadyPercentage = (($Report.Value/10)/$Report.IntervalSecs)
                if ($ReadyPercentage -gt $Threshold){
                    # Round instead of Substring(0,4), which throws when the value is shorter than 4 characters (e.g. "7.5").
                    $PerReadable = [math]::Round($ReadyPercentage, 2)
                    Write-Output "$($Report.Entity), $($Report.Timestamp), $PerReadable%"
                }
            }
        }
        else{
            Write-Output "This script does not yet accommodate $metric"
        }
    }
}
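For readers who would rather get real PowerShell objects than comma-separated text, the following is one possible sketch of the same report line built as an object. It assumes PowerShell 3.0 or later (for [PSCustomObject]); New-ReadyRecord is a hypothetical helper, and the two sample records use made-up values standing in for what Get-Stat would return.

```powershell
# Hypothetical helper: wrap one CPU Ready sample in an object instead of
# formatting it as a comma-separated string.
function New-ReadyRecord {
    param (
        $Entity,
        [datetime]$Timestamp,
        [double]$ReadyMs,
        [int]$IntervalSecs
    )
    [PSCustomObject]@{
        VM           = "$Entity"
        Timestamp    = $Timestamp
        # Same conversion as the script: milliseconds / (seconds * 10) = percent
        ReadyPercent = [math]::Round($ReadyMs / ($IntervalSecs * 10), 2)
    }
}

# Made-up sample data; in the script these values would come from Get-Stat.
$records = @(
    New-ReadyRecord -Entity "Server1" -Timestamp ([datetime]'2013-01-01T10:00:00') -ReadyMs 1500 -IntervalSecs 20
    New-ReadyRecord -Entity "Server2" -Timestamp ([datetime]'2013-01-01T10:00:00') -ReadyMs 400 -IntervalSecs 20
)

# Objects can be filtered and sorted downstream, then converted to CSV at the end.
$records | Where-Object { $_.ReadyPercent -gt 5 } | ConvertTo-Csv -NoTypeInformation
```

With objects, the threshold filter, sorting, and the choice between ConvertTo-Csv and Export-Csv can all happen in the pipeline rather than inside the loop.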
Will the script accept multiple names? I'm unsuccessful in trying, e.g.
.\script.ps1 -v "unixabc","linuxabc","windows1","windows5"
I believe that should work; does it work for you with a single VM name? That parameter basically gets used by the "get-vm" cmdlet, which seems to work just fine with a comma-separated list of VM names.
Works with a single VM name, but any combo of multiple names fails for some reason...
Got it... remove the "[string]". Change
[string]$VM = "",
to
$VM = "",
Then enter as
.\CpuReadyScript.ps1 -VM "Server1","Server2" -Days 2 -Threshold 0
or
.\CpuReadyScript.ps1 -v "Server1","Server2" -d 2 -t 0
Thanks for figuring that out - I've changed the script and see what's going on. If you create a strictly typed variable like this:
[string]$test = "vm-prefix*"
and do a get-vm $test it will successfully return all VMs that match that string.
If instead, you do it like this:
[string]$test = "vm-prefix1","vm-prefix2"
the get-vm will fail, because the $test string is actually "vm-prefix1 vm-prefix2"... which doesn't match any of the VMs in the inventory.
When you remove the strict typing of that variable, it allows the "VM1","VM2","VM3" type input to become an array of strings, which get-vm understands how to work with.
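The behavior described in this thread can be checked without a vCenter connection. The short sketch below uses throwaway variable names ($typed, $untyped, $typedArray) chosen for this example. The last two lines show a [string[]] declaration, which also keeps the names separate; that is only a possible alternative and not what the script uses.

```powershell
# A [string]-typed variable flattens the list into one space-joined string.
[string]$typed = "vm-prefix1","vm-prefix2"
$typed                      # vm-prefix1 vm-prefix2
$typed.GetType().Name       # String

# An untyped variable keeps the two names as separate array elements.
$untyped = "vm-prefix1","vm-prefix2"
$untyped.GetType().Name     # Object[]
$untyped.Count              # 2

# Possible alternative (not used by the script): a string array type.
[string[]]$typedArray = "vm-prefix1","vm-prefix2"
$typedArray.Count           # 2
```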