PowerShell String Manipulation of Formatted Text in Columns

Every now and then, I find myself needing to use a utility like plink in order to interface with a system, such as a switch or a chassis, during a script.  If I'm just sending configuration commands (and am taking it on faith that they worked...), then it's nice and easy, but if I actually want to extract information from the device, then I've got a bit of a challenge, because those devices (via plink) are not going to give me back an object that PowerShell understands.

For example, if I use get-vm in PowerShell, I will get back a vm object that has a bunch of properties, which I can easily access using dot notation.  If I use plink to pull a brocade switch configuration, all I'm going to get back (from PowerShell's perspective) is a great big long string with lots of New Line characters, tabs and spaces.  So, how do I extract data from a formatted text string, in order to more easily work with it in PowerShell?  Well, there's a lot of different tricks available, but here's some that I've used recently.

First, you need to understand the format of the string that you want to parse.  Does it use tabs to create the impression of columns?  Maybe each line is a separate piece of data using a colon to delineate what the property is from the actual value.  Maybe it's something else entirely!  The approach that you use will depend entirely on what that data looks like, but I generally like to create a new PowerShell object with appropriate properties that I can then work with.

Today, I'm going to talk about how I deal with false columns.  Well, this is the worst, as you also have to understand how it deals with data that is too wide for the column as well as how it spaces the columns.  Does it just misalign all further columns, or does it "wrap" to the next line instead?  My general approach is to first split the string by the New Line character, so that I can take it line by line (if it's not already neatly split) like this:

$a = $a -split("`n")

Then I look at the columns to figure out the spacing.  Is it something simple, like a tab delimited list?  If so, I can then split each line on the tab character in order to work with the data.  Check out this example:

$a = "Name`tDate`tMood`nJason`tToday`tHappy"

That will create some sample formatted text that looks like this:

Name    Date    Mood
Jason   Today   Happy

In this case, because we're dealing with tabs between the columns, we can easily convert this to a PowerShell object like this:

$a | convertFrom-csv -delimiter "`t"

Which will turn it into a nice object, complete with Name, Date and Mood parameters.  What do you do if it uses spaces to create the columns?  Well, then you've got a couple of options.  Let's look at this example formatted text:

$a = "Name   Date   Mood`nJason  Today  Happy"

which generates output like this (this example might get a little messed up because of the HTML conversion and the spaces):

Name   Date   Mood
Jason  Today  Happy

In this case, I think that the best approach is to insert delimiting characters to each line, then use that same convertFrom-csv cmdlet to turn it into an object.  Try this:

$b = $a -split("`n") | % {$_.insert(14,"`t").insert(7,"`t").replace(" ","")}
$b | convertfrom-csv -Delimiter "`t"

The first command there will split the input into an array based on the new line character, then will insert tabs into each line at the 14th and 7th positions.  Finally, it deletes any spaces that it found, so that your object parameters won't have troublesome spaces in their names.  The next command simply interprets that newly formatted text as a tab delimited csv and then creates a PowerShell object from it.

So, what do you do if there's line wrapping in the table?  Well, that's even more difficult.  I'd start by using one of the above techniques in order to generate a delimited csv, then I'd start parsing through that CSV to correct for the unneeded line breaks.  The first thing to determine is how to detect if there's been a wrap.  One way might be if one or more fields is blank in a given column, as that could indicate that this line should be considered a continuation of the line before.  Here's how I'd correct for that.

First, let's prepare an example:
$a = "Name`tDate`tMood`nJason`tToday`tHappy`nColeman`t`t`nJeff`tYesterday`tExuberant"
$b = $a | convertfrom-csv -delimiter "`t"

In this case, "Coleman" is really a continuation of data in the "name" property from the line above, although it has been presented to PowerShell as if it were a new item in the array.

Name    Date      Mood
----    ----      ----
Jason   Today     Happy
Coleman
Jeff    Yesterday Exuberant

How do we detect this and handle this situation?  Here's a solution that I put together that assumes that any line with an empty parameter must be a continuation of the previous line.  In some situations, that could be a big assumption, so make sure that is valid with your dataset before using this technique!  If that's the case, you'll need to figure out some other way of detecting a continuation, but can still use this framework for concatenating those lines once they're identified.

$properties = ($b[0] | Get-Member -membertype properties).name
for ($i=0;$i -lt $b.count;$i++){
    if (($properties | % {$b[$i].$_ -eq ""}) -contains $TRUE){
        $properties | % {if ($b[$i].$_){$b[$i - 1].$_ = $b[$i - 1].$_ + " " + $b[$i].$_}}
        $b[$i] = $b[$i - 1]
    }
}
$b = $b | get-unique -AsString

Obviously, this is a bit more complex.  What's it do?  The first line gathers a list of all of the properties from these PowerShell objects, which we'll use later to evaluate the contents of each line.

Next, it enters a for loop, going through each object in the array.  I used this technique instead of a ForEach because I'm going to be manipulating objects based on their index location within the loop.

Next, there's a complex If statement.  The evaluation that's happening there looks at all of the properties discovered in the first step and checks the current object to see if any of them are empty strings (depending on the source data, you may need to check for $NULL or something else entirely!).  If any of them are empty strings (that's what the -contains $TRUE is checking), it adds that object's property's contents to the previous object's property in the array.  After that, it overwrites the current object with duplicate data from that previous object.  The final command removes any duplicate lines from the array.

Is that a comprehensive list of techniques for parsing formatted text output?  Of course not!  But, hopefully these techniques will come in handy and save someone some headaches.

Comments

Popular posts from this blog

PowerShell Sorting by Multiple Columns

Clone a Standard vSwitch from one ESXi Host to Another

Deleting Orphaned (AKA Zombie) VMDK Files