Using PowerShell to Edit Substrings

On my 3D Printing blog, I recently realized that Blogger changed the way images are displayed.  Clicking an image used to show the full-size version, but now it only shows a slightly larger one.  That has almost no impact on this blog since I rarely post images, but over there I frequently write tutorials with screenshots from Blender or PrusaSlicer... and it's pretty important that the settings in those images be legible!  Well, some google-fu revealed an easy workaround: just change the href links to point to a different subfolder, so that the full-resolution image is displayed.  That's great, except that manually editing the HTML of my blog posts and changing each file path is a highly repetitive and boring task... well, you see where this is going.  PowerShell to the rescue!

I've done a fair amount of string manipulation with PowerShell, so I thought this was going to be a trivial task... but this one had a wrinkle that I'd never really dealt with before.  I wanted to find an arbitrary number of substrings that begin with "lh3" (Blogger's media server) and end in an image file extension, then add an "s9999" folder immediately before the file name while preserving the rest of the substring.  I was a bit stuck on how to get there, so I bounced the idea off Jeff... and he gave me the answer that always brings a whole host of questions with it: regex.

I ended up working out a pretty simple regular expression that finds these image paths and lets me manipulate them.  So, my workflow is now to write up my blog post, go to the HTML editor, copy the HTML into the $post variable of this script, let the script do its magic, then paste the new HTML back in place and call it good.  The script is down below, but what does it do?

Well, after I've put my blog post HTML in the $post variable, the script enters a foreach loop.  That loop processes each string that matches this regex: 'lh3.*?\.(jpg|png|gif|jpeg)'

That regex finds all substrings that begin with lh3, have as many characters as needed (but the fewest possible), then a "." and a jpg, jpeg, png, or gif file extension.  So, with all of those file paths extracted from the HTML, the foreach loop does its magic to each of them.  It starts by verifying that the path hasn't already been fixed, then generates the corrected URL.  It does this by simply splitting the folder path off from the file name, then jamming them back together with the "s9999" folder in the middle.  Then, with that $newLink created, it replaces every occurrence of the original link in $post with the $newLink path.
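To see what that regex actually grabs, here is a small sketch run against a made-up snippet of HTML.  The $sample string and its URLs are invented for illustration; only the pattern comes from the script.

```powershell
# Hypothetical sample HTML; these URLs are made up for illustration
$sample = '<a href="https://lh3.googleusercontent.com/abc/photo1.png"><img src="https://lh3.googleusercontent.com/abc/w320/photo1.JPG"></a>'

# The same pattern the script uses
$pattern = 'lh3.*?\.(jpg|png|gif|jpeg)'

# -AllMatches returns every match in the string, not just the first one.
# Select-String is case-insensitive by default, so .JPG matches too.
$found = ($sample | Select-String $pattern -AllMatches).Matches.Value

$found
# lh3.googleusercontent.com/abc/photo1.png
# lh3.googleusercontent.com/abc/w320/photo1.JPG
```

Note that each match starts at "lh3", so the "https://" in front of it is not part of the matched text and stays where it is in the HTML.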

After it's done this for each URL detected by the regex, it spits out the new HTML so that I can copy and paste it back into Blogger.  It's admittedly not an elegant solution, but it gets the job done and it gave me the chance to tackle a more complex string manipulation task.  So, I call that a win!  Here's the PowerShell that I used:


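# Paste the post's HTML between the markers.  The here-string is single-quoted
# so that PowerShell leaves any $ characters in the HTML alone.
$post = @'
HTML GOES HERE
'@

# Pull out every substring that starts with lh3 and ends in an image extension
$links = ($post | Select-String 'lh3.*?\.(jpg|png|gif|jpeg)' -AllMatches).Matches.Value

foreach ($origLink in $links) {
    # Skip links that already contain the s9999 folder
    if ($origLink -notmatch '\\s9999\\') {
        # Split the folder path from the file name and rejoin them around s9999
        $newLink = ($origLink | Split-Path -Parent) + '\s9999\' + ($origLink | Split-Path -Leaf)
        # Replace every occurrence of the original link in the post
        $post = $post.Replace($origLink, $newLink)
    }
}

# Output the updated HTML, ready to paste back into Blogger
$post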
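The script above joins the pieces with backslashes, which is not the usual separator in a URL.  As an alternative (not what I originally used), the same edit can be sketched in a single pass with the -replace operator and capture groups, which keeps the forward slashes.  The function name Add-S9999Folder and the sample URL are made up for illustration, and the pattern assumes each link sits inside a double-quoted attribute.

```powershell
# Hypothetical helper (not part of the original script): inserts an s9999
# folder before the file name of each lh3 image link, keeping forward slashes.
function Add-S9999Folder {
    param([string]$Html)

    # Group 1: everything from lh3 up to and including the last slash.
    # (?<!/s9999/) skips links whose last folder is already s9999.
    # Group 2: the file name, ending in an image extension.
    $pattern = '(lh3[^"\s]*/)(?<!/s9999/)([^/"\s]+\.(?:jpe?g|png|gif)\b)'

    # -replace is case-insensitive by default and replaces every match
    $Html -replace $pattern, '${1}s9999/${2}'
}

# Made-up sample link for illustration
$before = '<a href="https://lh3.googleusercontent.com/abc/photo1.png">'
$after  = Add-S9999Folder $before
$after
# <a href="https://lh3.googleusercontent.com/abc/s9999/photo1.png">

# Running it again leaves the already-fixed link untouched
(Add-S9999Folder $after) -eq $after
# True
```

Because the "already fixed" check lives inside the pattern, there is no loop and no separate if statement, at the cost of a regex that is harder to read than the original.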
