Replacing the PSC and/or vCenter in an Enhanced Linked Mode vSphere 6 Environment

One of my customers is planning on upgrading to vSphere 6.  They have many sites and require that each site be able to operate independently, but want them to be centrally managed under normal conditions.  So, they have a great use case for Enhanced Linked Mode!

We're exploring different architectures and how to handle various situations.  One of the situations that we wanted to explore very thoroughly was the loss of a vCenter or PSC at one of the sites and how the recovery process might work.  To that end, we've set up three VCSA + PSC pairs in a single SSO domain and have been ripping them apart and generally abusing them.

We've come up with a procedure that covers the whole process, from decommissioning a PSC to restoring a replacement PSC/vCenter to full functionality.  It combines the steps from a few VMware KB articles, so I've decided to cover it here as a single workflow.

At a high level, the process starts by removing the failed object (either the PSC or the vCenter server) and all references to it from the remaining nodes in the environment.  We then move on to redeploying the failed system.  When redeploying the PSC, we had a lot of trouble repointing the existing vCenter at the replacement PSC, so we elected instead to back up the vCenter database, delete the vCenter VM, deploy a fresh vCenter VM and restore the backup into the new VM.  If you have additional solutions attached to vCenter, you may need to work with each of them to reattach it to the replaced vCenter.  In this environment, we've created JSON answer files so that we can deploy VCSA or PSC servers from a command line, so actually building one or both of those systems is trivial.  If you need to create such a file yourself, look at the templates on the vCenter installation media in the vcsa-cli-installer\templates\install folder and modify the appropriate one with your information.
  1. Determine the PSC replication chain
    1. SSH to the PSC and use this command to display all PSCs in the environment
      • /usr/lib/vmware-vmdir/bin/vdcrepadmin -f showservers -h localhost -u Administrator -w <PASSWORD>
    2. Use this command from an SSH session on each PSC to see what other PSC(s) it is connected to:
      • /usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartners -h localhost -u Administrator -w <PASSWORD>
  2. Decommission PSC(s) by working "down the chain" so that no PSC is ever left isolated
    1. Unregister the nodes from their neighbors using this command within an SSH session:
      • cmsso-util unregister --node-pnid <FQDN> --username Administrator@vsphere.local --passwd <PASSWORD>
    2. If that command fails to complete (for example, if it hangs), force the cleanup:
      • /usr/lib/vmware-vmdir/bin/vdcleavefed -h <FQDN> -u Administrator -w <PASSWORD>
  3. Deploy the replacement PSC
    1. On your desktop, extract the vCenter ISO
    2. Copy the PSC deployment answer JSON files from the network
    3. Browse to the VCSA-CLI-Installer\Win32 folder and deploy it with the PSC answer file via this command:
      • .\vcsa-deploy.exe --accept-eula <path to answer JSON>
  4. Back up the vPostgres database from the affected vCenter(s)
    1. Follow the instructions in VMware KB 2091961 to use the backup_lin.py script
  5. Redeploy vCenter to the restored PSC
    1. Extract the vCenter ISO
    2. Copy the VCSA deployment answer JSON files from the network
    3. Browse to the VCSA-CLI-Installer\Win32 folder and deploy it with the vCenter answer file via this command:
      • .\vcsa-deploy.exe --accept-eula <path to answer JSON>
      • Note: if this step fails with ERROR_TOO_MANY_NAMES, KB 2106736 describes how to use the following command to remove the old vCenter inventory object:
        • cmsso-util unregister --node-pnid vCenterServer_System_Name --username administrator@vsphere.local --passwd vCenter_Single_Sign_On_password
  6. Reconfigure the PSC as needed (such as joining it to the domain).
  7. Restore the vPostgres database on the affected vCenter(s)
    1. Follow the instructions in VMware KB 2091961 to use the restore_lin.py script
      • At the end, make sure that you start both vmware-vdcs and vmware-vpxd, rather than only restarting vpxd as the KB instructs.
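
If you have more than a couple of PSCs, step 1 gets tedious by hand.  Here is a rough sketch of how the showpartners check could be looped over every PSC from a management box.  It is not something we ran as part of this procedure, and it rests on a few assumptions: root SSH with the bash shell is enabled on each PSC, showpartners prints one ldap://<host> line per partner, and the password contains no single quotes.  The function names (partner_hosts, show_chain) and the SSO_PASSWORD variable are made up for this example.

```shell
#!/bin/sh
# Sketch only: list the replication partners of each PSC in one pass.
# Assumes root SSH (bash shell) on every PSC and that showpartners
# prints one "ldap://<host>" line per partner.

VDCREPADMIN=/usr/lib/vmware-vmdir/bin/vdcrepadmin

# Read showpartners output on stdin and print bare host names.
# Lines that do not start with ldap:// are dropped.
partner_hosts() {
    sed -n 's|^ldap://||p'
}

# For each PSC given as an argument, print one "psc -> partner" line
# per replication partner. SSO_PASSWORD must be set beforehand and is
# expanded locally before the command is sent over SSH.
show_chain() {
    for psc in "$@"; do
        ssh -n "root@$psc" \
            "$VDCREPADMIN -f showpartners -h localhost -u Administrator -w '$SSO_PASSWORD'" \
            | partner_hosts \
            | while read -r partner; do
                  echo "$psc -> $partner"
              done
    done
}
```

To use it, source the file, set SSO_PASSWORD, and run something like show_chain psc01.example.com psc02.example.com psc03.example.com (hypothetical host names).  The resulting list of links makes it easier to see which PSC sits at the end of the chain before you start decommissioning in step 2.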


Comments

  1. I am getting this from the command on step 1:
    unknown command: '/usr/lib/vmware-vmdir/bin/vdcrepadmin'
    On the PSC. I even see the vdcrepadmin file. It's weird.
