Monday, August 1, 2016

VSAN - on-disk upgrade error "Failed to realign following Virtual SAN objects"

I upgraded the ESXi hosts from 6.0 GA to 6.0U2 and selected upgrade for VSAN On-disk format Version, however this failed with following error message:

"Failed to realign following Virtual SAN objects:  XXXXX, due to object locked or lack of vmdk descriptor file, which requires manual fix"


I reviews the VSAN health log files at following location:
/storage/log/vmware/vsan-health/vmware-vsan-health-service.log

Grep realigned


Grep Failed


I was aware of this issue due to previous blog posts on same problem and new of KB 2144881 which made the task of cleaning objects with missing descriptor files much easier.

I ran the script: python VsanRealign.py -l /tmp/vsanrealign.log precheck.

I however received another alert and the python script did not behave as it should with it indicating a swap file had either multiple reverences or was not found.



I then used RVC to review the object info for the UUID in question.


I used RVC again to try and purge any inaccessible swap files:
vsan.purge_inaccessible_vswp_objects ~cluster

no objects was found.

I then proceeded to review the vmx files for the problem VM in question and found reference to only the original *.vswp file and not with additional extension of *.vswp.41796

Every VM on VSAN has 3 swap files:
vmx-servername*.vswp
servername*.vswp
sername*.vswp.lck

I figured this servername*.vswp.41796 is just a leftover file and bear no reference to the VM and this is what is causing the on-disk upgrade to fail.

I proceeded to move the file to my /tmp directory  (Please be very careful with delete/moving any files within a VM folder, this is done at your own risk and if you are not sure I highly recommend you contact VMware support for assistance)

I ran the python realign script again. This time I received a prompt to perform the autofix actions to remove this same object in question for which i selected yes.



I ran the on-disk upgrade again and it succeeded.


Even though VMware provides a great python script that will in most instance help you clean up the VSAN disk groups, there are times when this will not work as planned and then you just have to a bit more troubleshooting and perhaps a phone call to GSS.


links:
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2144881