NOTICE: NEW BLOGGING SITE
I have decided to sunset my Google Blogger webpage and have moved over to WordPress. I know, shocking :)
I will not be posting any new blogs on this site, but please check out my new site, where I will be delivering more content in the future.
www.johannstander.com
Monday, August 22, 2016
Sunday, August 21, 2016
VSAN : Migrate VSAN cluster from vSS to vDS
How to migrate a VSAN cluster from vSS to vDS
I am sure some of you are currently running a VSAN cluster in some shape or form, whether in a POC, development or production environment. It provides a cost-effective solution that is great for remote offices or even management clusters, and it can be implemented and managed very easily. But as the saying goes, nothing ever comes easy and you have to work for it. The same goes here: there are a lot of prerequisites for a VSAN environment that are crucial for implementing a healthy system that performs to its full potential. I will not go into much detail here; feel free to contact us if any services are required.
One of the recommendations for VSAN is to use a vDS. Your VSAN license actually includes the ability to use vDS, which allows you as our customer to take advantage of simplified network management regardless of the underlying vSphere edition.
If you migrate from vSS to vDS, the steps are a bit different than your normal migration. I recommend you put the host into maintenance mode with the "Ensure accessibility" option. Verify the uplink used for the VSAN VMkernel, then use "Manage physical network adapters" to remove the vmnic from the vSS and add it to the vDS. Now migrate the VMkernel to the vDS. If you review the VSAN health at this point, the network test will show as failed.
To verify that multicast network traffic is flowing from your host, use the following command in the ESXi shell (replace vmk2 with your VSAN VMkernel interface):
#tcpdump-uw -i vmk2 -n -s0 -t -c 20 udp port 23451 or udp port 12345
To review your multicast network settings:
#esxcli vsan network list
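The two commands above can be tied together with a small helper. The following is a minimal Python sketch, not an official tool: it assumes the output of esxcli vsan network list has been captured as text and consists of indented "Key: Value" lines. The function names and the sample text are hypothetical and only illustrate the idea of reading the multicast ports and building the matching tcpdump-uw capture command.

```python
def parse_vsan_network(text):
    """Collect 'Key: Value' lines from captured esxcli output into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:  # skip lines without a 'Key: Value' pair, e.g. the 'Interface' header
            settings[key] = value
    return settings


def build_tcpdump_command(vmk, ports, count=20):
    """Build the tcpdump-uw command used in this post for the given UDP ports."""
    capture_filter = " or ".join(f"udp port {port}" for port in ports)
    return f"tcpdump-uw -i {vmk} -n -s0 -t -c {count} {capture_filter}"


# Illustrative sample only (assumption): field names as typically seen on 6.x hosts.
SAMPLE = """\
Interface
   VmkNic Name: vmk2
   Agent Group Multicast Address: 224.2.3.4
   Agent Group Multicast Port: 23451
   Master Group Multicast Address: 224.1.2.3
   Master Group Multicast Port: 12345
"""

settings = parse_vsan_network(SAMPLE)
ports = [int(v) for k, v in settings.items() if k.endswith("Multicast Port")]
print(build_tcpdump_command(settings["VmkNic Name"], ports))
```

If your cluster uses non-default multicast ports, the generated filter follows whatever the host reports rather than the hard-coded values.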
Ruby vSphere Console (RVC) is also a great tool to have in your arsenal for managing VSAN, and the following command can be used to review the VSAN state:
vsan.check_state <cluster>
To re-establish the network connection you can use the following command:
vsan.reapply_vsan_vmknic_config <host>
Rerun the VSAN health test and verify that Network shows passed.
Now that the VSAN network is up and running, you can migrate the rest of the VMkernels.
Monday, August 1, 2016
VSAN - on-disk upgrade error "Failed to realign following Virtual SAN objects"
I upgraded the ESXi hosts from 6.0 GA to 6.0 U2 and then started the upgrade of the VSAN on-disk format version. However, this failed with the following error message:
"Failed to realign following Virtual SAN objects: XXXXX, due to object locked or lack of vmdk descriptor file, which requires manual fix"
I reviewed the VSAN health log file at the following location:
/storage/log/vmware/vsan-health/vmware-vsan-health-service.log
I searched it with grep for the following strings:
realigned
Failed
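If you prefer to do the same search from a script, here is a minimal Python sketch of that filter. It is an illustration under assumptions: the function name is hypothetical and the sample log lines are made up, not real output from the health service.

```python
def find_realign_events(lines, keywords=("realigned", "Failed")):
    """Return (line_number, text) pairs for lines containing any keyword.

    Matching is case-sensitive, like a plain grep.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(word in line for word in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical log lines, for illustration only.
sample_log = [
    "INFO object aaaa realigned\n",
    "INFO health check started\n",
    "ERROR Failed to realign following Virtual SAN objects: XXXXX\n",
]

for number, line in find_realign_events(sample_log):
    print(number, line)
```

On a real system you would pass an open file handle for the log file instead of the sample list.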
I was aware of this issue from previous blog posts on the same problem and knew of KB 2144881, which makes the task of cleaning up objects with missing descriptor files much easier.
I ran the script with the precheck option: python VsanRealign.py -l /tmp/vsanrealign.log precheck
However, I received another alert and the Python script did not behave as it should: it indicated that a swap file either had multiple references or was not found.
I then used RVC to review the object info for the UUID in question.
I used RVC again to try and purge any inaccessible swap files:
vsan.purge_inaccessible_vswp_objects ~cluster
No objects were found.
I then proceeded to review the vmx file for the problem VM in question and found a reference only to the original *.vswp file, not to the one with the additional extension *.vswp.41796
Every VM on VSAN has 3 swap files:
vmx-servername*.vswp
servername*.vswp
servername*.vswp.lck
I figured this servername*.vswp.41796 was just a leftover file that bears no reference to the VM, and that this was what caused the on-disk upgrade to fail.
I proceeded to move the file to my /tmp directory. (Please be very careful when deleting or moving any files within a VM folder. This is done at your own risk, and if you are not sure I highly recommend you contact VMware support for assistance.)
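The check I did by hand can be sketched in Python as follows. This is only an illustration under assumptions: the function name, file names and vmx line are hypothetical, and the sketch only lists candidates. It does not move or delete anything, so the warning above still applies.

```python
import re

# Matches names such as "servername-abc.vswp.41796":
# a .vswp file with an extra numeric suffix.
LEFTOVER_VSWP = re.compile(r"\.vswp\.\d+$")


def find_leftover_swap_files(filenames, vmx_text):
    """Return swap files with a numeric suffix that the vmx text never mentions."""
    return [
        name
        for name in filenames
        if LEFTOVER_VSWP.search(name) and name not in vmx_text
    ]


# Hypothetical folder listing and vmx content, for illustration only.
vm_folder = [
    "vmx-servername-123.vswp",
    "servername-abc.vswp",
    "servername-abc.vswp.lck",
    "servername-abc.vswp.41796",
]
vmx_text = 'sched.swap.derivedName = "/vmfs/volumes/example/servername/servername-abc.vswp"'

print(find_leftover_swap_files(vm_folder, vmx_text))
```

Anything this returns is only a candidate for review; confirm with RVC or VMware support before touching the file.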
I ran the Python realign script again. This time I received a prompt to perform the autofix actions to remove the same object in question, for which I selected yes.
I ran the on-disk upgrade again and it succeeded.
Even though VMware provides a great Python script that will in most instances help you clean up the VSAN disk groups, there are times when this will not work as planned. Then you just have to do a bit more troubleshooting and perhaps make a phone call to GSS.
Links:
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2144881
VSAN - cache disk unavailable when creating disk group on Dell
I ran into an issue at a customer where the SSD that was to be used as the cache disk in the VSAN disk group was showing up as a regular HDD. However, when I reviewed the storage devices, the disk was visible and marked as flash... weird. So what is going on here?
As I found out, this is due to a flash device being used with a controller that does not support JBOD.
To fix this I had to create a RAID 0 virtual disk for the SSD. If you have a Dell controller, this means you have to set the mode to RAID, but make sure that all the regular HDDs to be used in the disk group are set to non-RAID! Once the host is back online, you have to mark the SSD drive as flash. This is the little "F" icon in the disk devices view.
This environment was configured with all the necessary VSAN prerequisites for Dell in place. You can review these in the following blog post:
http://virtualrealization.blogspot.com/2016/07/vsan-and-dell-poweredge-servers.html
Steps to set up RAID-0 on the SSD through the Lifecycle Controller:
- Lifecycle Controller
- System Setup
- Advanced hardware configuration
- Device settings
- Select controller (PERC)
- Physical disk management
- Select SSD
- From drop down select “convert to Raid capable”
- Go back to home screen
- Select hardware configuration
- Configuration wizard
- Select RAID configuration
- Select controller
- Select Disk to convert from HBA to RAID (if required)
- Select RAID-0
- Select Physical disks (SSD in this case)
- Select Disk attribute and name Virtual Disk.
- Finish
- Reboot
Once the ESXi host is online again, you have to mark the disk as flash. This is necessary because RAID abstracts away most of the physical device characteristics, including the media type.
- Select ESXi host
- Manage -> Storage -> Storage adapters
- Select vmhba0 from PERC controller
- Select the SSD disk
- Click on the "F" icon above.
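After marking the disk you may want to confirm the result from the command line as well. The following is a minimal Python sketch, assuming you have captured the text output of esxcli storage core device list, where each device starts on an unindented line and reports an indented "Is SSD" field. The function name and the device identifiers in the sample are hypothetical.

```python
def ssd_flags(text):
    """Map each device identifier to True/False based on its 'Is SSD' line."""
    flags = {}
    device = None
    for line in text.splitlines():
        if line and not line[0].isspace():
            device = line.strip()  # an unindented line starts a new device block
        elif device and line.strip().startswith("Is SSD:"):
            flags[device] = line.split(":", 1)[1].strip() == "true"
    return flags


# Hypothetical sample output, for illustration only.
SAMPLE = """\
naa.5000000000000001
   Display Name: Local DELL Disk (naa.5000000000000001)
   Is SSD: false
naa.5000000000000002
   Display Name: Local DELL Disk (naa.5000000000000002)
   Is SSD: true
"""

for device, is_ssd in ssd_flags(SAMPLE).items():
    print(device, "flash" if is_ssd else "HDD")
```

The cache SSD should report as flash here before you try to create the disk group again.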
VSAN - Changing Dell Controller from RAID to HBA mode
I recently had to make some changes for a customer to set the PERC controller to HBA (non-RAID) mode, since it was previously configured in RAID mode with all disks in RAID 0 virtual disks. Each disk group consists of 5 disks: 1 x SSD and 4 x HDD.
I cannot stress this enough: make sure all the firmware and drivers are up to date, as listed in the HCL.
Here are some prerequisites for moving from RAID to HBA mode (I am not going to get into the details of performing these tasks):
- All virtual disks must be removed or deleted.
- Hot spare disks must be removed or re-purposed.
- All foreign configurations must be cleared or removed.
- All physical disks in a failed state, must be removed.
- Any local security key associated with SEDs must be deleted.
I followed these steps:
- Put the host into maintenance mode with full data migration. You have to select full data migration since we will be deleting the disk group.
- This process can be monitored in RVC using the command vsan.resync_dashboard ~cluster
- Delete the VSAN disk group on the host in maintenance mode.
- Use the virtual console on the iDRAC and select the Lifecycle Controller as the next boot option
- Reboot the host
- From the Lifecycle Controller main menu
- System Setup
- Advanced hardware configuration
- Device Settings
- Select controller card
- Select Controller management
- Scroll down and select Advanced controller management
- Set Disk Cache for Non-RAID to Disable
- Set Non RAID Disk Mode to Enabled