Wednesday, April 13, 2016

vCenter Server 6.0U2 errors - "lost access to volume"

Recently upgraded an environment from vCenter Server 5.5 to 6.0U2.
The hardware consists of Cisco UCS with B200M3 blades and XtremIO storage.

After the upgrade, users complained about slow and dropped connections to their VMs.

Troubleshooting:

Installed a host with vCenter Server 6.0U1 and did not get the error message, which was very strange. So what changed between 6.0U1 and U2?
After reviewing the logs, found that "lost access to volume" errors were logged roughly every 30 minutes.
Further troubleshooting of the logs revealed that this only happens on the XtremIO datastores.

Also found the following warning message in the vmkernel log file on the ESXi host:

WARNING: NMP: nmp_PathDetermineFailure:2872: Cmd (0x85) PDL error (0x5/0x25/0x0) - path vmhba4:C0:T0:L10 device naa.514f0c514ba0000e - triggering path evaluation
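
When the same warning shows up many times, it can help to pull the fields out of each line and group them by device. The following is a minimal Python sketch, not an official tool: the regular expression is written against the single warning line quoted above, and the function name parse_pdl_warning is made up for this example. Other ESXi builds may format the message differently.

```python
import re

# Pattern derived only from the one warning line quoted in this post
# (assumption: other builds may word or order the fields differently).
PDL_WARNING = re.compile(
    r"nmp_PathDetermineFailure:\d+:\s*"
    r"Cmd \((?P<opcode>0x[0-9a-fA-F]+)\) PDL error "
    r"\((?P<key>0x[0-9a-fA-F]+)/(?P<asc>0x[0-9a-fA-F]+)/(?P<ascq>0x[0-9a-fA-F]+)\)"
    r" - path (?P<path>\S+) device (?P<device>\S+)"
)

# The warning exactly as it appeared in the vmkernel log.
SAMPLE = (
    "WARNING: NMP: nmp_PathDetermineFailure:2872: Cmd (0x85) PDL error "
    "(0x5/0x25/0x0) - path vmhba4:C0:T0:L10 device naa.514f0c514ba0000e "
    "- triggering path evaluation"
)


def parse_pdl_warning(line):
    """Return the opcode, sense triple, path and device, or None if no match."""
    match = PDL_WARNING.search(line)
    if match is None:
        return None
    return {
        "opcode": int(match.group("opcode"), 16),
        "sense": (
            int(match.group("key"), 16),
            int(match.group("asc"), 16),
            int(match.group("ascq"), 16),
        ),
        "path": match.group("path"),
        "device": match.group("device"),
    }


if __name__ == "__main__":
    print(parse_pdl_warning(SAMPLE))
```

Running it over a copy of the vmkernel log, line by line, and counting results per device is a quick way to confirm whether only the XtremIO naa IDs are affected.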


Found the following KBs from EMC and VMware that relate to this issue:

https://support.emc.com/kb/467750  (need login to view)

vSphere 6 added the new VM Component Protection (VMCP) feature, which makes a clear distinction between PDL and APD SCSI sense codes.
Good KB from VMware:

This issue relates to XtremIO firmware (< 4.0.1), which answers the vSphere 6.0 host's SMART data request with an "illegal request" response. The host interprets that response as a PDL condition and triggers path evaluation.
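
To tie this back to the numbers in the vmkernel warning, here is a small Python sketch that translates them into their standard SCSI names. The lookup tables are deliberately tiny and cover only the values seen in this post: opcode 0x85 is ATA PASS-THROUGH (16), the command typically used to carry SMART requests, sense key 0x5 is ILLEGAL REQUEST, and ASC/ASCQ 0x25/0x00 is LOGICAL UNIT NOT SUPPORTED. The function name describe_pdl is made up for this example.

```python
# Lookup tables limited to the values from the warning in this post
# (assumption: extend them from the SCSI specifications as needed).
OPCODES = {0x85: "ATA PASS-THROUGH (16)"}
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ADDITIONAL_SENSE = {(0x25, 0x00): "LOGICAL UNIT NOT SUPPORTED"}


def describe_pdl(opcode, key, asc, ascq):
    """Return a one-line description of a command and its sense data."""
    command = OPCODES.get(opcode, "unknown opcode 0x%02x" % opcode)
    sense_key = SENSE_KEYS.get(key, "unknown sense key 0x%x" % key)
    detail = ADDITIONAL_SENSE.get(
        (asc, ascq), "unknown ASC/ASCQ 0x%02x/0x%02x" % (asc, ascq)
    )
    return "%s -> %s / %s" % (command, sense_key, detail)


if __name__ == "__main__":
    # Values taken from "Cmd (0x85) PDL error (0x5/0x25/0x0)".
    print(describe_pdl(0x85, 0x5, 0x25, 0x0))
```

Read this way, the warning says the array rejected a pass-through command rather than actually losing the LUN, which fits the firmware explanation above.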


Fix:

Upgrade the XtremIO firmware to 4.0.1 or above; the latest version is recommended.
This issue could also affect other storage arrays, so please check with VMware and keep the VMware KB bookmarked.

At the end of the day, make sure to check the VMware Compatibility Guide.