Tuesday, December 31, 2013

Error: Virtual disk 'Hard disk x' is a mapped direct-access LUN that is not accessible

Our Exchange Server 2010 VM has RDMs for each database. Overnight the volumes filled up after CommVault snapshots of the database volumes, which took the volumes offline and stopped the VM.
Reading online, it seems this problem can also occur when users try to vMotion a VM, but that has not happened in my case.

The VM was rebooted by the Exchange admin but during start up the VM gave the following error message:

Virtual disk 'Hard disk x' is a mapped direct-access LUN that is not accessible

Troubleshooting:

In the vSphere Client, when selecting the hard disk and choosing Manage Paths, you get the error "There is no multipath configuration for this LUN".


Identify whether the Raw Device Mapping (RDM) LUN signature (vml.############) assigned to the VM no longer matches the physical LUN presented to the ESXi host.
  • On the VM, record the physical LUN mapping (vml.############).
  • On the ESXi host where the VM resides, run a command to find the device name entry (naa.############) for the VM's vml identifier. The "Other UIDs:" field displays the vml identifier.
  • On each ESXi host within the cluster, verify that the device name has the same vml entry as the one associated with the virtual machine's hard disk.
    • ls -alh /vmfs/devices/disks
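As a sketch of how the vml-to-naa match-up works: each vml.* entry under /vmfs/devices/disks is a symlink to its naa.* device, so the mapping can be read straight from the `ls` output. The device identifiers below are hypothetical, shortened examples.

```shell
# A sample symlink line as printed by `ls -alh /vmfs/devices/disks`
# on an ESXi host (device identifiers are hypothetical)
sample='lrwxrwxrwx 1 root root 72 Dec 31 08:00 vml.02000000006006016055d0 -> naa.6006016055d0'

# The field before '->' is the vml identifier the VM references;
# the last field is the naa device name presented to the host
vml=$(echo "$sample" | awk '{for (i = 1; i <= NF; i++) if ($i ~ /^vml\./) print $i}')
naa=$(echo "$sample" | awk '{print $NF}')

echo "$vml -> $naa"
```

On the host itself, `esxcli storage core device list -d <naa-id>` shows the same vml identifier under "Other UIDs:", which is what you compare across the hosts in the cluster.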

Solution 1:

This was my first attempt, made without consulting online resources. I did not want to remove the mapped raw LUN from the VM, so to fix this I did the following:

  1. I disabled the problem LUN on the back-end storage (in our case NetApp) and then rescanned the storage on each host so that the device was no longer showing.
  2. I enabled the problem LUN again and rescanned the hosts.
  3. Made sure the device was showing again.
  4. Verified that the datastore mapping file (vml.######) on the VM was identical to the one on each ESXi host in the cluster for the particular problem hard disk.
  5. Started up the VM, and the problem was resolved.
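For reference, the rescans in steps 1 and 2 can also be done from the ESXi shell rather than the vSphere Client. These commands only work on the host itself, and the device ID below is hypothetical:

```shell
# Rescan all storage adapters so the disabled/re-enabled LUN is picked up
esxcli storage core adapter rescan --all

# Confirm whether the device is visible (hypothetical device ID)
esxcli storage core device list | grep -i "naa.6006016055d0"
```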
If this solution does not work for you, I would recommend solution 2, which is pretty much straight from VMware KB 1016210.

Solution 2:

Identify the problem hard disk and record the following settings on the virtual machine:
  • Mapped raw LUN's physical LUN and datastore mapping file location (vml.############).
  • SCSI ID
  • Associated LUN ID
    • To get this information you need to run a few commands on the ESXi host where the VM currently resides:
      • ls -alh /vmfs/devices/disks (this gives the device identifier for the RDM vml address)
      • esxcli storage core path list (shows the device identifier with LUN information)
  • Remove the RDM LUN.
  • Map the RDM LUN again within the VM. In our case this is done by the SnapDrive agent on the server.
  • Power on the VM.



Sunday, December 29, 2013

vCloud Director upgrade from 5.1 to 5.5

Current Version:

5.1.2 Build 1068441  vmware-vcloud-director-5.1.2-1068441.bin

New Version:

5.5.0 Build 1323688  vmware-vcloud-director-5.5.0-1323688.bin

I run vCloud Director on a RHEL virtual machine, so before starting the upgrade I fully patched the RHEL environment.

PRE-CHECKLIST:

VUM error during remediation of an ESXi 5.1 to 5.5 host upgrade: "Cannot run upgrade script on host" - resolved


VUM error during remediation of ESXi host upgrade from 5.1 to 5.5:
"vmware update manager 5.5 Cannot run upgrade script on host"

Debugging the problem:

Troubleshooting this problem led to a few discoveries online of users experiencing the same error message but with different log entries. These can be read in the KB articles below:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2007163
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2014084

vCenter Server upgrade from 5.1.1 to 5.5.0

Current version:

vCenter Server:       5.1.0 Build 1063329
ESXi hosts:           5.1.0 Build 1117900


New version:

5.5.0b

ESXi software file:  VMware-VMvisor-Installer-5.5.0-1331820.x86_64-Dell_Customized_A01   (we have Dell servers, so using the latest Dell-provided 5.5 installer)
vCenter Server:      VMware-VIMSetup-all-5.5.0-1476387-20131201 (5.5.0b)

Friday, December 6, 2013

Change a VM disk from thick to thin on a standalone ESXi host with no vCenter Server connection

Just got an interesting request from a user regarding a datastore that ran out of space on a standalone ESXi host. This is a Dell R720 server with locally attached storage, and all drives are full.

Debugging the problem:

Upon further investigation I found they had created a disk on the VM with thick provisioning while using less than half of the disk space within the VMDK.

However, since the ESXi host is not attached to a vCenter Server, we cannot perform a clone or migration from the console. What to do?

Resolution:

Using the command line (SSH):
If you don't have any space available on the current datastore, as in my situation, you will have to add a temporary datastore; NFS is the easiest to configure quickly.

SSH to host and run the following command:

vmkfstools -i "ThickDiskName.vmdk" -d thin "NewDiskName.vmdk"

After this completes, you have to remove the thick disk and add the newly created thin disk to the VM.
For removal options I recommend selecting "Remove from virtual machine" only and NOT permanently deleting the disk just yet. Wait until you have the new disk configured within the VM operating system and have verified it works as expected.
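Putting the steps together, here is a sketch of the host-side session. The datastore and VM names are hypothetical; only `vmkfstools -i … -d thin` is taken from the procedure above, and these commands run on the ESXi host over SSH:

```shell
# Clone the thick disk to a thin-provisioned copy on the temporary NFS datastore
# (paths and names are hypothetical examples)
vmkfstools -i "/vmfs/volumes/datastore1/myvm/myvm.vmdk" -d thin \
  "/vmfs/volumes/nfs_temp/myvm/myvm-thin.vmdk"

# Compare allocated sizes before detaching the original disk from the VM
du -h /vmfs/volumes/datastore1/myvm/myvm-flat.vmdk
du -h /vmfs/volumes/nfs_temp/myvm/myvm-thin-flat.vmdk
```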

Thursday, November 21, 2013

vmxnet3 and Windows operating systems: network problems?

Lately we have had some customers complaining about network-related problems on virtual machines running Windows, with strange behavior like the following:
- Web page rendering issues for websites on IIS
- Web page buttons that load with a prompt to open a file
- SQL errors from the web applications related to the network
- A page loads successfully in one browser, but not in a different browser on the same computer


Debugging the problem:

Ran Wireshark, with mixed results.
On further investigation we were able to replicate the problem, with random results, through an external network on the Org VDC's Edge gateway.
My first step was to look at the vApp network to see if users could replicate this locally within the vApp, using its own network and accessing the web page through localhost; this was eventually reproduced.

Setup ORG VDC within vCloud Director


I have put some bullet points together for myself on setting up a new Organization with an Org VDC. Might be useful to someone else. Nothing advanced; if you are a beginner, I would suggest another blog or the VMware documentation for assistance.

1. vSphere Client:
  a. Set up ESXi hosts.
  b. Set up a new datacenter.
  c. Set up a new cluster.
  d. Set up two different vDSs: one for management/vMotion and one for external & VCDNI networks. If setting up VXLAN, I would recommend a separate vDS for it.
  e. Create port groups for the external networks within the 2nd vDS that was configured.
  f. If VXLAN is used, prepare the new vDS for VXLAN.
    i. http://blogs.vmware.com/vcloud/2012/10/vcloud-director-5-1-vxlan-configuration.html
  g. Create a storage profile.
    i. Create storage capabilities.
    ii. Create a storage profile and assign storage capabilities to the profile.
    iii. Assign storage capabilities to the datastores within the created cluster.
    iv. Make sure to enable the storage profile! (very often overlooked)

2. vCloud Director:
  a. Create a Provider VDC.
    i. Select the previously created cluster.
    ii. Select the storage profile.
    iii. Select the host to prepare.
    iv. Done.
  b. Set up a new Organization.
    i. A simple setup wizard, very easy to follow and understand.
  c. Set up a network pool for the virtual switch.
    i. Select the network pool type (I pick network isolation-backed for simplicity).
    ii. Specify the VLAN and the maximum number of VCDNI networks.
    iii. Select the vDS created before.
  d. Set up the external network and link it to the port groups created for external networks.
  e. Create the Org VDC.
    i. Select the organization.
    ii. Select the Provider VDC.
    iii. Select the allocation model.
    iv. Configure the model (for our QA/DEV labs I prefer pay-as-you-go).
    v. Allocate storage.
    vi. Select the network pool.
    vii. Configure the Edge gateway.
    viii. Select the external network.
    ix. Set IP addresses.
    x. Sub-allocate IP pools.
    xi. Create the Org VDC network.
    xii. Name the Org.

Tuesday, September 24, 2013

vCloud Director - Migration of storage to new storage profile



Scenario:
Migration from local direct-attached storage on a single ESXi host to a more flexible environment with multiple ESXi hosts and a new shared storage profile.

Some considerations:
  • The migration will create full-clone VMs on the new storage profile, so please take storage usage into consideration before starting the move. Look at thin provisioning for the VMs' hard disks.
  • Whether or not you can afford to shut down the VMs for the migration will affect your effort.
  • Not just vApps need to be moved; also remember your vApp templates and media. I would start with the vApps and media first.

vCloud Director 5.1.2 - Bug - Retain IP/MAC resources does not apply when you use the "Move to" task to move to a new storage profile.

In a previous blog post I mentioned the usefulness of this setting, but during our storage migration to a new profile for vCloud Director we ran into a bug where it is not applied.

I have opened a case with VMware; they verified this as a bug, and it now has an SR. Hopefully it gets fixed in the next build.

My current vCloud Director version where this applies:
vCloud Director 5.1.2.1068441

Debugging the problem:


When you have to move vApps to a new storage profile, the easiest way is to shut down the vApp and select "Move to".

However, when you perform this task, the vApp will actually release the Org VDC NAT'd address for the VM.
If you have any NATs configured on the Edge gateway, these will now be out of sync.

Workaround:

I will discuss this in next blog post.

VMware Labs Flings: Lctree - Visualization of linked clone VM trees


Flings: Lctree

I was just pointed to Flings by the VMware support team.
These apps and tools, built by VMware engineers, are great, and I have already found my favorite for vCloud Director.



This tool is designed for the visualization of linked-clone VM trees created by VMware vCloud Director when using fast provisioning.
I managed Lab Manager before and always found the built-in context view feature useful for showing the relationships and dependencies between virtual machines.
This helped me a lot in finding information about shadow copies in our environment, as well as in visualizing chain length and making decisions on when to consolidate.


These applications are not supported, so there are no fixes; use at your own risk.

vCloud director setting – Retain IP/MAC resources


This is a great and very useful setting, which I hope all users are aware of.

Scenario:

We make use of an internal vApp network on each vApp, which is then connected to the Org VDC network. This means that each VM has a NAT'd address to its own assigned Org VDC IP.
On the Org VDC IP we then again use NATs to our external networks.
The destination IP for these NATs on the Edge gateway is the Org VDC IP address assigned to the VM.

Monday, September 9, 2013

Host spanning not working for VMs that run on different hosts within a vApp using a vCDNI network.

By default, when a vApp is created, a new port group is created within the associated vDS.
From testing and learning the hard way, it seems the first uplink listed in the vDS is always assigned as the active uplink in the vApp's port group, with load balancing set to "Route based on the originating virtual port ID".
This of course means you cannot set up teaming/EtherChannel on the physical uplink ports, and whichever uplink is assigned needs to have the same VLAN ID as is configured for the vCDNI.


Debugging the problem:

In my situation the vCloud environment started with a single host and direct-attached storage, so there was only a single vDS, which held the port groups for management, vMotion, and external networks, and to which the vCDNI was also associated.
This meant that our management uplink was always selected as the active uplink for newly created vApp port groups, since it was the first uplink listed in the vDS. We did not, however, want to assign the same VLAN and have traffic flow over the physical management ports; physical separation is always best in my opinion.

Solution:

Created a separate vDS to which I migrated the management and vMotion port groups (virtual adapters), as well as another for my external networks. This can be accomplished without downtime when you have two or more uplinks associated with the vmkernel.
On the vDS associated with the vCDNI I removed all the uplinks.
(On each uplink, the associated vmnic has to be removed before you can delete the uplink from the vDS. This is accomplished as follows:
Select the host.
Select the Configuration tab.
Select Networking.
Select vSphere Distributed Switch.
Select Manage Physical Adapters.
Click Remove for the vmnic under the uplink name.)

I set up two uplinks on the vDS associated with the vCDNI and assigned the same VLAN ID on both uplinks' physical switch ports.



Thursday, September 5, 2013

Migrate Management and vMotion virtual adapters (vmk0, vmk1) to a new vSphere Distributed Switch (vDS) without downtime.

  1. In vCenter Server, select Networking.
  2. Create the new vDS in the vCloud datacenter.
  3. Set the number of uplinks needed and name them appropriately. In my case we have two uplinks each for vMotion and management, so four in total. Create the same uplink names as on the original vDS.
  4. Create new management and vMotion port groups (different names; they cannot be the same) and remember to set your VLAN and balancing/teaming policies, but most importantly change the active uplinks to the newly created uplinks. (In the upcoming steps we will assign the physical adapters to the active uplinks.)
  5. Now go to Hosts and Clusters.
  6. Select the ESXi host and select Configuration -> Networking.
  7. Select vSphere Distributed Switch.
  8. Now you will see both vDSs.
  9. An updated, simplified procedure for steps 10-13 is below; however, the original still works as well.
  10. On the original vDS, select Manage Physical Adapters.
  11. Remove the physical adapter from the 2nd management and vMotion uplinks, keeping the active primary uplink in place.
  12. After it is removed, select Manage Physical Adapters on the new vDS.
  13. Add the removed physical adapter to the new uplinks.
  14. On the new vDS, select Manage Virtual Adapters.
  15. Click Add.
  16. Select "Migrate existing virtual adapters".
  17. Select the virtual adapters (vmk0, vmk1) from the old vDS and select the new port group names from the new vDS to associate them with on the move.
  18. Wait for the migration to complete.
  19. Now run steps 10 to 13 again to remove the remaining physical adapter from the original vDS uplink and add it to the new vDS uplink.
  20. Done.

UPDATE: I actually found a shortcut for the process in steps 10-13:
  • On the new vDS, select Manage Physical Adapters.
  • On the uplink name, select "Click to Add NIC".
  • Select the corresponding physical adapter on the original vDS.
  • You will be prompted whether you want to move the physical adapter from the original vDS to the new one.
  • Voilà!

Wednesday, July 3, 2013

PowerCLI - Identify LUN names for each datastore

A nice, quick and easy PowerCLI script for identifying the LUN names for each datastore.
This is useful when debugging log files where only the LUN name is specified.

I am a big fan of PowerGUI, which is where I run and create all my scripts!
http://powergui.org/index.jspa
http://communities.vmware.com/community/vmtn/automationtools/powercli


Script:
(replace localhost with your own vCenter server)

# Connect to vCenter
Connect-VIServer localhost

# For each VMFS datastore, report the canonical name (e.g. naa.*) of its first LUN
Get-VMHost | Get-Datastore | where {$_.Type -eq "VMFS"} |
Select Name,@{N="LUN";E={(Get-ScsiLun -Datastore $_ | Select -First 1).CanonicalName}}


Disclaimer:
Please use this script at your own risk and test it out in your test lab first before using it in production.
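As a side note: on a single standalone host without vCenter, a similar datastore-to-device mapping can be read from the ESXi shell itself, no PowerCLI needed:

```shell
# Lists each VMFS datastore (volume name) with its backing device name (naa.*)
esxcli storage vmfs extent list
```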

vCloud Director 5.1 bug with vCDNI network pools

This is something I picked up during my initial configuration of vCloud Director; it stumped me for many days until I was finally able to figure out what was going on.

I created various network isolation-backed network pools; however, during the course of configuration, changes were required and I deleted ALL of the network pools to start over. I then recreated the network pools with the same VLAN. When I deployed vApps with a local vApp network, the VMs did not want to communicate with each other at all.

Debugging the problem:

I tried recreating all of the network pools again and deployed the vApps multiple times with different configurations. I tried manually setting up IP addresses and multiple network cards. However, when I changed the VM network to a port group that was not part of vCloud Director, I was able to communicate with the VM, which led me to believe the issue was not with the VM itself but with the vCloud network component. I tested vShield Manager, which seemed to be working correctly.
VMware support case opened:

VMware was unable to figure out the problem at the time...
vCloud director 5.1.1.868405
vShield manager 5.1.1-848085


Resolution:

Did some research to see if any other users were experiencing this problem and ran into a community post describing a problem exactly like mine. Link below.

The solution we both came up with was to re-prepare the physical ESXi hosts within vCloud Director, without needing to do anything else.
Another workaround is to always leave one network pool in place; this problem only appears when you delete all the network isolation-backed network pools.


Links:

http://communities.vmware.com/message/2138225#2138225

Tuesday, July 2, 2013

VMware vCenter Server upgrade to 5.1.1 - Do not click "Cancel"

During the upgrade of VMware vCenter Server to 5.1.1, DO NOT click "Cancel" on the following prompt:






I was upgrading from 5.1.0 to 5.1.1 and came to the following prompt during the upgrade. From reading it, I assumed I needed to reboot my server right away in order to continue, so I selected "Cancel".

This, however, caused the upgrade to fail, but not before the SSO upgrade attempted to upgrade my environment, corrupting the SSO database and server. Afterwards I could not get vCenter Server to start or any authentication to work.

As with all my work I had a good snapshot and database backup taken before the upgrade.

Resolution:

After a support case with VMware and an hour or so of debugging, we could not get SSO working again, so we reverted to the snapshot and started the upgrade process over.

Lesson learned:

Always take a snapshot and database backups before any upgrade or installation!
   This does not just apply to VMware but to all applications.


VCenter Server upgrade from 5.1.0 to 5.1.1

Current version:

vCenter Server:       5.1.0 Build 880146
ESXi hosts:           5.1.0 Build 799733


New version:

5.1 U1a

ESXi software file:  VMware-VMvisor-Installer-5.1.0.update01-1065491.x86_64
vCenter Server:      VMware-VIMSetup-all-5.1.0-1123966


There are a lot of blogs, posts, and documentation on upgrading, so I am not going to bore you with screenshots and detailed explanations. Instead, here is a short summary of my upgrade process:



  1. Create snapshot/backup of VCenter Server!
  2. Create backup of VCenter Server, SSO and Update Manager database.
  3. Copy the install ISO to the server and attach it with a virtual ISO application like Daemon Tools.
    1. I do this because if the server is rebooted with the ISO attached through the vCenter Server client or web client, it will lose the connection.
  4. Do not perform a Simple Install!
  5. Select each VMware product individually, upgrading vCenter Single Sign-On first.
  6. Upgrade the vSphere Web Client.
  7. Log in to the new web client and, under SSO, verify that the domain is still set up correctly.
  8. Upgrade vCenter Inventory Service
  9. Upgrade vCenter Server
  10. Upgrade vSphere client if needed
  11. Upgrade vSphere Update Manager
  12. Reboot the server. 
    1. (I was on the phone with VMware support due to the upgrade corruption, and they said that a reboot is not required until all applications have been installed and completed.)

More on the corruption in another blog!

ESXi 5.1 - network port lost connection without notification - Dell PowerEdge

I ran into a very serious bug the other day which caused all our production virtual servers to lose network connectivity on a specific ESXi host, without any notifications or alarms from VMware vCenter Server. VMware was in fact completely unaware of this problem and just continued working as normal with all our servers offline.

Debugging the problem:

Reviewed all the physical ports and found that a particular NIC had lost its VLANs; VMware did not recognize this, so all VMs were left in a disconnected state. No network errors were detected on the Cisco physical switches or by VMware, so no failover took place.

Monday, July 1, 2013

VMware vCloud Director - Deploying a VM within a vApp renames the vmxnet3 Ethernet adapter to #2 and loses connection. (FIXED)

I just ran into a very interesting problem today within vCloud Director: when a vApp is deployed from the catalog, a VM loses its network connection, even when guest OS customization is disabled.


We have a vApp with multiple VMs, all Windows 2008 R2 SP1.
The vApp was saved into the catalog with "make identical copy".
After creating the vApp from the catalog template, all the VMs came up correctly except the one with Exchange installed.

Debugging the problem:

It seems the network adapter is lost and then recreated, but set to DHCP, so no network connection is possible.

  • Tried to fix it by disabling guest OS customization before powering on the vApp.
  • Recreated the network card on the particular VM, but with no luck.

Quick fix:

Shut down the VM.
Delete the virtual network adapter.
Power on the VM.
Use Device Manager to remove all network interfaces named "VMXNET".
    (For more information, see Microsoft Knowledge Base articles 241257 and 2550978.)
Shut down the VM.
Edit the VM properties through the vCloud Director interface and, under the Hardware tab, add a new virtual network adapter.
Power on the VM with "force customization".


Resolution:

The solution to our problem was to install the hotfix from Microsoft:

http://support.microsoft.com/kb/2550978

After installing the hotfix and then saving the vApp to the catalog, we were able to deploy the vApp without any further problems.


Links:

Found the following KB article on VMware: