VMworld is run on killer ESX clusters (So how’d it go?)

From a very interesting tweet:
@MeghanAtBMC: VMWorld infrastructure is running 37,248 VM’s on 776 ESX servers w/total of 37TB of Memory 348TB Storage

SWEET!!!

The next set of questions, coming from someone who manages a datacenter:

Can we get a whitepaper/post-mortem report covering:

  • How this datacenter ran during the show?
  • What issues did you see?
  • What worked well?
  • How many VMotions happened?
  • How many HA events occurred?
  • How many IOPS occurred?
  • How many last-minute change requests for more CPU, memory, disk, or extra VMs came through?
  • What kind of change management was in place for this?
  • What were the support policies?
  • How many admins were involved?
  • How many hack attacks occurred?
  • What kind of security issues were seen?
  • What kind of repair work happened?
  • What was Lab Manager used for?  
  • What was View used for?  
  • What versions of everything were used?
  • Did you have any FT VMs?   How did they work?

Any other questions datacenter folks would be interested in?  I know I’m just itching to know.

3 days till VMworld 2009

Woot.  3 days until I hit San Francisco and begin the awesomeness that is VMworld.   I’ve got my schedule packed, and for the first time I’m being asked to take a peek at a couple of different vendor products by people internal to the company.   These folks want input on whether these products will be useful for our various internal virtualization and OS efforts.   (Note: if you don’t catch the mega WOOTness here, you should now.  VMware is becoming a noticeably important tool at the director level and above.)

AppSpeed update

I’ve had the AppSpeed demo set up and running in one of my clusters.   When you get the demo’s temporary license, it’s for 16 cores’ worth.   My smallest environment to test in is 160 cores deep.

I figured that was no issue, I’d just see what I got for a couple of different apps to see how it works.   That didn’t work out so well either.   Any actually interesting app is multi-tier, which means it bounces across multiple cores/hosts in the environment.

So I haven’t forgotten and I’m not ignoring it.   I’ve put in a request for a temporary license for 160 cores’ worth and I’m waiting for that to come through.   3 weeks and waiting now.

Emulex Supplied Hardware & HP PSP 8.25A don’t like each other

In one of the environments I work in, we use Emulex drivers on all Windows systems, along with HP hardware.   HP provides this nice ProLiant Support Pack (PSP) that bundles everything together neatly.   You can also configure a PSP release around parameters such as “Don’t install the QLogic drivers” or “Configure these mgmt agents with this as their master management server to send traps to,” and so forth.

In the past we would configure the HP enhanced drivers as “Do Not Install,” since those drivers don’t work very well with non-HP Emulex devices.

Today I found out that you cannot configure the Emulex drivers in the PSP 8.25A release.   If you are using Emulex-supplied HBAs and are running the HP PSP, don’t use 8.25A.

Restarting Mgmt Agents is dangerous

Once again I’m reminded that going in and doing /etc/init.d/mgmt-vmware restart and then /etc/init.d/vmware-vpxa restart is fairly dangerous and should only be done as a LAST resort.

A co-worker was removing old NFS mounts and replacing them with new ones, so the same-named NFS mount was being reused.   He did an esxcfg-nas -d and then an esxcfg-nas -a and, per our instructions, restarted the mgmt agents so VirtualCenter would see them properly.   In doing this on U4 plus a month of patches, VirtualCenter lost connectivity with the host agent with a vim.fault.NotAuthenticated.

vim.fault.NotAuthenticated
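
For context, the sequence that triggered it looked roughly like this from the service console (the datastore label, filer hostname, and export path here are made-up placeholders, not our actual values):

  # Remove the old NFS datastore, then re-add it under the same label
  esxcfg-nas -d nfs_datastore01
  esxcfg-nas -a -o nfs-filer.example.com -s /vol/vmfs01 nfs_datastore01

  # Restart the mgmt agents so VirtualCenter sees the change -- this is
  # the step that left the host disconnected with vim.fault.NotAuthenticated
  /etc/init.d/mgmt-vmware restart
  /etc/init.d/vmware-vpxa restart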

The fix is to disconnect and reconnect the host, since you can’t put these systems into Maintenance Mode while VMotion isn’t working.    Together we figured out a nicer way to do the mount point cleanup (a fuller sketch follows the list below):

  • esxcfg-nas -d <mountpoint>
  • esxcfg-nas -a <mountpoint> …
  • vimsh -n -e "internalsvc/refresh_datastores"
  • vimsh -n -e "hostsvc/datastore/refresh <mountpoint>"
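
Strung together, a minimal sketch of that cleanup as a one-shot service console sequence looks like this (again, the datastore label, NFS server, and export path are hypothetical placeholders; substitute your own):

  # Drop the old NFS datastore by label
  esxcfg-nas -d nfs_datastore01

  # Re-add it, pointing at the new NFS server and export
  esxcfg-nas -a -o nfs-filer.example.com -s /vol/vmfs01 nfs_datastore01

  # Have hostd refresh its datastore view instead of restarting the agents
  vimsh -n -e "internalsvc/refresh_datastores"
  vimsh -n -e "hostsvc/datastore/refresh nfs_datastore01"

  # Confirm the mount came back
  esxcfg-nas -l

No mgmt agent restart needed, so the host shouldn’t drop its VirtualCenter connection along the way.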