Upgrading to ESX 4 Classic sets NIC speed to static setting

Recently I found that my environment of ESX 4 Classic has a lot of static NIC speed settings. This is a rather interesting discovery as I’ve been a strong proponent of using Auto Negotiation on all the NICs as that will help one find flaky or poorly connected cables along with a variety of other issues. If you have it force set to a speed, you won’t get any notification that a given connection is working poorly until someone digs in pretty deeply.

Ethermind covers this very well, with some deep research into WHY you should always use Auto Negotiation at Gig or faster speeds.

Nic Configured Speeds

As such, I found that there are many 100 Mb static settings. This has a tendency to cause problems with network performance when the other side thinks it is auto-negotiating.

If you are properly redundant, a safe way to fix this is to walk through your environment one NIC at a time, giving each NIC a chance to negotiate before moving on to the next one on each host.

connect-viserver -server vCenter -cred (Get-Credential)
get-vmhostnetworkadapter -physical | where {$_.Name -eq "vmnic0"} | set-vmhostnetworkadapter -Autonegotiate

Update 30 July 2010 @ 11:38am: Just found out that I’m not the first to blog post about this. Here’s an earlier post on the subject.

Upgrading Firmware on ESXi

A bane of any system administrator’s existence is the constant stream of firmware updates to fix various bugs and issues. A fairly common issue that HBA firmware updates fix is a new LUN or Target not being discovered without a reboot. Emulex provides a set of tools called HBAnyware that can be installed onto ESX Classic. With it you can perform an Emulex HBA firmware upgrade without having to reboot the ESX Host.

Example script:

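# Change to the HBAnyware install directory
cd /usr/sbin/hbanyware
# List the installed HBAs and their WWPNs
./hbacmd listhbas
# Download the firmware image to the HBA (the WWPN here is a placeholder; use one from listhbas)
./hbacmd download 00:00:01:02:03:04:05 zd282a4.all
# Show the HBA attributes to confirm the new firmware version
./hbacmd hbaattributes 00:00:01:02:03:04:05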

With this you have just updated the firmware to 2.82a4 on an LPe11000-M4 card.

With ESXi it isn’t that easy since hbacmd isn’t available. The solution I came up with is to create a bootable WinPE CD with the Emulex Tools and firmware available on it. Then all you have to do is boot off this CD and you will be able to update your 10G CNA or your Emulex LPe12000 HBAs.

This script expects you to have the Windows AIK installed locally. Just update this script to point at the appropriate locations of the listed files.

CustomEmulexWinPE.cmd – Run this to create a bootable ISO that will install the Emulex WinPE utilities & drivers and then attempt to upgrade the 10G CNA & LPe12000 cards in the system.

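REM Set up the Windows AIK PE tools environment
call "C:\Program Files\Windows AIK\Tools\PETools\pesetenv.cmd"

REM Create the x86 WinPE build tree in EmulexWinPE
call copype x86 EmulexWinPE

REM Mount the WinPE image read/write
imagex /mountrw Winpe.wim 1 mount

REM Add the Emulex installer, the firmware files and the custom startnet.cmd
mkdir mount\Emulex
xcopy "setupElxAll-x86.exe" mount\Emulex\
xcopy /s "Firmware" mount\Emulex\
del mount\Windows\System32\startnet.cmd
REM copy /y is used here because xcopy prompts for file-or-directory when the target is missing
copy /y "startnet.cmd" mount\Windows\System32\startnet.cmd

REM Prepare and commit the image, then use it as the boot image
peimg /prep mount\Windows
imagex /unmount mount /commit
copy winpe.wim ISO\sources\boot.wim

REM Build the bootable ISO
oscdimg -n -betfsboot.com ISO EmulexWinPE.iso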

startnet.cmd – This runs on boot and will attempt to update the 10G CNA & the Emulex LPe12000 cards in the system.

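REM Initialise WinPE (plug and play devices, networking)
wpeinit

REM Silently install the Emulex drivers and utilities
\Emulex\setupElxAll-x86.exe /q
cd "\Program Files\Emulex\Util\elxApp"

REM Download the two LPe12000 images to every LPe12000-M8 adapter
winlpcfg download a=lpe12000-m8 i=\emulex\lpe12000\ud111a5.all
winlpcfg download a=lpe12000-m8 i=\emulex\lpe12000\ub202a2.prg

REM Download the 10G CNA image to adapters 1 through 4
winlpcfg download n=1 i=\emulex\oce10102\s1462001.ufi
winlpcfg download n=2 i=\emulex\oce10102\s1462001.ufi
winlpcfg download n=3 i=\emulex\oce10102\s1462001.ufi
winlpcfg download n=4 i=\emulex\oce10102\s1462001.ufi

REM List the adapters so the new versions can be checked
winlpcfg listhba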

Once CustomEmulexWinPE.cmd has finished running you’ll have an EmulexWinPE.iso that you can mount and boot from. On boot it automatically runs startnet.cmd and upgrades the firmware of the Emulex devices.

This basic setup should allow you to script anything you need to do in a Windows environment to update the hardware configuration or run various diagnostic tests outside of ESXi.

Download Locations:
setupElxAll-x86.exe
Windows AIK
Emulex Firmware – Download as needed, and update the paths in Startnet.cmd and CustomEmulexWinPE.cmd to match where you copy it inside your WinPE ISO.

VMworld Public Voting

In an attempt to get better sessions picked for VMworld, part of the selection process is now done via public voting.

http://vmworld.com/community/conferences/2010/cfpvote/

This has been an interesting process watching Twitter and a variety of blogs discuss various sessions and which ones are good and which ones folks want to go see. As one would expect the immediate result is that there is a lot of sales pitching going on.

“Vote for my session”. “Vote for me”. “My company’s session needs some love to get to VMworld”.

Overall this has been fairly harmless in my mind. The immediate gut feel is the sessions are going to be popularity contests. Those folks that are the most popular are going to get sessions. That must be bad. That can’t be right.. good sessions aren’t only folks that are popular. True to an extent. Once one starts thinking about popularity and why individuals are popular, my reasoning says this is actually a good thing. The question to be asked is “Why are those folks popular?” Any of a multitude of reasons means that session is likely going to be a good one.

  • The individual is a good speaker and is a lot of fun
  • The individual has good things to say
  • The individual knows many people

At the end of the day isn’t the goal to have a good session whatever the reason this presenter is popular?

Using PowerCLI in an enterprise environment

Most enterprises take security very seriously. As such, it is extremely common to have a web proxy set up by default in your environment. PowerCLI/PowerShell are configured by default to Use System Proxy. This means that some of the cmdlets will try to route their traffic through the proxy.

Two cmdlets that do not work properly when this is happening are Install-VMHostPatch & Copy-DatastoreItem. A good indication that the proxy is involved is an exception containing text like:
Proxy Authentication Required

To work around this, wrap the code in Set-PowerCLIConfiguration calls that switch the proxy off and then restore the original setting.

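# Remember the current proxy policy so it can be restored afterwards
$CurrentProxyCfg = Get-PowerCLIConfiguration
# Bypass the system proxy for the calls that follow
Set-PowerCLIConfiguration -ProxyPolicy NoProxy
...
Install-VMHostPatch [...]
...
# Put the original proxy policy back
Set-PowerCLIConfiguration -ProxyPolicy $CurrentProxyCfg.ProxyPolicy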

Thanks to @cshanklin & @lucd for their assistance in aiming me down the right path.

PCoIP painting issues

Finally got PCoIP with View 4.0.1 up and running.   All excited and thrilled to compare it to RDP.  It was looking good until I fired up IE 8 and went to a couple websites.  Some had issues.. Some didn’t.

IE8 based Painting Issues

I then launched vSphere client only to be unable to see any of the objects in the left hand window in the client.

vSphere Painting Issues

This VM is running on ESX 3.5U4 on VM Hardware 4. The quick fix is to upgrade to VM Hardware 7, which entails the full upgrade to vSphere 4 plus the VM tools updates.

The other fix, which works around two bugs in Hardware version 4, is:

  1. Completely uninstall the View Agent
  2. Reboot
  3. Reinstall the View Agent (Make sure that the Video Driver version is [...].0032)

If this doesn’t do it then you are probably having an issue with VRAM.   The fix is to adjust the pool inside of View for this machine and set the resolution and # of monitors so they come out to a number divisible by 64.  (Kudos to my Support Wizard for finding this one.)

The magic formula is

((#of monitors * Width of Resolution) * (# of monitors * Height of Resolution) * 4 )/1024 == Multiple of 64

Keep in mind that if you have fewer monitors than you set the pool to, PCoIP handles this gracefully and it doesn’t cause issues.
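
To save doing the math by hand, here is a minimal Python sketch of the formula above. The function names are mine, and reading the "* 4" as bytes per pixel (so the result is in KB) is my assumption, not something View documents here. The check uses whole-number arithmetic so that odd resolutions don't slip through on rounding.

```python
def formula_value(monitors: int, width: int, height: int) -> float:
    """Result of the formula from the post (assumed to be KB at 4 bytes per pixel)."""
    return ((monitors * width) * (monitors * height) * 4) / 1024


def is_multiple_of_64(monitors: int, width: int, height: int) -> bool:
    """True when the formula's result is a whole multiple of 64."""
    total = (monitors * width) * (monitors * height) * 4
    # A multiple of 64 after dividing by 1024 is the same as a multiple of 64 * 1024 before dividing.
    return total % (64 * 1024) == 0


if __name__ == "__main__":
    # Example pool settings to try; swap in your own monitor count and resolution.
    for monitors, width, height in [(1, 1024, 768), (2, 1024, 768), (1, 1680, 1050)]:
        value = formula_value(monitors, width, height)
        verdict = "OK" if is_multiple_of_64(monitors, width, height) else "adjust the pool settings"
        print(f"{monitors} monitor(s) at {width}x{height}: {value:g} -> {verdict}")
```

Run it with the monitor count and resolution you plan to set on the pool before you change anything in View.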

Scale Up or Scale Out™

Duncan over at Yellow-Bricks.com brings up the great discussion once again.   Every time  a brand new piece of hardware comes out with more RAM possible or better, faster CPUs I have the “Scale Up or Scale Out™” Discussion with many people.  I have this discussion every 9-12 months on average.  We end up covering all sorts of criteria on what to compare and what is acceptable and what is not.

Our conversation usually goes something like this:

The hot new badness just came out and we need to order more hardware.

Awesome.   So how much does this puppy have in it?  RAM?  CPUs?   Slots for HBAs & NICs?

Did you know the new motherboard comes with 4 NICs now, so our standard config can go from 4U to 2U with gobs of RAM and 6 core CPUs?

Awesome!   *pause*  You know with that much RAM I can put 100 Win7 VDI systems on there.   Umm.. What about when it goes down?

Oh.. Hrm.   That wouldn’t be so good.   ….

That being said we generally end up breaking it down to a couple of factors.

  1. What is the current capacity configuration we run with today?
  2. What are our current pain points in CPU, Memory, Network or Storage?
  3. Are there any new architecture changes coming that will impact this design?   Is there a new switch fabric that needs to be plugged into?   Are there changes to storage that need to be addressed?
  4. How much does this new hardware configuration cost?
  5. How will this change affect DRS’s Chaos Theory?  The more hosts, the more DRS can do for you due to Chaos theory.
  6. What is our Risk level for number of eggs in a single basket?

The point is that most corporations’ environments aren’t starting from scratch.  In my case we have a known configuration today to use as a baseline, and we adjust the environment and design with every hardware order to make it better.

In our most recent order we had this discussion all over again.  This time we needed some architectural changes to prevent the false positive HA events that some strange events trigger about twice a year.   So we are going to a 3 switch connectivity solution to enable network beaconing for NIC teamed connections.  We started with the following information:

  • Baseline:  HP DL585 G5, 4 sockets w/ quad cores, 128G of RAM, 3 Dual 1G NICs, 2 Emulex LPe11000 HBAs
  • Cluster: 10 Host Clusters with ~30 Servers per Host, or ~65 Workstations per Host in View
  • Pain Points:  CPU starvation, Licensing Issues with 10 Host sized clusters
  • Risk Level:  Politically we are getting pretty touchy about more than 30 Servers going down in a single blow even if HA works on bringing them up in under 15 mins automatically.
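
To put rough numbers on the Risk Level point, here is a minimal Python sketch. It assumes VMs are spread evenly across hosts and that HA restarts them evenly across the survivors, which real clusters only approximate. The function name is made up for illustration, and the 20 host comparison is a hypothetical scale-out case, not one of our clusters.

```python
def host_failure_impact(hosts: int, vms_per_host: float) -> dict:
    """Estimate what one host failure costs in a cluster with evenly spread VMs."""
    if hosts < 2:
        raise ValueError("need at least two hosts for HA to restart anything")
    total_vms = hosts * vms_per_host
    return {
        # VMs that go down in a single blow and have to be restarted by HA
        "vms_restarted": vms_per_host,
        # Fraction of the cluster's VMs hit by the failure
        "share_of_cluster": vms_per_host / total_vms,
        # Average VMs per surviving host once everything is back up
        "vms_per_survivor": total_vms / (hosts - 1),
    }


if __name__ == "__main__":
    baseline = host_failure_impact(hosts=10, vms_per_host=30)   # the baseline from the list above
    scale_out = host_failure_impact(hosts=20, vms_per_host=15)  # hypothetical: same 300 VMs on smaller hosts
    for name, result in (("baseline", baseline), ("scale out", scale_out)):
        print(name, {key: round(value, 2) for key, value in result.items()})
```

The point of the comparison is only that halving the VMs per host halves the number that go down together; it says nothing about cost, cabling or management overhead.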

We compared 4 different models of newer, faster, badder and more wicked hardware from HP since the DL585 G5s are not really on the manufacturing line anymore.   So we looked at the BL495c G6, DL585 G6, DL385 G6 and DL580 G5.

DL585 G6:

  • Pros
    • Proven and comfortable AMD based stable platform with a good price/performance cost.
    • Gain more CPU resources with the additional 2 cores per socket.  6 core systems.
    • Can build 5 Host clusters to address licensing issues.   Issues with HA support for the density involved.
  • Cons
    • Same Risk Level as before.
  • Push
    • Same architectural solution today with maybe another NIC card to enable the NIC Beaconing.

DL580 G5:

  • Pros
    • Fastest individual cores out there.   Lots of good press about the Intel.
    • Should get better CPU resources with higher performing CPUs.
    • Can build 5 Host clusters to address licensing issues.   Issues with HA support for the density involved.
  • Cons
    • Significant premium in cost for speed.   Easily a 25% premium for 10% faster performance.
    • Same Risk Level as before.
  • Push
    • Same architectural solution today with maybe another NIC card to enable the NIC Beaconing.

DL385 G6:

  • Pros
    • Lowers the risk level without lowering performance
    • Best price/performance cost for 6 core systems
    • Has enough slots to move to the newer network layout to enable NIC Beaconing
    • Gain more CPU resources with the additional 2 cores per socket.  6 core systems.
    • Put 64G of RAM into them and build 5 host clusters for licensing problematic applications.
  • Cons
    • More physical hosts to deal with (cabling, power, rack space, cooling, management)

BL495c G6:

  • Pros
    • Blades reduce the amount of cabling
    • Gain more CPU resources with the additional 2 cores per socket.  6 core systems
  • Cons
    • Firmware Management is an issue
    • Increases our Risk Level with more eggs in the same basket unless we get multiple chassis to spread the blades across
    • New solution from the ground up running ESX on blades
    • Not ready to support Flex10 and because of this we have limited NIC capabilities to fit our requirements

We decided to go with the DL385 G6s based on these criteria.  We will dedicate a specific 5 Host cluster for problem children applications with licensing issues.   The RAM size of the hosts will limit the number of VMs we can end up putting in a cluster which addresses the Risk Level of number of VMs per Host.  We are still way ahead of the game using VMware so having to have a couple more physicals for all these improvements is not an issue.

In your company or solution something else may be more appropriate.  The key in an ongoing improvement mentality is to have things you can measure, and then criteria for what to change along with why.   There is no one size fits all answer, which is why VMware works so well for so many different folks.   We gain a lot of flexibility in the Datacenter without changing how we ultimately end up managing these systems.

Cool Apps to play with by VMware Engineers

Came across the VMware Labs website today.   A nice website for VMware to showcase the quality work that its employees are developing to improve the general vSphere environment.  I have used or looked at half of these and was pleasantly surprised to discover some new tools.

Per the site manifest:

This is our place to share cool tools created by VMware engineers.  There is a wide range of tools here for you, including one for automating tasks, getting ESX performance graphs, a rich Internet application framework and much more. These tools are offered under Technical Preview or relevant Open Source License.

They are calling each app/tool/API a fling.   This is pretty smart naming.   I have seen employees at several start-up companies write some cool code that never gets to see the light of day.   They are just flings of interest to solve a specific problem.   They don’t always become full-fledged products.

Here’s the current list:

Apache Pivot

Like most modern development platforms, Pivot provides a comprehensive set of foundation classes that together comprise a “framework”. These classes form the building blocks upon which more complex and sophisticated applications can be built.

DynamoRIO

DynamoRIO exports an interface for building dynamic tools for a wide variety of uses: program analysis and understanding, profiling, instrumentation, optimization, translation, etc. Unlike many dynamic tool systems, DynamoRIO is not limited to insertion of callouts/trampolines and allows arbitrary modifications to application instructions via a powerful IA-32/AMD64 instruction manipulation library. DynamoRIO provides efficient, transparent, and comprehensive manipulation of unmodified applications running on stock operating systems (Windows or Linux) and commodity IA-32 and AMD64.

esxplot

Esxplot is a GUI based tool that lets you explore the data collected by esxtop in batch mode. The program loads files of this data and presents it as a hierarchical tree where the values are selectable in the left panel of the tool, graphs of the selected metrics are plotted in the right panel.

Onyx

Onyx is a standalone application that serves as a proxy between the vSphere Client and the vCenter Server. It monitors the network communication between them and translates it into an executable PowerShell code. Later this code could be modified and saved into a reusable function or script.

SVGA Sonar

SVGA Sonar is a demo application for SVGADevTap. SVGADevTap is a user-level library that communicates with the VMware SVGA guest driver to provide low-latency notifications of changes to the screen.

vApprun

The vApprun tool implements the same vApp/OVF feature set as the vSphere 4 release. Thus, Workstation/Fusion can be used as a development environment for advanced OVF packages, and it can be used to evaluate and test OVF packages on your desktops and laptops.

vCMA

VMware vCenter Mobile Access (vCMA) – vCMA allows you to monitor and manage VMware Infrastructure from your mobile phone with an interface that is optimized for such devices.

VGC

VMware Guest Console allows you to manage the Guest OSes from the VMware layer.

VI Java

vSphere Java API is a set of Java libraries that sits on top of existing vSphere SDK Web Services interfaces. It provides a full managed object model and run-time type checking, resulting in a dramatic productivity boost. With the new Web Services engine in 2.0, it also performs up to 15 times faster than engines like Apache AXIS.

Virtual USB Analyzer

The Virtual USB Analyzer is a free and open source tool for visualizing logs of USB packets, from hardware or software USB sniffer tools. As far as we know, it’s the world’s first tool to provide a graphical visualization along with raw hex dumps and high-level protocol analysis.

If you want to see what is possible with a company’s products, these are some of the tools to go look at.  http://labs.vmware.com

VMware Support offline for most of day due to power outage

On the 17th of Feb some form of Power Failure occurred at one of VMware’s Palo Alto locations.   From what I understand this primarily affected the support systems and as such the phones were down for most of the day.

Update 18 Feb 10 @ 8:28pm:

@vmwarecares Network outage here caused by small plane crash in Palo Alto

http://www.cnn.com/2010/US/02/18/texas.plane.crash/index.html?hpt=C1

VMware Support was back up and running by 9pm CST.

Cloud Computing Solution Provider – VMware

The recent Zimbra acquisition by VMware threw me for a bit of a loop initially.  Then I started chewing on it and read the good post by Rodney Haywood.   Very shortly afterwards I had a classic Homer Duh moment.

VMware aims to build from the ground up the best cloud computing solution it can offer for sale.   That is taking into account that the definition of cloud computing today is about as vague as a real cloud in the sky.  Today that cloud is fluffy and in 5 mins that cloud is shaped like a rabbit.   As such they have built a pretty strong infrastructure level for customers with vSphere, vCenter and various add-on tools.   They have picked up SpringSource to ultimately offer a platform for services and a closer understanding of how the JVM interacts with the hypervisor.   Now they are getting into the services space with Email/Calendaring.

  • IAAS -> vSphere/vCenter
  • PAAS -> SpringSource
  • SAAS -> Zimbra

Each of these areas is really focused on a different customer base at the end of the day.  Sure, you can say the customer is IT, but that’s like saying your customer base is the TV viewing audience.    It is too vague, and there is a better, more definable end customer grouping.

  • IAAS -> Server/Storage/PC/Hardware Teams – Ground Level System Admins
  • PAAS -> Development Teams making solutions up – Architects/Developers
  • SAAS -> Back Office management/utilities – Often more visible by the CxOs.

So where are they going next and what areas are missing for the full suite for all the different customer bases they are aiming for?