Category Archives: Server Virtualization

Vendor Minimum Requirements

Recently I was pulled into a slow-performance issue with a VM. After looking at the standard metrics and seeing that none of them were really bad, I started asking more questions.  We contacted the vendor and they asked a bunch of questions back.  We answered, being very transparent, and the vendor came back and said our performance problem was due to the fact that our CPUs aren't 3.0 GHz or better.  Since we use AMD this didn't surprise me, though none of our testing matched the vendor's claim of a 30% increase from this new functionality; our numbers showed zero change in performance.  So I pushed back and started reading the requirements, and this is what they list.

Vendor Minimum Requirements

3.0 GHz minimum

This is great to say and is a wonderful cop-out for the support team.

“Oh wait, you don't have a 3.0 GHz box.”
“OK, what model?”
“Doesn't matter, it must be 3.0 GHz or faster.”
“Alright, so will my 3.0 GHz Pentium 4 that's 5+ years old be fast enough?”

Try again, vendor.   Be accurate and precise with your data at the time of document creation, with the goal of giving customers the right information to deliver a functional, quality product.

Limits have their limits

I've been chewing on this post by Duncan at Yellow Bricks for the past month and a half.  It covers some complicated issues that one has to deal with in an enterprise-size environment, with many assumptions about what gets you into this mess in the first place.   The best thing to do is scale down and scale up as needed based on good performance monitoring and bottleneck research.  Thankfully I've managed to build good relationships with most teams where I work, so this has become the standard operating procedure, though sometimes we just can't.   At the end of the day the issue boils down to one simple goal:

“As the VMware environment administrator, how can I make better use of what I have available to me?”

In my environment I run into a variety of political reasons, ranging from…

  • “I am going to need that extra 2 CPUs someday in the future so I can’t give them up now.”
  • “The vendor docs say I really do need 8 CPUs and 128G of RAM for my 3 users even though 126G is unused.”
  • “Someone on your team said I really do need that 8G of RAM so I won’t give it up”
  • “Oh come on.. what’s another 2G of RAM”
  • “I gave up my budget for a physical to do this as a virtual even though I’m still spending less in the grand scheme.  Gimme more resources.”

to outright begging:

  • “Pleaseeee.  I think it’ll help my issues.  It might even make me look better to my co-workers.”

I have two distinct use cases that really show why this kind of capability can be tricky to use well.

Case #1:  The poorly written VBScript

Back in the early Windows 3.1 days, when VB was a novel concept, some developers made this groundbreaking app that would pull data from a remote system, massage the data a bit and put it into a centralized Btrieve database.   The script they wrote goes to sleep for a minute whenever the remote queue it checks is empty, but its “sleep” is really a loop that polls the clock to see if a minute has passed.  It constantly checks the clock, which consumes 100% of the CPU the entire time.   This wasn't much of an issue when each of these systems ran on its own old PC.  We virtualized them, since 16 XP workstations in the datacenter is a management headache.   Now that's 16 high-powered cores, multiple generations newer, being burned at 100% all day long for no good reason.

We VMware admins have discovered that on the old PCs these systems would easily take 5-10 minutes to work through their queues.   On the newest hardware, with these as VMs, it takes under 15 seconds to do the same work.   So for 60 seconds each VM is doing nothing except checking the hardware clock.
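
The original was written in VBScript, but here is a minimal Python sketch of the same anti-pattern next to the fix the developers keep promising.  The function names are mine and purely illustrative.

    import time

    def busy_wait_one_minute():
        # Anti-pattern: spin on the clock until 60 seconds have passed.
        # The loop never yields the CPU, so the guest burns an entire core
        # doing nothing but comparing timestamps.
        start = time.monotonic()
        while time.monotonic() - start < 60:
            pass  # constantly re-checking the clock

    def proper_wait_one_minute():
        # The fix: hand the wait to the OS scheduler. The process blocks,
        # the vCPU goes idle, and the host can run other workloads.
        time.sleep(60)

The difference is invisible on a dedicated old PC and very visible when 16 of these share a modern host.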

Solution #1:  CPU limits good

We implemented a CPU-limiting resource pool for these VBScript VMs.   They are still running mega fast compared to where they were a year ago, but now they use no more than 8 cores' worth of CPU at any given time.  A big improvement, at least until the app developers decide whether to replace all that polling code with a real sleep 60 or recode the entire app.
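
For anyone who would rather script this than click through the vSphere Client, here is a rough pyVmomi sketch of creating such a capped pool.  The vCenter host, credentials, cluster name, pool name and MHz limit are all made-up placeholders, not our actual environment.

    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder connection details -- substitute your own vCenter and credentials.
    si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret")
    content = si.RetrieveContent()

    # Walk the inventory to find the cluster (the name is invented).
    cluster = None
    for dc in content.rootFolder.childEntity:
        for entity in dc.hostFolder.childEntity:
            if isinstance(entity, vim.ClusterComputeResource) and entity.name == "Prod-Cluster":
                cluster = entity

    # Cap the pool at roughly 8 cores' worth of MHz (8 x 2400 MHz here); no reservation.
    cpu_alloc = vim.ResourceAllocationInfo(
        limit=19200, reservation=0, expandableReservation=True,
        shares=vim.SharesInfo(level='normal'))
    mem_alloc = vim.ResourceAllocationInfo(
        limit=-1, reservation=0, expandableReservation=True,
        shares=vim.SharesInfo(level='normal'))
    spec = vim.ResourceConfigSpec(cpuAllocation=cpu_alloc, memoryAllocation=mem_alloc)

    cluster.resourcePool.CreateResourcePool("VBScript-Capped", spec)
    Disconnect(si)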

Case #2:  vCenter SQL Server Memory Limits

Thanks to a “feature” of vCenter 4.0U1 with ESX 3.5 hosts, when I increased the RAM on my dedicated vCenter SQL Server from 4 GB to 8 GB, a memory limit of 4 GB was set on the VM.   When I looked at the SQL instance, the SQL Server process was only using about 3,600 MB, yet all 8 GB appeared consumed.   That screamed “OS problem” to me.   After close to 10 days of head-beating, not understanding why my brand-new vCenter 4.0U1 system was running so poorly, a co-worker with a fresh set of eyes noticed this setting on the SQL Server VM.

Solution #2:  Memory limits bad

This one is obvious.  We disabled the limit and SQL Server performance went through the roof instantly. The hard part was that we couldn't easily tell the balloon driver was holding 4 GB of RAM, since it doesn't show up as a process inside the guest.  Nobody noticed the ballooning happening.
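
A periodic check would have caught this much sooner.  Here is a hedged pyVmomi sketch, with placeholder connection details, that simply flags any VM whose memory limit is below its configured RAM or that shows active ballooning.

    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder connection details.
    si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret")
    content = si.RetrieveContent()

    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)

    for vm in view.view:
        if vm.config is None:
            continue  # skip templates or inaccessible VMs
        limit = vm.config.memoryAllocation.limit        # -1 means "unlimited"
        configured = vm.config.hardware.memoryMB
        ballooned = vm.summary.quickStats.balloonedMemory
        if (limit not in (None, -1) and limit < configured) or ballooned > 0:
            print(f"{vm.name}: limit={limit} MB, configured={configured} MB, "
                  f"ballooned={ballooned} MB")

    view.DestroyView()
    Disconnect(si)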

At the end of the day there are pros and cons to having this level of capability, and it's why I like ESX and VMware's general approach: give the administrator every option, every configuration knob, and enough rope to hang yourself and two of your friends, while automating and hiding as much of it as possible.   The vendor will never know every situation we in the field are going to run into, so give us all the options they can.   Just use that rope with caution.


Business Objects is Virtualization/MultiCore Stupid

Recently I have been involved in internal discussions on what it will take to get Business Objects onto a virtual machine.   The main talk has been around potentially retiring an equivalent product and moving entirely over to Business Objects.   Then we got pricing for Business Objects.

The standard piece of hardware today is pretty hefty, even a small 1U or 2U system.   They come with multiple cores; you have to place a special order to get anything less than a dual- or quad-core part today.   An enterprise doesn't order single sockets either.   It's kind of silly to save $500 when you can have twice the power and reuse the system in the future for other purposes.

Business Objects prices by physical cores, and only by physical cores, counted across every system their software could potentially run on.

Business Objects is blowing a potential sale: today we only need something like 6-8 cores' worth of power, and making these systems into VMs is ideal.   It isn't as if enterprises are out to “screw” vendors.   Yes, we all want a deal, but enterprises just want to pay for what they use.   If they would license ~8 CPUs (virtual, physical or core) and let us build these as VMs, everybody wins.
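
To put rough numbers on it, here is a back-of-the-envelope Python sketch.  The per-core price, host count and core counts are invented purely to show how far the two licensing models diverge.

    # Illustrative numbers only -- per-core price and cluster sizing are made up.
    price_per_core = 2500      # hypothetical list price per physical core
    cores_needed = 8           # what the workload actually needs
    hosts_in_cluster = 16      # hosts the VM *could* run on
    cores_per_host = 8         # physical cores per host

    licensed_by_need = cores_needed * price_per_core
    licensed_by_physical = hosts_in_cluster * cores_per_host * price_per_core

    print(f"License what we use:         ${licensed_by_need:,}")       # $20,000
    print(f"License every possible core: ${licensed_by_physical:,}")   # $320,000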

Even making these physical is a joke: we would have to disable cores and sockets just to stay legal.

So.. BO is blowing it.   They need to grow up and stop making mainframes look cheap with their licensing policies.

VMworld 2009 – Keynote Thoughts

Ok.. It's proven.   No one in the industry can keep up with Scott at blogging.   If you want a live blog, go here.

The Cloud.  Business Complexity.   Give businesses Flexibility.  

The great question is how you do this.   Much of the point is “simplify, so things can happen, because it is so complex today.”   People understand current environments where application stacks sit on physical hardware, and it works.   Complex, but it works.

The Cloud that VMware is proposing is a layer between this hard tie of physical hardware to the application stack.   Much of this is nothing new: it is straight-up API thinking.   Take something that a lot of different systems and interfaces do and abstract it into an object with well-defined interfaces.
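
As a toy illustration of that idea (and nothing more; this is not VMware's actual API, and every name here is invented), the pattern in Python looks something like this:

    from abc import ABC, abstractmethod

    class ComputeProvider(ABC):
        """A well-defined interface: consumers see only these methods,
        never the hypervisor, storage or network plumbing behind them."""

        @abstractmethod
        def provision(self, name: str, vcpus: int, ram_gb: int) -> str:
            """Create a machine instance and return its identifier."""

        @abstractmethod
        def decommission(self, instance_id: str) -> None:
            """Tear the instance down and release its resources."""

    class InHouseVirtualProvider(ComputeProvider):
        # The messy details (which cluster, which datastore, which VLAN)
        # stay inside this box; callers never need to know them.
        def provision(self, name, vcpus, ram_gb):
            return f"vm-{name}"

        def decommission(self, instance_id):
            pass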

By adding this additional layer, IT and the business can simplify those interactions.  That is an amazing piece of simplification being offered, and not something to make light of if you've worked in a seriously large IT shop.   VMware is the only company I've seen to date that has taken a serious look at the interfaces between the different technologies and domains involved in supplying services from IT to the business side.   This covers all the various pieces of the VMware structure, like vSecurity, vStorage, vNetwork, vCompute, vAvailability, vScalability and the other vStuff.

The individual components now have well-defined interfaces, or at least a first run at them.  The complexity can still exist inside each defined box, but others can deal with that component in a nice clean way.

From there you can start building some real flexibility for auto-provisioning and self-management.   Since you have the API, you can control anything that talks to those interaction points.   This is why VMware is on the cutting edge and Microsoft and Citrix are still behind.

Virtual Iron dead in the water?

Is this true?  If so, what's Oracle's game plan? They just bought Virtual Iron for its virtualization management, and appeared to have a clue since they are also buying Sun, with its huge virtualization skill set and product line. Then this info comes along:

http://www.theregister.co.uk/2009/06/19/oracle_kills_virtual_iron/

In a letter to Virtual Iron’s sales partners, Oracle says it “will suspend development of existing Virtual Iron products and will suspend delivery of orders to new customers.” And in a second letter to a partner speaking with The Reg, the company says it will not allow partners to sell new licenses to anyone – including existing customers – after the end of this month.

So is Oracle's plan just to cannibalize Sun's and Virtual Iron's code and forsake all the customers they bought?

It makes me pretty sad, since I'm a big Sun fan via OpenSolaris, and I fear they will do the same thing there.

Cluster size of 8 is the only size to do

For a large organization with more than 8 VMware hosts (nodes), you should only build 8-node clusters for the time being, according to the vSphere 4 Configuration Maximums doc.

http://www.vmware.com/pdf/vsphere4/r40/vsp_40_config_max.pdf

If you look on page 7 it says “Configurations exceeding 40 VMs per host are limited to cluster size no greater than 8 nodes.”

What that means is that, to be supported, you have the following chart:

8 nodes  x 100 VMs/node == 800 VMs per cluster max

9 nodes  x 40 VMs/node  == 360 VMs per cluster max

20 nodes x 40 VMs/node  == 800 VMs per cluster max

32 nodes x 40 VMs/node  == 1,280 VMs per cluster max

Now I might not be too smart some days, but I think the math is pretty obvious.  Either I build 4 clusters of 8 nodes and get up to 3,200 VMs on 32 boxes, or I stick them all into one nice big cluster and get 1,280 VMs out of the same 32 boxes.   Easy math to me.
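
A few lines of Python capture the rule and the comparison.  The 40-VM and 8-node figures come straight from the config-max doc; the rest is just arithmetic.

    def max_supported_vms(nodes: int, vms_per_node: int = 100) -> int:
        # vSphere 4 rule from the config-max doc: above 40 VMs per host,
        # the cluster may not exceed 8 nodes, so larger clusters are
        # effectively capped at 40 VMs per host to stay supported.
        if vms_per_node > 40 and nodes > 8:
            vms_per_node = 40
        return nodes * vms_per_node

    print(max_supported_vms(8))        # 800   (8 nodes x 100 VMs/node)
    print(max_supported_vms(32))       # 1280  (32 nodes capped at 40 VMs/node)
    print(4 * max_supported_vms(8))    # 3200  (four separate 8-node clusters)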

The question I have out of this is: why document it this way?  It sure is confusing, and I could easily have missed it.

Load Testing takes a turn with Server Virtualization

Just recently I had an interesting conversation with one of my clients.  During an internal consult she came to me and said that, with our VMware solutions, there is a

“potential issue with 4 vCPUs and scheduling”

I looked at her and was a bit confused.   I told her

“No.. there isn’t a scheduling issue for 4 vCPUs in our environment.”

The response was

“Well in our last consult effort, the solution was changing the system down to 2 vCPUs instead of using 4 vCPUs.”

Ah ha.   I understand now.  The light bulb suddenly came on.

One of the big benefits of server virtualization is that we can make some rather “drastic” (by legacy thinking) changes to a machine/OS instance that you couldn't, or didn't want to, make on a physical system.

  1. Ask yourself: when was the last time you pulled processors out of a physical system during load testing?   I'll bet you a beer at VMworld that you haven't done that in years, if ever.
  2. Now, when was the last time you changed a virtual machine from 4 vCPUs to 2 vCPUs to try to improve the performance of a given OS instance?   I'll bet a beer again that you have done this in the last 60 days.  (It's a one-line config change; see the sketch after this list.)
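
For reference, here is a minimal pyVmomi sketch of that kind of vCPU change.  The vCenter host, credentials and VM name are placeholders, and in practice you would wait on the task and confirm the guest is powered off (or supports CPU hot-remove) first.

    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder vCenter and VM name -- substitute your own.
    si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret")
    content = si.RetrieveContent()

    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "app-loadtest-01")
    view.DestroyView()

    # Drop the VM from 4 vCPUs to 2, then re-run the load test and compare.
    spec = vim.vm.ConfigSpec(numCPUs=2)
    task = vm.ReconfigVM_Task(spec=spec)

    Disconnect(si)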

It is so easy for us to make those changes that we don't think about how it looks to our end clients.   In my recent discussion, they assumed the change was made because VMware ESX can't schedule multiple vCPUs well.  That may have been due to poor wording on my part or a leap of logic by my internal clients.   In reality, after going back and digging into the metrics collected at the time, it is more likely that the application itself simply doesn't run as efficiently with 4 vCPUs as it does with 2.

It is also very easy, and a bit lazy, for us as IT folks to assume that because the physical box has two quad-core sockets, the load testing we've done represents the fastest the application can go.   The reality is that unless we run tests that actually vary the hardware, we have no idea whether a single-socket, single-core configuration might run the application (way) faster.   We just don't know.

If you're serious about IT and doing things right for your company and yourself, you have to be honest about what you're looking at (metrics and performance) and what it really means to you (flexibility) and to your clients (appearance).   I know I've had a useful epiphany, and it will help me be a better IT person.

Cisco joins the Server Market

Today’s big news:

Cisco has come out with a one-stop, full-solution rack combining CPU, disk and network, using the best of storage, server and networking virtualization technology: tight VMware integration, all Cisco hardware, and lots of virtualization technology at 10G.

Over the past several years I'd kind of figured that Cisco had lost their way and was floundering a bit.  Now I know someone at Cisco has a brain.   VMware has blazed the path into the enterprise datacenter in recent history; the gotcha has been integrating with networking and storage.   In theory that is resolved by this fully integrated, delivered product line.

So how do HP, IBM and Dell respond to this?

Cisco's now a full-service shop

Cisco is throwing its gloves into the ring for server hardware. Now they can offer server hardware, datacenter experts (strong in server virtualization thanks to their investment in VMware) and networking hardware.  All they need now is a storage virtualization appliance on their server hardware and they can start offering the whole datacenter in a shipping container.

Next Up:  Commodity Datacenters.   Everything the Cloud wishes it could be.