Ideal Software Licensing Model – Requirements Collection

I’m looking for some feedback and thoughts from the community to help define a reasonable licensing model that takes both physical and virtual deployments into account. From my view as a client, I don’t think this is all that complicated at the end of the day.

In more and more discussions with vendors around licensing, I’m finding that the following two axioms define the conversation:

  • Vendors want to get paid for their software (obviously as much as they can). They are not stupid in most cases.
  • Clients want to pay for what they use (obviously as little as they have to). They are not stupid in most cases.

The challenges come from the fact that vendors generally don’t get the following:

  • A VM in VMware is limited in processing to the vCPUs it has.
  • A vCPU is limited to what a given core is individually capable of.
  • More clients might be willing to use your software if we didn’t have to pay for 12 cores of power when we only need 2 today.
  • VMotion of a VM does not mean I’m suddenly gaining more cores of processing.

Clients get upset because of the following:

  • A vendor assumes I’m an idiot and that they can pull the wool over my eyes. This a good relationship does not make.
  • A vendor says a virtual machine does less than a physical one, then charges me more if it is virtual.
  • A vendor requires me to license a big physical box when I only want a couple of cores’ worth, or fewer cores than the physical box has.
  • I want to use your software, and because I’m running it on a virtual machine you want to charge me more. I can’t even buy smaller physicals to use your software within my software budget (the smallest thing I can reasonably buy today is an 8-core system, and I only need 2 cores’ worth).
  • A vendor limits me to some physical box even though the OS/software will be on a virtual machine. (Who cares which physical box it is on as long as I pay for the CPU MHz I’m using? Your software doesn’t. Only your legal department does.)
  • If I buy a lot of your software, you can cut me deals since I’m spending a lot of money with you, and then I’ll be interested in licensing by physical cores or just a volume discount. I’d rather not start there if I can avoid it.
  • We’ve seen what happens to good tech when licensing models can’t take the technology into account. Look at how the licensing stubbornness of the mainframe world and Computer Associates in the ’80s contributed significantly to the rise of distributed computing. We don’t want to deal with that kind of migration if we can avoid it.
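To make the "pay for 8 or 12 cores when I only need 2" complaint concrete, here is a minimal Python sketch comparing a per-physical-core model with a per-vCPU model for one 2-vCPU VM. The price per core and the host sizes are made-up assumptions for illustration only, not any vendor's actual price list.

```python
"""Compare two hypothetical licensing models for a single 2-vCPU VM.

PRICE_PER_CORE is an illustrative assumption, not a real vendor figure.
"""

PRICE_PER_CORE = 1_000  # hypothetical list price per licensed core


def per_physical_core_cost(host_cores: int, price_per_core: int = PRICE_PER_CORE) -> int:
    """Every physical core in the host must be licensed, whatever the VM uses."""
    return host_cores * price_per_core


def per_vcpu_cost(vcpus: int, price_per_core: int = PRICE_PER_CORE) -> int:
    """Only the vCPUs allocated to the VM are licensed.

    A vCPU can never do more than one core's worth of work, and vMotion to a
    bigger host does not change the vCPU count, so this charge stays the same.
    """
    return vcpus * price_per_core


if __name__ == "__main__":
    vm_vcpus = 2
    for host_cores in (8, 12):
        print(f"{host_cores}-core host: physical model ${per_physical_core_cost(host_cores):,}, "
              f"vCPU model ${per_vcpu_cost(vm_vcpus):,}")
```

With these assumed numbers the per-vCPU charge is the same on either host, while the per-physical-core charge grows with whatever box the VM happens to land on.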

So those are some of the requirements I have come up with. What other requirements or gotchas have bitten you in dealing with vendors? Is anything different when dealing with Solaris, AIX or HP-UX virtualization?

Limits have their limits

I’ve been chewing on this post by Duncan at Yellow Bricks for the past month and a half. It covers some complicated issues that one has to deal with in an enterprise-sized environment, along with many assumptions about what gets you into this mess in the first place. The best thing to do is downscale and upscale as needed, based on good performance monitoring and bottleneck research. Thankfully I’ve built good enough relationships with most teams where I work that this has become standard operating procedure, though sometimes we just can’t. At the end of the day the issue boils down to one simple goal:

“As the VMware environment administrator, how can I make better use of what I have available to me?”

In my environment I run into a variety of political reasons, ranging from...

  • “I am going to need that extra 2 CPUs someday in the future so I can’t give them up now.”
  • “The vendor docs say I really do need 8 CPUs and 128G of RAM for my 3 users even though 126G is unused.”
  • “Someone on your team said I really do need that 8G of RAM, so I won’t give it up.”
  • “Oh come on... what’s another 2G of RAM?”
  • “I gave up my budget for a physical to do this as a virtual even though I’m still spending less in the grand scheme.  Gimme more resources.”

to outright begging:

  • “Pleaseeee.  I think it’ll help my issues.  It might even make me look better to my co-workers.”

I have two distinct use cases that show how resource limits can be a hard tool to use well.

Case #1:  The poorly written VBscript

Back in the early Windows 3.1 days when VB was a novel concept, some developers made a groundbreaking app that would pull data from a remote system, massage the data a bit and put it into a centralized Btrieve database. Once the remote system’s queue it checks is empty, the script goes to sleep for a minute. The catch is that its sleep function works by repeatedly checking the clock to see whether a minute has passed, and that constant polling consumes 100% of the CPU all the time. This wasn’t much of an issue when each of these systems was on its own old PC. We virtualized them, since 16 XP workstations in the datacenter is a management headache. Now that’s 16 high-power cores, multiple generations newer, being used 100% all day long for no good reason.

We VMware admins discovered that on the old PCs these systems would easily take 5-10 minutes to work through their work queues. On our newest hardware, running as VMs, they do the same work in under 15 seconds. So for the following 60 seconds each one does nothing except check the hardware clock.
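The difference between polling the clock and a real sleep is easy to demonstrate. The original app is VB, so this Python sketch only illustrates the pattern; it is not the app's code. It measures the CPU time consumed by a busy-wait loop versus a blocking `time.sleep`, using a half-second wait instead of the real 60 seconds so it finishes quickly.

```python
import time


def busy_wait(seconds: float) -> None:
    """Spin on the clock until `seconds` have passed (the old script's approach).

    The loop never yields the CPU, so one core sits at 100% the whole time.
    """
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass  # keep polling the clock


def blocking_sleep(seconds: float) -> None:
    """Ask the OS to suspend the thread; it uses essentially no CPU while idle."""
    time.sleep(seconds)


def cpu_seconds_used(wait_fn, seconds: float) -> float:
    """Return the CPU time this process burned while `wait_fn` waited."""
    start = time.process_time()
    wait_fn(seconds)
    return time.process_time() - start


if __name__ == "__main__":
    # 0.5 s instead of the real 60 s so the demo finishes quickly.
    print(f"busy wait:      {cpu_seconds_used(busy_wait, 0.5):.2f} CPU-seconds")
    print(f"blocking sleep: {cpu_seconds_used(blocking_sleep, 0.5):.2f} CPU-seconds")
```

The busy-wait version burns roughly as many CPU-seconds as it waits, while the sleeping version stays near zero, which is why a `sleep 60` style rewrite would make the CPU limit unnecessary.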

Solution #1:  CPU limits good

We implemented a CPU-limiting resource pool for these VBscript VMs. They are still running mega fast compared to where they were a year ago, but now they use no more than 8 cores’ worth at any given time. It’s a big improvement until the app developers decide whether to replace all that code with a real sleep 60 or recode the entire app.
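A resource pool's CPU limit in vSphere is set in MHz rather than in cores, so "8 cores’ worth" has to be converted. This is a minimal sketch of that arithmetic, assuming a 3000 MHz core clock, one vCPU per VM and equal shares among the 16 VMs; none of those values come from the original environment.

```python
def pool_limit_mhz(cores_allowed: float, core_mhz: int) -> float:
    """CPU limit for a resource pool, expressed in MHz."""
    return cores_allowed * core_mhz


def per_vm_mhz(limit_mhz: float, busy_vms: int) -> float:
    """MHz each VM gets when all of them spin at once with equal shares."""
    return limit_mhz / busy_vms


if __name__ == "__main__":
    CORE_MHZ = 3000  # assumed clock speed of one physical core
    VMS = 16         # the former XP workstations, assumed one vCPU each
    limit = pool_limit_mhz(8, CORE_MHZ)
    print(f"pool limit: {limit:.0f} MHz")
    print(f"per VM when all {VMS} spin: {per_vm_mhz(limit, VMS):.0f} MHz")
```

Under these assumptions each spinning VM ends up with about half a core, which is still far more than the old PCs had.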

Case #2:  vCenter SQL Server Memory Limits

Due to a feature in vCenter 4.0U1 and ESX 3.5 hosts, when I increased the RAM on my dedicated vCenter SQL Server from 4G to 8G, a memory limit of 4G was set. When I went onto the SQL instance, the SQL Server process (sqlservr.exe) was only using about 3600 MB, yet all 8G was consumed. To me this screamed of an issue with the OS instance. After close to 10 days of head beating and not understanding why my brand new vCenter 4.0U1 system was running so poorly, a co-worker with a fresh set of eyes noticed the limit setting on the SQL Server VM.

Solution #2:  Memory limits bad

The fix is obvious. We removed the limit and SQL Server performance went through the roof instantly. We simply couldn’t easily tell that the balloon driver was holding 4G of RAM, since it doesn’t show up as a process. Nobody noticed the ballooning happening.
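Why the numbers looked so strange can be sketched with simple arithmetic. This is a simplified model, not how ESX actually computes reclamation: the guest believes it has its configured memory, the host backs only up to the limit, and the difference has to be taken back by the balloon driver (or host swapping). The 0.5 GB of OS overhead in the example is an assumption.

```python
def unbacked_memory_gb(configured_gb: float, limit_gb: float) -> float:
    """Guest memory the host will not back with physical RAM because of the limit.

    The guest OS still believes it has `configured_gb`; anything above the
    limit must be reclaimed by the balloon driver (or host swapping).
    """
    return max(0.0, configured_gb - limit_gb)


def memory_for_apps_gb(configured_gb: float, limit_gb: float, os_overhead_gb: float) -> float:
    """Rough memory left for applications once the balloon has inflated."""
    return configured_gb - unbacked_memory_gb(configured_gb, limit_gb) - os_overhead_gb


if __name__ == "__main__":
    # 8G configured, 4G limit, and an assumed 0.5G used by the OS itself.
    print(f"ballooned away: {unbacked_memory_gb(8, 4):.1f} GB")
    print(f"left for SQL Server: {memory_for_apps_gb(8, 4, 0.5):.1f} GB")
```

The result lands in the same ballpark as the roughly 3600 MB SQL Server was using, while Task Manager still showed all 8G as consumed.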

At the end of the day there are pros and cons to having this level of capability. This is why I like ESX and VMware’s general approach: give you every option and configuration we can, plus enough rope to hang yourself and two of your friends, and try to automate and hide as much of it as we can. The vendor will never know all the situations we in the field are going to run into, so they should give us all the options they can. Use that rope with caution.


New vmware.com Homepage Layout & View 4 Released

The new vmware.com homepage is now live. An “anonymous internet tipster” gave me the heads up last night. It looks good and a bit sleeker, in keeping with the branding of the new VMware logo.

Along with that, View 4 is finally released. I’ve been playing with some beta bits for a while now, and PCoIP is a pretty impressive catch-up with the ICA protocol. I’m looking forward to the flood of ICA vs. PCoIP comparisons that are going to come out now.

Running Windows 2008 Active Directory with Bind

One of the fun things about running stuff at home is that once you have something working even remotely well, you sure don’t want to change it. I have set up Bind 9 with DHCP, and it works pretty well with a bunch of scripts I have to handle dynamic additions and removals on my running system. Every system in this house works off of this very stable base. It works well, and therein lies the issue. I am attempting to bring up an Active Directory domain for testing a variety of products, and as is typical with MS, as long as you give all your money to MS, everything works great. The minute you step out of that paradigm, the documentation and functionality tend to falter a bit.

To set up a Windows 2008 Active Directory domain, the system basically expects you to use MS DNS/DHCP services. I have no desire to tear down a perfectly functional environment and rebuild it around MS DNS. After a significant amount of digging I found some good web pages on how to set up Bind to work with AD. None of them worked right for me, though.

  • Bind and Active Directory – didn’t work. I simply couldn’t get the Windows box to talk to Bind for some reason.
  • Bind DNS and Active Directory R2 – I kept getting errors. No luck down this path.
  • DNS Bind Delegation – This isn’t ideal, as I would have to change a significant amount of configuration for DNS searching to work right. Not the route I want to go down.

So plan D was to read some of the docs on Active Directory and DNS on TechNet, then turn up named logging and watch what the future DC was asking for. Create the domain without passing the network tests, then run dcdiag /test:DNS a couple dozen times, adding each missing entry as you go. To turn up the logging, add the following to /etc/named.conf and then run service named restart.

logging {
    // One channel that writes everything to a file, keeping 2 rotated
    // copies of up to 50 MB each. named.conf needs plain ASCII quotes.
    channel "debug" {
        file "/tmp/namedebug" versions 2 size 50m;
        print-time yes;
        print-category yes;
    };

    // Route every category of interest to that channel.
    category "default" { "debug"; };
    category "general" { "debug"; };
    category "database" { "debug"; };
    category "security" { "debug"; };
    category "config" { "debug"; };
    category "resolver" { "debug"; };
    category "xfer-in" { "debug"; };
    category "xfer-out" { "debug"; };
    category "notify" { "debug"; };
    category "client" { "debug"; };
    category "unmatched" { "debug"; };
    category "network" { "debug"; };
    category "update" { "debug"; };
    category "queries" { "debug"; };
    category "dispatch" { "debug"; };
    category "dnssec" { "debug"; };
    category "lame-servers" { "debug"; };
};
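Watching that log by eye gets tedious after a couple dozen promotion attempts. Below is a minimal Python sketch that summarizes which AD-style names the future DC queried. It assumes query lines contain the text `query: <name> IN <type>`, which is how Bind 9 query logging usually reads; the rest of the line differs between Bind versions, so adjust the pattern if yours does not match. The filtering rules are my own heuristic, not anything from Microsoft.

```python
import re
import sys
from collections import Counter

# Matches the "query: <name> IN <type>" part of a Bind query log line.
QUERY_RE = re.compile(r"query: (?P<name>\S+) IN (?P<rtype>[A-Z]+)")
AD_PREFIXES = ("_ldap.", "_kerberos.", "_kpasswd.", "_gc.")


def ad_lookups(lines):
    """Count (name, record type) pairs for queries that look AD-related."""
    counts = Counter()
    for line in lines:
        match = QUERY_RE.search(line)
        if not match:
            continue
        name = match.group("name").lower()
        if "_msdcs" in name or name.startswith(AD_PREFIXES):
            counts[(name, match.group("rtype"))] += 1
    return counts


if __name__ == "__main__":
    log_path = sys.argv[1] if len(sys.argv) > 1 else "/tmp/namedebug"
    try:
        with open(log_path, encoding="utf-8", errors="replace") as log:
            for (name, rtype), hits in sorted(ad_lookups(log).items()):
                print(f"{hits:4d}  {rtype:5s} {name}")
    except FileNotFoundError:
        print(f"no log at {log_path}; is the logging block loaded?")
```

Names that keep showing up after each dcdiag run are the ones still missing from the zone.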

After watching this log file while trying to promote a machine to a DC a couple dozen times, I found that the following DNS entries are needed in your zone.

If my DC is going to have a DNS name of DC.home.here.org with an IP of 192.168.1.4 and the domain is HOME, then these are the entries needed.

home.here.org.        A      192.168.1.4
DC.home.here.org.     A      192.168.1.4
UID #1 – the subkey in HKLM\Software\Microsoft\Cryptography\AutoEnrollment\AEDirectoryCache
UID #2 – I don’t know where this comes from. I think it is something domain related; it isn’t in the registry.
(These UID-based DNS entries might not be needed – not sure.)

$ORIGIN _msdcs.home.here.org.
(UID #1 – might not be needed)  CNAME DC.home.here.org.

$ORIGIN _tcp.Default-First-Site-Name._sites.dc._msdcs.home.here.org.
_kerberos       SRV 0 0 88 DC.home.here.org.
_ldap           SRV 0 0 389 DC.home.here.org.

$ORIGIN _tcp.dc._msdcs.home.here.org.
_kerberos       SRV 0 0 88 DC.home.here.org.
_ldap           SRV 0 0 389 DC.home.here.org.

$ORIGIN _msdcs.home.here.org.
_ldap._tcp.(UID#2 – might not be needed).domains SRV 0 0 389 DC.home.here.org.
gc          A   192.168.1.4

$ORIGIN gc._msdcs.home.here.org.
_ldap._tcp.Default-First-Site-Name._sites SRV 0 0 3268 DC.home.here.org.
_ldap._tcp      SRV 0 0 3268 DC.home.here.org.

$ORIGIN _msdcs.home.here.org.
_ldap._tcp.pdc      SRV 0 0 389 DC.home.here.org.

$ORIGIN _tcp.Default-First-Site-Name._sites.home.here.org.
_gc         SRV 0 0 3268 DC.home.here.org.
_kerberos       SRV 0 0 88 DC.home.here.org.
_ldap           SRV 0 0 389 DC.home.here.org.

$ORIGIN _tcp.home.here.org.
_kerberos       SRV 0 0 88 DC.home.here.org.
_kpasswd        SRV 0 0 464 DC.home.here.org.
_ldap           SRV 0 0 389 DC.home.here.org.

$ORIGIN home.here.org.
_kerberos._udp      SRV 0 0 88 DC
DC         A   192.168.1.4

Enter all these and then try creating your domain again. This got me up and running. In the spirit of openness, it would be nice to have a dcdiag option that dumps all the DNS entries the system is looking for and testing.
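Short of a dcdiag option like that, the non-GUID part of the list can be generated from a few parameters. This is a minimal Python sketch, assuming a single DC in the default site; `DcInfo` and `ad_records` are hypothetical helpers, not a Microsoft tool. It emits absolute names, leaves out the UID-based records because their values are install-specific, and uses port 3268 (the global catalog LDAP port) for the `gc._msdcs` records.

```python
from dataclasses import dataclass


@dataclass
class DcInfo:
    domain: str                       # AD DNS domain, e.g. "home.here.org"
    host: str                         # DC FQDN, e.g. "DC.home.here.org"
    ip: str
    site: str = "Default-First-Site-Name"


def ad_records(dc: DcInfo) -> list[str]:
    """Return zone-file lines (absolute names) for a single-DC AD domain."""
    d, h, s = dc.domain, dc.host + ".", dc.site
    srv = [
        (f"_ldap._tcp.{d}.", 389),
        (f"_kerberos._tcp.{d}.", 88),
        (f"_kerberos._udp.{d}.", 88),
        (f"_kpasswd._tcp.{d}.", 464),
        (f"_ldap._tcp.{s}._sites.{d}.", 389),
        (f"_kerberos._tcp.{s}._sites.{d}.", 88),
        (f"_gc._tcp.{s}._sites.{d}.", 3268),
        (f"_ldap._tcp.dc._msdcs.{d}.", 389),
        (f"_kerberos._tcp.dc._msdcs.{d}.", 88),
        (f"_ldap._tcp.{s}._sites.dc._msdcs.{d}.", 389),
        (f"_kerberos._tcp.{s}._sites.dc._msdcs.{d}.", 88),
        (f"_ldap._tcp.pdc._msdcs.{d}.", 389),
        (f"_ldap._tcp.gc._msdcs.{d}.", 3268),
        (f"_ldap._tcp.{s}._sites.gc._msdcs.{d}.", 3268),
    ]
    lines = [f"{d}. A {dc.ip}", f"{h} A {dc.ip}", f"gc._msdcs.{d}. A {dc.ip}"]
    lines += [f"{name} SRV 0 0 {port} {h}" for name, port in srv]
    return lines


if __name__ == "__main__":
    for line in ad_records(DcInfo("home.here.org", "DC.home.here.org", "192.168.1.4")):
        print(line)
```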

VI Client protects itself nicely

OK... follow me on this one.

I am connecting from a laptop via the View client to a virtual workstation running XP, then launching the VI Client on it and opening the console of that same virtual workstation.

VI Client Console

In the old days VirtualCenter would just lose its little mind and crash horribly, or do some really funky things with feedback loops. Now it protects itself nicely. I like it.

Hiring a non-tech person for a CTO

Citrix has gone and hired a new CTO.

To me this looks like another business person at Citrix in charge of technical direction, rather than someone with a strong engineering and technology background.

Maybe I’m judging MBAs harshly, but they are bred and trained to aim for sales and revenue, while engineering backgrounds aim for better products and solutions. There is nothing wrong with either, and every software business needs both. For me, it just tells you the focus Citrix has at the top: it isn’t about the tech, it’s about the sales first and foremost.

What are your thoughts on a Chief Technology Officer without a science or engineering background?

VMworld 2009 Trends and Summary

Now that things have finally quieted down after VMworld 2009, here are some of the trends I saw and my summary. Some of the trends are interesting, some not so much.

Twitter
Twitter really got started last year with people following @vmworld. This year hashtags were all the rage: as long as you followed #vmworld you could see everything folks were talking about. Two fun hashtags this time around were:

  • The #vCloud - Take a Drink game.
  • #VMworld3Word – 3 words for folks at VMworld.

I fully expect this to only grow next year.

The Virtual Datacenters @ VMworld
It was pretty impressive seeing the big one while riding down the main escalator at the Moscone Center. Watch as they build it. 776 VMware ESX servers, 37 TB of RAM, 6208 cores and 348 TB of shared storage. Wow. Early on, the talk was that performance wasn’t there initially, and the various engineers worked hard at resolving it. Things were running well by Tuesday night/Wednesday morning. The one thing many of us talked about was how lopsided the big datacenter looked: there were 3 server racks for each storage rack. The ratio just looked odd to most of us.
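For a sense of scale, the published totals can be turned into per-host averages. This is plain arithmetic on the figures quoted above, assuming decimal terabytes (1 TB = 1000 GB) and an even spread across hosts; the actual per-host configuration may have differed.

```python
HOSTS = 776
CORES = 6208
RAM_TB = 37
SHARED_STORAGE_TB = 348


def per_host(total: float, hosts: int = HOSTS) -> float:
    """Average of a datacenter-wide total across all ESX hosts."""
    return total / hosts


print(f"cores per host: {per_host(CORES):.0f}")                                # 8
print(f"RAM per host: {per_host(RAM_TB * 1000):.1f} GB")                       # 47.7 GB
print(f"shared storage per host: {per_host(SHARED_STORAGE_TB * 1000):.0f} GB") # 448 GB
```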

Next year I fully expect to see one single big datacenter instead of small, medium and large ones. I’m still hoping to get answers to the questions in my initial blog entry.

Booth Babes
A lot more booth babes this year. I’m not terribly excited about this. Sure, the eye candy is nice, but I’m there to talk with engineers, developers and product managers, after wading my way through some marketing folks. In general, if a show is all about the marketing and booth babes (and guys), then the vendor floor has next to no value for me.

Keynote Lukewarm
Neither keynote this year really seemed to talk about all the cool stuff coming. I’m not sure if this is a new leadership approach or just not much going on this year with the financial slowdown. The vCloud Express stuff was nice, though I expected to see more “You gotta check out the PCoIP stuff we are doing” and “This is mega cool”.

Vendor / ISV issues
A lot of conversations centered on a general feeling of hostility from VMware in discussions with ISVs. Some of the talk was about the rules limiting what Citrix/Microsoft could do, demo and show at the conference. (Most of the talk was that they deserved it for the stuff pulled last year; some were let down that we couldn’t see what they were doing.) Some of the talk was about interactions with ESXi and what was or wasn’t allowed to compete with VMware’s own offerings. The quote I heard that best describes it was “Is VMware turning into the Microsoft machine now?”

Less Swag, Less People
My guess is the aim was 15k+ people and about 13k came, versus last year’s 14k limit in Vegas. There was less swag, which wasn’t surprising given the financial changes of the past year.

iPhone
I have never seen more iPhones in a single place than in San Francisco, where easily 1 in 10 people had one. Inside the Moscone Center for VMworld it was easily 1 in 5. It was crazy what people were doing with their iPhones. I was introduced to a good 4 dozen apps I’d never heard of and now have a solid set of reasons to get an iPhone.

The other fun was the iPhone conversation starter: “How many bars do you have?” Depending on the day and time you’d have anywhere from 0 to 3. The lucky person was the one who could actually hold a phone conversation in or near the Moscone Center on their iPhone. Service from AT&T was less than ideal.

Better Bag
The VMworld bag went back to a true backpack instead of the messenger style. Personally I like this, as my VMworld 2006 bag in the same style is starting to get a little well used by now.

Live Blogging
This is a skill I am not sure I have, though I’ve learned quite a bit by watching and reading about how to live blog properly. If I do this again next year, I will need to read up on different successful approaches to this kind of blogging. I tried 2 different methods with varying success, in my book. For those of you who read through some of my keynote live blogging posts, I do apologize and promise to do better.

Overall it was a good conference again, especially for the time spent with some quality people from VMware, NetApp, Cisco, HP, newScale and all the other individuals I talked with. I look forward to next year. See you all there.

VMworld 2009 – Day 2 Wrapup

Day two at VMworld ended up being quite a bit more exciting than yesterday. The keynote by Steve Herrod was much more what I expected from the keynotes. He covered some of the “cool” stuff coming down the pipe in both the short and longer term. The PCoIP demo, showing Google Earth zooming in and out while connected from the Moscone Center to a machine in Portland, OR, rocked. I want that for when I’m sitting in a hotel room with its “blazing fast” connection, trying to do something useful on one of my machines at home, instead of using RDP over SSL.

I went to the IO DRS Tech Preview and got the same excitement I’ve had in previous years, when you know you’re seeing something innovative. Several of the other sessions I hit were really partner-style presentations that did not say much. So it was a 25/75 day for sessions, which is pretty good.

Now that the self-service labs were finally working properly, I gave the vCenter Orchestrator lab a shot. The lab was responsive and well documented. It was pretty nice and really hinted at the power this system can offer for datacenter automation. Supposedly this is free with vSphere 4, so I’m going to have to look into that.

During my open time during the day I had some good meetings: VMware employees on vStorage and vCloud directions, HP folks on OpenView and virtualization tools, AMD & Intel on their functionality futures, and Hitachi on their multipathing technology for VMware (still no roadmap).

The party was fun. Foreigner still knows how to rock, and I can actually climb rock walls. The nice thing about the party this year was that it was right at the Moscone Center.

A very productive and long day.

EA3196 – Virtualizing BlackBerry Enterprise on VMware

Once again, another session I didn’t sign up for and had zero issues getting into.

To start off, RIM & VMware have been working together for 2 years, and BES is officially supported on VMware. Together they have done numerous successful engagements running BES on VMware. Interestingly, RIM has been running its own BES on VMware for over 3 years now.

Today BES best practice is no more than 1k users per server, and BES is not very multi-core friendly. It is not cluster aware and has no HA built in. The new 5.0 version of BES is coming with some HA via replication at the application layer. One thing seen in various engagements is that putting the BES servers on the same VMware hosts as virtualized Exchange gives noticeable performance improvements.

The support options for BES do clearly state that they support on VMware ESX.  

One of the big reasons to virtualize BES is that, since it cannot use multiple cores effectively, it can only use a fraction of today’s big 32-core boxes. By virtualizing BES you can get significant consolidation. BES then gets all the advantages of running virtual, such as test/dev deployments, server consolidation, HA, etc. Things that are well known and talked about already.

BES encourages template use for rapid deployments. The only gotcha is what your company’s policies and rules allow, and templates can potentially save quite a bit of time. This presentation was really trying to show how to use VMware/virtualization with BES for change management improvements, server maintenance, HA, component failures and other base vSphere technologies. VMware is looking at using Fault Tolerance for its own BES servers.

BES is often not considered Tier 1 for DR events, even though email is often the first thing needed after a DR event to get communications going. The reason is generally the complexity and cost of DR.

The performance testing with the Alliance team from VMware has been done successfully numerous times over the past couple of years, at both RIM and VMware offices. The main goal of these efforts was to generate white papers and reference architectures that are known to work. The testing used Exchange LoadGen and the PERK load driver (a BES testing driver). Part of this work was learning how to scale out with more VMs, since scaling up is already understood.

The hardware was 8 CPUs (Intel E5450, 3 GHz), 16 GB of RAM and a NetApp FAS3020, running vSphere 4 and BES 4.1.6. In the 2k-user test with 2 Exchange systems, the result was 23% CPU utilization on the 2-vCPU BES VMs. Latency numbers were under 10 ms. Nothing majorly wrong was seen in the testing metrics. Going from ESX 3.5 to vSphere 4 gave a 10-15% CPU reduction in the same workload tests. Adding hardware assist for memory gave what looks like another 3-5% reduction in CPU usage. In their high-load testing, VMotion caused a small hiccup of about a 10% increase in CPU utilization during the cutover. This is well within the capacity available on the host and in the guest OS.

Their recommendation is no more than 2k users on a 2-vCPU VM. If you need more, add more VMs; BES scales and performs well in this scale-out architecture. Be sure you give the storage the number of spindles it needs, the standard statement when talking about virtualization.

The presenter then went through a couple of reference architecture designs, Small Business and Enterprise, in a couple of different varieties.

BES @ VMware: 3 physical locations and 6,500 Exchange users. 1k of them have 5G mailboxes, and the default for the rest is 2G. BES has become pretty common there. They run Exchange 2007, with Windows 2003 for AD and the guest OS. Looks fairly straightforward.

4 production BES VMs, 1 standby BES VM, 1 BES attachment VM and 1 dedicated BES database VM, all on 7 physical servers with 40 additional VM workloads on this cluster.
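The scale-out recommendation of no more than 2k users per 2-vCPU VM turns into a simple sizing rule. This minimal Python sketch assumes users divide evenly across VMs and ignores standby, attachment and database VMs; `bes_vms_needed` is a hypothetical helper, not a RIM tool.

```python
import math

USERS_PER_VM = 2000  # session recommendation: <= 2k users per 2-vCPU BES VM
VCPUS_PER_VM = 2


def bes_vms_needed(users: int, users_per_vm: int = USERS_PER_VM) -> int:
    """Production BES VMs needed for a user count under the scale-out rule."""
    return max(1, math.ceil(users / users_per_vm))


if __name__ == "__main__":
    users = 6500
    vms = bes_vms_needed(users)
    print(f"{users} users -> {vms} BES VMs, {vms * VCPUS_PER_VM} vCPUs")
```

For 6,500 users this gives 4 VMs, which lines up with the 4 production BES VMs in VMware's own deployment.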

TA3461 – IO DRS: Tech Preview for VM Performance Isolation

This is a very new area of research at VMware, started only about 2 years ago. Since this is a Tech Preview, it has no roadmap for when it will be available.

The Problem:

Many different workloads hit the same set of disks/arrays/spindles. Low-priority processes that run ad hoc or at other times will impact higher-priority systems. What you want is for the low-priority VM to get less performance than the higher-priority systems. The question is how to do this.

A solution:  Resource Controls

Assign shares for disk performance, just like the CPU/memory shares of the original ESX days. A host with a higher share total gets higher priority on that shared VMFS volume.

To configure this you go into the VM settings and set the shares. Fairly straightforward. Alongside the shares there is a limit, expressed in IOPS. Interesting idea.

The first case study covered two separate hosts running the same workload levels and showed a pretty significant difference in IOPS and latency once IO DRS was turned on. With it off, both VMs ran at 20 ms and 1500 IOPS. With it on, latency changed to 16 ms and 31 ms, with a similar spread for IOPS. Nice.

Case study two was a more serious one, with SQL Server running. The shares were 4:1, but the performance ratios did not come out to 4:1. What they are seeing is that load timing matters significantly: overall throughput is divided correctly, but the loads themselves make a big difference.

The demo showed changing the shares and the IOPS limit on the fly, and the Iometer machines adjusted immediately. When the IOPS were limited, the other systems picked up the slack and got more performance.

After the demo, the presenters asked if anyone in the packed room (and I do mean PACKED) would find value in this. Everyone immediately raised their hands.

The technical approach is to first detect congestion: if latency is above a threshold, IO DRS is triggered. If it isn’t borked, don’t fix it. IO DRS works by controlling the IOs issued per host. The sum of the shares of the VMs with IO DRS enabled on a host is compared with the other hosts to determine priority. So first the host is weighted, then the VMs’ shares on that host are prioritized, and then it’s back to the host level. The share control applies across all hosts using the same VMFS volume.

Each host has a fixed number of IO slots, and the slots are filled based on the shares on each host. This is how IOs are controlled during congestion.
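The share mechanics described above can be sketched as a weighted fair split of a fixed pool of IO slots. This is a hypothetical model in Python, not VMware's implementation: the latency threshold, slot counts and per-VM demand are assumptions, and slots are treated as fractional for simplicity. Because a host's weight is the sum of its VMs' shares, splitting first by host and then by VM works out to the same result as splitting by VM shares directly, which is what the sketch does. Slots a VM does not need are handed back to the others, and below the threshold nobody is throttled.

```python
from dataclasses import dataclass


@dataclass
class Vm:
    name: str
    shares: int
    demand: float  # IO slots the VM would use if unthrottled (assumed known)


def allocate_slots(hosts: dict[str, list[Vm]], total_slots: float,
                   latency_ms: float, threshold_ms: float) -> dict[str, float]:
    """Split array IO slots among VMs on all hosts sharing one volume."""
    vms = [vm for host_vms in hosts.values() for vm in host_vms]
    # No congestion: shares are not used, everyone gets what they ask for.
    if latency_ms <= threshold_ms:
        return {vm.name: float(vm.demand) for vm in vms}

    alloc = {vm.name: 0.0 for vm in vms}
    active = [vm for vm in vms if vm.demand > 0]
    remaining = float(total_slots)
    while active and remaining > 0:
        total_shares = sum(vm.shares for vm in active)
        # VMs whose proportional portion covers their demand are capped;
        # the slots they don't need stay in the pool for everyone else.
        done = [vm for vm in active if remaining * vm.shares / total_shares >= vm.demand]
        if not done:
            for vm in active:
                alloc[vm.name] = remaining * vm.shares / total_shares
            break
        for vm in done:
            alloc[vm.name] = float(vm.demand)
            remaining -= vm.demand
        active = [vm for vm in active if vm not in done]
    return alloc


if __name__ == "__main__":
    hosts = {"esx1": [Vm("sql", 4000, 64)], "esx2": [Vm("batch", 1000, 64)]}
    print(allocate_slots(hosts, 32, latency_ms=35, threshold_ms=30))
```

With 4:1 shares and both VMs wanting more than they can get, the 32 slots split roughly 25.6 to 6.4; if the high-share VM only needs 10, the other VM picks up the remaining 22.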

There are two major performance metrics in the storage industry: bandwidth (MB/s) and throughput (IOPS). Each has its pros and cons. Bandwidth matters for workloads with large IO sizes, while IOPS matters for workloads issuing lots of small IOs. IO DRS controls the array queue among the VMs, so a VM with lots of small IOs can keep going and get high IOPS. Conversely, a VM doing large IOs will get high bandwidth and low IOPS under the same share control system.
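The link between the two metrics is simply bandwidth = IOPS × IO size, which is why the same slot control can give one VM high IOPS and another high bandwidth. A minimal Python sketch with two hypothetical VMs; the IOPS figures and IO sizes are illustrative assumptions, not numbers from the session.

```python
def bandwidth_mb_s(iops: float, io_size_kb: float) -> float:
    """Bandwidth implied by an IOPS rate at a fixed IO size (1 MB = 1024 KB)."""
    return iops * io_size_kb / 1024


# Two hypothetical VMs sharing the same array queue:
print(bandwidth_mb_s(1500, 4))   # many small 4 KB IOs: 5.859375 MB/s
print(bandwidth_mb_s(200, 256))  # fewer large 256 KB IOs: 50.0 MB/s
```

The small-IO VM wins on IOPS while the large-IO VM wins on bandwidth, even though neither gets any special treatment from the share system.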

Case studies and test runs have shown that device-level latency stays the same as workloads change. Some tests have shown that with IO DRS, IOPS can go up simply due to the workloads involved. Controlling the IOs lets everyone get work done, though depending on its workload one VM can accomplish more than another.

The key understanding is that IO DRS really helps when there is congestion.   When things are good and latency is not high enough to trigger the system, the shares are not used.   If a high IO share system is not using its slots, they are reassigned to other VMs in the cluster. 

The overall gain is the ability to do performance isolation among VMs based on disk IO.

In the future they are looking to tie this into more vStorage APIs, VMotion and Storage VMotion, and potentially IOPS reservations.

Rocking cool and can’t wait for this to come out.