TA3461 - IO DRS: Tech Preview for VM Performance Isolation

VMware | Sep 2, 2009

This is a very new area of research at VMware, started only about 2 years ago. Since this is a Tech Preview, it has no roadmap for when it will be available.

The Problem:

Many different workloads hit the same set of disks/arrays/spindles etc. Low priority processes that run ad hoc or at odd times will impact higher priority systems. What you want to see is that the low priority VM gets less performance than the higher priority systems. The question is how to do this.

A solution: Resource Controls

Assigning out shares based on disk performance, just like the CPU/Memory shares of the original ESX days. A host with a higher total of shares gets higher priority on the shared VMFS volume.

To configure this you'd go into the VM and set the shares. Fairly straightforward. The setting is shares and then the limiting factor is IOPS. Interesting idea.
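As a rough sketch of what a shares setting plus an IOPS limit implies (this is not VMware's implementation; the function name, VM names, and numbers are made up for illustration), each VM's entitlement on a shared volume is its shares divided by the total, with the limit acting as a cap on top:

```python
from typing import Optional

def entitled_iops(shares: dict[str, int],
                  limits: dict[str, Optional[int]],
                  volume_iops: float) -> dict[str, float]:
    """Split a volume's IOPS in proportion to shares, capped by each VM's limit."""
    total = sum(shares.values())
    result = {}
    for vm, vm_shares in shares.items():
        fair = volume_iops * vm_shares / total   # proportional cut
        cap = limits.get(vm)                     # None means no IOPS limit set
        result[vm] = fair if cap is None else min(fair, cap)
    return result

# Hypothetical volume that can deliver 3000 IOPS, shares 2000:1000.
print(entitled_iops({"prod-db": 2000, "batch": 1000}, {}, 3000))
# Same shares, but the batch VM is limited to 500 IOPS.
print(entitled_iops({"prod-db": 2000, "batch": 1000}, {"batch": 500}, 3000))
```

This simple version just clips a limited VM and does not hand the freed capacity to anyone else.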

The first case study covered two separate hosts running the same workload levels. With IO DRS turned off both VMs ran at 20 ms & 1500 IOPS. With it on the latency changed to 16 ms and 31 ms, with a similar spread for IOPS. Nice..

Case study two is a more serious one with SQL Server running. The shares were 4:1 but the performance ratios did not come out at that. What they are seeing is that load time matters significantly. Overall throughput is working right and good, though the loads make a big difference.

The demo showed changing the shares and the IOPS limit on the fly, and the IOMeter machines adjusted immediately. When limiting the IOPS of one machine the other systems picked up the slack and got more performance.

After showing the demo the presenters asked if anyone in the packed room (and I do mean PACKED) would find value in this. Everyone immediately raised their hands.

The tech approach is first to detect congestion. If latency is above a threshold, then IO DRS is triggered. If it isn't borked don't fix it. IO DRS works by controlling the IOs issued per host. The sum of the shares of the VMs on a host with IO DRS enabled is compared with other hosts to determine share priority. So first the host is picked, then the VMs' shares on that host are prioritized, and then it is back to the host level. The share control applies across all hosts using that same VMFS volume.

IO slots are filled based on the shares on each host. There are only so many IO slots per host. This is how the IOs are controlled when shares kick in under congestion.
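The two-level idea (hosts first, then VMs on each host) can be sketched roughly as follows. This is only my reading of the session, not the real algorithm: the threshold value, the slot count, and all names are assumptions, and slots are left fractional for simplicity.

```python
LATENCY_THRESHOLD_MS = 30.0   # assumed value; the session did not give one
TOTAL_SLOTS = 64              # assumed number of array IO slots to divide

def is_congested(observed_latency_ms: float) -> bool:
    """Shares only matter once latency crosses the threshold."""
    return observed_latency_ms > LATENCY_THRESHOLD_MS

def slots_per_host(vm_shares_by_host: dict[str, dict[str, int]],
                   total_slots: int = TOTAL_SLOTS) -> dict[str, float]:
    """Give each host slots in proportion to the sum of its VMs' shares."""
    host_shares = {h: sum(vms.values()) for h, vms in vm_shares_by_host.items()}
    all_shares = sum(host_shares.values())
    return {h: total_slots * s / all_shares for h, s in host_shares.items()}

def slots_per_vm(vm_shares: dict[str, int], host_slots: float) -> dict[str, float]:
    """Within one host, divide that host's slots by VM shares."""
    total = sum(vm_shares.values())
    return {vm: host_slots * s / total for vm, s in vm_shares.items()}

# Hypothetical cluster: two hosts sharing one VMFS volume.
cluster = {
    "esx1": {"sql": 2000, "web": 1000},
    "esx2": {"batch": 1000},
}
if is_congested(35.0):
    for host, slots in slots_per_host(cluster).items():
        print(host, slots, slots_per_vm(cluster[host], slots))
```

With these made-up shares esx1 holds three quarters of the total, so it gets three quarters of the slots before its own VMs split them 2:1.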

There are two major performance metrics in the storage industry: Bandwidth (MB/s) and Throughput (IOPS). Each has its pros and cons. Bandwidth is the better measure for workloads with large IO sizes, and IOPS for workloads with lots of small IOs. IO DRS controls the array queue among the VMs. So if a VM has lots of small IOs it can continue to do its thing and get high IOPS. Conversely, if it is doing large IOs it will get high bandwidth and low IOPS using the same share control system.
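The two metrics are tied together by IO size: bandwidth is simply IOPS times the size of each IO. A tiny sketch with illustrative numbers (not figures from the session) shows how a small-IO VM ends up with high IOPS and a large-IO VM with high bandwidth:

```python
def bandwidth_mb_s(iops: float, io_size_kb: float) -> float:
    """Bandwidth in MB/s for a given IOPS rate and IO size (1 MB = 1024 KB)."""
    return iops * io_size_kb / 1024

# Illustrative numbers only.
small = bandwidth_mb_s(iops=4000, io_size_kb=4)     # many small IOs
large = bandwidth_mb_s(iops=250, io_size_kb=256)    # few large IOs
print(small, large)   # 15.625 62.5
```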

Case studies and test runs have shown that Device level latency stays the same as workloads change. Some tests have shown that with IO DRS IOPS can go up simply due to the workloads involved. Control of the IOs allows all to work though depending on the workload a VM can accomplish more.

The key understanding is that IO DRS really helps when there is congestion. When things are good and latency is not high enough to trigger the system, the shares are not used. If a high IO share system is not using its slots, they are reassigned to other VMs in the cluster.
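One common way to model "unused slots go to someone else" is weighted max-min fairness: any VM asking for less than its fair cut gets what it asks for, and the leftover is re-split among the rest by shares. The sketch below uses that technique as a stand-in; the session did not say this is how IO DRS does it, and the names and numbers are hypothetical.

```python
def allocate_slots(shares: dict[str, int], demand: dict[str, float],
                   total_slots: float) -> dict[str, float]:
    """Weighted max-min split of slots, never giving a VM more than it demands."""
    alloc = {vm: 0.0 for vm in shares}
    active = set(shares)
    remaining = total_slots
    while active:
        total_shares = sum(shares[vm] for vm in active)
        fair = {vm: remaining * shares[vm] / total_shares for vm in active}
        # VMs whose demand fits inside their fair cut are fully satisfied.
        satisfied = [vm for vm in active if demand[vm] <= fair[vm]]
        if not satisfied:
            # Everyone left wants more than their cut: split by shares and stop.
            for vm in active:
                alloc[vm] = fair[vm]
            break
        for vm in satisfied:
            alloc[vm] = demand[vm]
            remaining -= demand[vm]
            active.remove(vm)
    return alloc

# Hypothetical: the high-share SQL VM is nearly idle, so its slots are reused.
print(allocate_slots(
    shares={"sql": 4000, "web": 1000, "batch": 1000},
    demand={"sql": 8.0, "web": 40.0, "batch": 40.0},
    total_slots=60,
))
```

Here the SQL VM is entitled to 40 of the 60 slots but only needs 8, so the other two VMs split the remaining 52 evenly.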

The gain overall is the ability to do performance isolation among VMs based on Disk IO.

In the future they are looking to tie this into more vStorage APIs and VMotions and Storage VMotions, IOP reservation potentially etc.

Rocking cool and can't wait for this to come out.

VMworld 2009 - Keynote P5

VMware | Sep 2, 2009

The next layers of Virtualization to expand into and handle are:

vSphere Control:

AppSpeed: is the "finger of blame" now. Instead of Network always getting the finger, now AppSpeed can point the finger at someone else.

vApps are the containers of the future for applications, be it standalone or multi-tier. The idea with a vApp is that it has a variety of attributes/metadata such as Availability, RTOs for DR, Max Latency etc. This info travels with the vApp.

VMsafe APIs: This gives control of security and compliance. The nice thing is that the security data is tied to a vApp via the attributes/metadata and can be used by the various vendors such as Trend/McAfee/Symantec/RSA etc. An example would be "needs these firewall rules and capabilities".

vCenter ConfigControl: The demo showed that ConfigControl really has

vSphere Choice:

LabManager is the token self service portal today.

VMworld today:

37,248 machines -

if physical --> 25 MegaWatts - 3 football fields of space

with VMware Virtualization - Down to 776 physical servers running 540 Kilowatts
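A quick back-of-the-envelope check on those numbers (my own arithmetic, assuming the 37,248 machines all run on the 776 servers):

```python
machines = 37_248
physical_servers = 776
power_if_physical_kw = 25_000   # 25 megawatts
power_virtualized_kw = 540

vms_per_server = machines / physical_servers
power_reduction = power_if_physical_kw / power_virtualized_kw

print(vms_per_server)              # 48.0
print(round(power_reduction, 1))   # 46.3
```

So that works out to 48 machines per physical server and roughly 46 times less power.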

vCloud

Priority is around the internal cloud. Next is working on bringing internal datacenter trust and capabilities to the external clouds. The 3rd innovation is what you can do once you have these two pieces, how they interact, and the connectivity between them.

Today Site Recovery Manager is the first step into the Connectivity space. When and how and what needs to take place to failover from one datacenter to another one.

Long Distance VMotion: The challenges - Move VMs Memory, Disk consistency/syncing and VM network id/connections.

Follow the Sun/Moon approaches (moving computing so it stays where it is night and power is cheaper)
Disaster Datacenter Avoidance - Hurricane coming. Move the Datacenter somewhere out of the path.

Cisco does this by spanning Layer 2 across both campuses up to 300 km apart.
F5 uses its iSession technology to move things around through a globally based load balancer system.

Interoperability: vCloud API

vSphere Plugins with your hosting provider to maintain the Single Pane of Glass.

Open Standards. The end goal is that it will work regardless of where you go or what hypervisor is used, with a good ecosystem and selection for end clients.

vApps Automation for the app stacks. SpringSource helps go down this path. Much discussion around splitting up Infrastructure, Applications, Platform and separating these to create well defined interaction points.

The SpringSource demo shows some of the process capabilities to control deployment and put some controls around it. Things like CloudFoundry. For those of us here the contest is on.. http://www.code2cloud.com for a backstage pass to see Foreigner. (Oh wait.. maybe I shouldn't post that)

Till the next time. I'm off to IO DRS Tech Preview.

VMworld 2009 - Keynote P4

VMware | Sep 2, 2009

vSphere is the basis of all the improvements and technology over the years. The keynote ran through the names for it, Software Mainframe (for those of you over 40) and the Cloud (for the under 40 crowd), and decided the best idea is to call it The Giant Computer. The reason this all works is because of VMotion. It is the basis of all that has happened.

The reason for the success of VMotion is Maturity, Breadth, Automated Use.

Maturity of VMotion - Estimates (fun or not) put around 360 million VMotions around the world since VMotion started. About 2 VMotions a second around the world. VMotion is 6 years old. (Wow I feel old)
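The two figures hang together. A quick check (my own arithmetic, assuming the 360 million are spread evenly over the 6 years):

```python
total_vmotions = 360_000_000
years = 6
seconds = years * 365 * 24 * 60 * 60   # ignoring leap days

rate = total_vmotions / seconds
print(round(rate, 2))   # about 1.9 VMotions per second
```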

Breadth of VMotion - Storage & Network VMotioning. Across protocols and soon across Datacenters. High performance computing systems are starting to look at using VMware.

Automation of VMotion - DRS is the initial version that made this work. DRS has been shown to average 96% of the performance of a perfect, manually set up cluster. The future will include IO DRS shares and configuration based on IOPS. DPM allows for power optimization across the datacenter. Or as has been said, a Server Defrag capability.

vSphere is still driving ahead.. more next post.

VMworld 2009 - Keynote P3

VMware | Sep 2, 2009

View also includes the Mobile Technology discussion. Mobile Technology is a longer term effort that is still working towards functionality. Someone from Visa Product Development is up on the stage. He sees this space as a huge innovation going forward. Current development is significantly complicated. Anything that eases development is extremely interesting for Visa.

The Visa demo uses Windows Mobile on a developer version of a phone (kinda big) running an Atom CPU. The presentation shows some alerting from Visa transactions and finding local ATMs. The impressive zing is that the Visa demo application is actually an Android app running on the Atom CPU. Wow.

Next..

VMworld 2009 - Keynote P2

VMware | Sep 2, 2009

A major goal of the View initiative is to have the same image while providing the best experience possible at WAN, LAN and Direct machine speeds. For WAN/LAN the solution will be PCoIP. The performance looks very impressive, though no numbers were given. This protocol has shown some excellent capabilities over WAN connections.

The other piece for local machine usage is Employee Owned machines. Hosted Virtualization is being highly developed. Deals with Intel have gone the next step with Bare Metal Virtualization for Corporate owned machines.

Demo of the Bare Metal Virtualization (type 1 hypervisor). Direct3D works fairly well during the demo. A presentation of OpenGL using the Google Earth demo over PCoIP and over the LAN was very nice. The WAN demo back to Portland simply rocked.

Wyse has an iPhone application to make the iPhone act as a thin client connecting over PCoIP to the same virtual machine. It ruled. It is quick and effective to scroll around the screen and do what you would normally. Which is appropriate, having seen well over 2 out of 10 people here with iPhones, more than BlackBerrys.

More to come.. next post...
