Un-architecting VDI with XtremIO

Bottom line:  What if you had a storage solution that could run Persistent or Non-Persistent desktops, with better performance than you have today, at scale, and do it for less than you pay now for your non-persistent virtual desktops?  I know that this sounds like crazy-talk.  Stick with me and you’ll be onboard.

Note:  This is all about the VDI side of XtremIO…why would a desktop engineer care about a storage platform?  If you want to learn more about the nuts and bolts of XtremIO, check out Itzik Reich’s awesome blog HERE, or check out Chad’s blog on the topic HERE.

VDI has gotten WAY too complicated…why?

For the last several years, as the market (customers and vendors) has tried to figure out where to go with VDI, there have been several significant barriers to large-scale adoption…and storage cost and performance have been among the biggest.

Desktop engineers are drawn to the benefits of VDI, such as mobility, security, and performance, but have to balance those against the ultimate measurement:  Cost Per Desktop.  When the VDI design is first conceived, it’s great…but then reality sets in as the cost of the architecture is realized…and then the promised OPEX savings aren’t there, because we have to make the VDI environment so complicated in an effort to control the CAPEX.

This article is all about a new way to UN-Architect your VDI design to make it simpler to manage, make it run faster and more consistently, and do it at a lower price point than you have seen before.

First, let’s walk through how we got here in the first place…

What stinks about storage and VDI?

Performance drives user experience…and drives up cost

In order to give users the same or better IOPS than they have at their desktops today, we need to at least emulate the performance of a 5400RPM drive.  Well, that’s not completely true…more and more, the standard of performance at the user’s desktop has risen to SSD drives, which offer even MORE performance.  So, when we attempt to move users, along with the OS and applications they need to run, over to a centralized virtual desktop, there is a certain expectation of performance that has to be met…or the riots will begin and adoption will fail.

So, how do we get to that performance in today’s arrays?

Add lots of spindles.  Straight math: each drive has an IOPS capability; multiply that by the number of spindles, subtract performance for RAID overhead, and you reach a consistently available performance profile.  Now, when we apply that math to a general VDI environment, where we have to size for Steady State AND IO Storms (like Boot Storms, Login Storms, AV Storms, etc.), we end up with a HUGE number of spindles to guarantee the expected performance back to the end users.  This model is really expensive, takes up a lot of space, and tends to end up either way oversized…or undersized…it’s very hard to hit the bullseye.  A rough sketch of that math follows.
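To make that math concrete, here’s a minimal back-of-the-envelope sizing sketch.  The per-drive IOPS figure, RAID write penalty, and workload numbers are hypothetical assumptions for illustration, not measured values or vendor specifications:

```python
# A rough back-of-the-envelope spindle-count sketch (illustrative only).
# The per-drive IOPS figures, RAID write penalty, and workload numbers
# below are hypothetical assumptions, not vendor specifications.
import math

def spindles_needed(desktops, iops_per_desktop, write_ratio,
                    storm_multiplier, drive_iops, raid_write_penalty):
    """Estimate the spindles required to satisfy a VDI workload."""
    frontend_iops = desktops * iops_per_desktop * storm_multiplier
    reads = frontend_iops * (1 - write_ratio)
    writes = frontend_iops * write_ratio
    # Writes hit the back end multiple times depending on RAID type
    # (e.g. ~2x for RAID 10, ~4x for RAID 5).
    backend_iops = reads + writes * raid_write_penalty
    return math.ceil(backend_iops / drive_iops)

# 1,000 desktops, 10 IOPS each at steady state, 80% writes,
# sized for a 3x login/boot storm, on 15K drives (~180 IOPS) in RAID 5.
print(spindles_needed(1000, 10, 0.8, 3, 180, 4))  # ~570 spindles
```

Even with a modest 10 IOPS per desktop, the storm multiplier and the RAID write penalty push the count into the hundreds of spindles, which is exactly the cost and footprint problem described above.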

Add lots of cache.  So, how do we attempt to reduce the footprint in a way that also reduces cost and increases performance and user experience at the same time?  First, we should understand that Windows desktops have some pretty consistent performance attributes.  All of the enterprise arrays on the market offer DRAM cache that can help offload both reads and writes from the spinning media and improve user experience.  However, during a day-long cycle of end users pounding away at their VDI desktops, the DRAM cache eventually fills up and we HAVE to de-stage the writes off to disk.

Add intelligent caching.  In a great article (http://myvirtualcloud.net/?p=2502), Andre Leibovici lays out why putting EMC FAST Cache in front of the spinning disk offloads many of the IOPS for both read and write operations.  FAST Cache takes advantage of the Windows IO attribute where only a small percentage (usually around 10%) of the capacity drives the majority (usually around 90%) of the IOPS.  A great way to think about this is that your PowerPoint help files don’t need 30 IOPS, but your Windows system files and primary applications might.  FAST Cache exploits that skew.  This is great for the EMC customer who wants to run Linked Clones (Composer, PVS, MCS), and it can be seen in several of our reference architectures at https://community.emc.com/docs/DOC-14069.
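As a hedged illustration of why that skew matters, assume a flash cache layer absorbs roughly 90% of the front-end IOPS.  The hit rate and workload numbers below are assumptions, and the RAID write penalty is ignored to keep the arithmetic simple:

```python
# Illustrative math for the "hot 10% of capacity drives ~90% of IOPS" skew.
# The hit rate and workload numbers are assumptions for illustration,
# not measured FAST Cache results. RAID write penalty is ignored here.
total_iops = 30_000      # front-end IOPS from the sizing sketch above
cache_hit_rate = 0.90    # share of IOPS absorbed by the flash cache layer
drive_iops = 180         # hypothetical per-spindle capability (15K RPM)

backend_iops = total_iops * (1 - cache_hit_rate)
print(f"IOPS reaching spinning disk: {backend_iops:.0f}")                # 3000
print(f"Spindles needed behind the cache: {backend_iops / drive_iops:.0f}")  # ~17
```

The exact numbers aren’t the point; the point is that the spindle count behind the cache is driven only by the misses, which is why intelligent caching shrinks the footprint so dramatically.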

When we start to apply those same principles to fully-provisioned or persistent desktops, we can still get performance improvements; however, we still need a lot more capacity and performance to keep up the user experience.  Now we end up back at the problem of way too many disks, too many arrays, and a lot of operational caveats that make the VDI storage much harder to manage.

SuperFast All-Flash Arrays.  A handful of vendors have All-Flash arrays on the market today, and they are fantastic at pushing gobs of IOPS between the server and the disk…when you need pure performance, these solutions offer that.  But while the benefit is tied to raw performance, you quickly run into a challenge with cost versus capacity: flash cost:capacity ratios make an all-flash solution difficult to scale out for any large-scale VDI use case.  These arrays are also not typically (and this changes all the time) set up for scale-out, and they are not focused on enterprise-class features such as VMware VAAI integration, snapshots, replication, backup integration, and uptime and data protection capabilities.  Useful in some situations, those capacity and data protection limitations relegate them to fairly narrow, often linked-clone-specific, use cases within the enterprise.

What else stinks about VDI storage?

In an effort to reduce the cost and increase performance, we layer on additional solutions…

Linked Clones.  Whether the technology is View Composer, Citrix PVS or MCS, or linked clones at the array level, the concept is the same:  centralize the reads into a “master” or “gold” image, and give each desktop a much smaller “linked clone” where its individual writes can go.  In this way you shrink the capacity requirements, and better yet, you also get to do some pretty interesting things with the storage configuration underneath this architecture to improve performance and make it more consistent.  A conceptual sketch of the idea follows.
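Here’s a heavily simplified, conceptual sketch of the linked-clone (copy-on-write) idea.  The class name and block layout are illustrative assumptions, not any vendor’s on-disk format:

```python
# A conceptual (and heavily simplified) sketch of the linked-clone idea:
# reads fall through to a shared, read-only master image unless the
# desktop has written its own copy of that block.

class LinkedCloneDesktop:
    def __init__(self, master_image: dict):
        self.master = master_image   # shared, read-only "gold" image blocks
        self.delta = {}              # per-desktop writes (the "linked clone")

    def read(self, block_id):
        # Serve the block from the delta if this desktop has changed it,
        # otherwise fall back to the shared master image.
        return self.delta.get(block_id, self.master.get(block_id))

    def write(self, block_id, data):
        # Writes never touch the master; they land in the small delta.
        self.delta[block_id] = data

master = {0: b"bootloader", 1: b"windows-system-files"}
desktop = LinkedCloneDesktop(master)
desktop.write(2, b"user-profile-data")
print(desktop.read(1))  # served from the shared master image
print(desktop.read(2))  # served from this desktop's delta
```

Every desktop shares the same read-heavy master blocks while its own writes stay in a small delta, which is what makes the caching tricks above so effective.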

The industry has sung this same song for the last 6 years as the only way to really drive cost out of VDI environments and make them easier to manage.  However, the challenge that all customers have had with Linked Clones or Non-Persistent desktops is that in order to use them, you have to change too many things about how they are managed, how applications are deployed, and how users remain productive on them.  The non-persistent desktop isn’t bad…but in an effort to drive out cost, you end up with a much more complicated design…and one that ultimately causes most orgs to just give up…or struggle to figure out how to make it work.

Deduplication.  Deduplication is another approach some vendors have brought to market to drive down the cost of the VDI storage infrastructure.  By collapsing the common blocks (of which there are many) in a VDI environment, you can keep your desktops persistent, without having to use linked clones, and still take up much less space.

To date, deduplication at the array level has meant a significant investment in higher-end arrays that can handle the processing requirements of deduplication, plus some operational caveats around handling post-process deduplication.  The use of deduplication hasn’t solved the performance issue, though.  In standard array configurations, we have to throw lots of disk at a persistent desktop because of the performance requirements…not so much the capacity requirements.  We are performance bound.

Local Caching and Flash.  VMware released the View Storage Accelerator this year as a mechanism to use local RAM on the ESXi servers to cache the most frequently read blocks.  Citrix leverages a similar technology called IntelliCache.  These solutions, and other hardware solutions that exist at this level, are great at reducing the IO workload between the server and the array.  In a great review article found HERE, the author demonstrates how some RAM in the server can dramatically increase desktop performance.  However, also of note…this only works with linked clones…and is only supported if you don’t use any tiering on the backend.  Basically…it’s awesome if your use case fits the support and implementation model.

So, what have we covered so far…

  1. VDI Storage is costly when we focus on a consistent and positive user experience
  2. Solutions exist to help offset some of this cost, but they typically come with some pretty hefty strings attached

Introducing EMC XtremIO…

What if you had a VDI storage solution that…

  • removed all of the performance constraints of traditional spinning media…but also had inline/immediate deduplication capabilities that didn’t affect that performance.
  • had all of the data protection, replication, and uptime attributes that you expect from your enterprise storage solutions already.
  • started small, but could easily grow to meet your expanding VDI storage needs…so the upfront investment could be smaller…and allow more granular growth.
  • was so easy to manage that it defied everything you had seen before in managing enterprise arrays.

You’ve just been introduced to EMC XtremIO.

EMC XtremIO is an All-Flash array that was built from the ground up to support your enterprise storage requirements as a PRIMARY storage solution…and to leave behind the legacy array traditions that were defined around spinning disk.

What makes this different than other All-Flash arrays?

In-Line Deduplication.  Deduplication is the foundation of the array.  This is not a service that can be turned on or off.  Every block that comes into the array, whether the array is a single 5U node or a cluster of up to 8 nodes, is deduplicated BEFORE being written to disk.  This accomplishes two things that are HUGE:

What’s FASTER than flash?  CPU and NVRAM.  While flash is faster than spinning media, if I never have to make the write to the disk in the first place, but just update some metadata when I’ve already seen that block, it’s faster than flash.  Performance SOARS.

Flash Reliability goes WAY up.  This is a bit further into the weeds, but all flash drives have a limited capability to handle writes over long periods of time.  It’s referred to as Write Endurance.  See THIS article for a start on the problem.  With deduplication happening BEFORE the write occurs, the number of writes each drive absorbs over its lifetime drops, and the reliability and lifespan of the drives go WAY up.  A simplified sketch of both effects follows.
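To illustrate both points at once (the speed win and the endurance win), here’s a toy inline-deduplication sketch.  The fingerprinting scheme and data structures are assumptions for illustration only, not XtremIO’s actual implementation:

```python
# A heavily simplified sketch of inline deduplication, assuming a simple
# content hash -> block mapping. This illustrates the concept only; it is
# not XtremIO's actual data structures or algorithms.
import hashlib

class InlineDedupArray:
    def __init__(self):
        self.blocks = {}         # fingerprint -> physical block data
        self.volume = {}         # logical address -> fingerprint (metadata)
        self.physical_writes = 0
        self.logical_writes = 0

    def write(self, logical_addr, data: bytes):
        self.logical_writes += 1
        fp = hashlib.sha256(data).hexdigest()   # fingerprint the block inline
        if fp not in self.blocks:
            self.blocks[fp] = data              # first time seen: write to flash
            self.physical_writes += 1
        # Duplicate block: only the metadata pointer is updated.
        self.volume[logical_addr] = fp

array = InlineDedupArray()
windows_block = b"identical Windows system file block"
for desktop in range(1000):                     # 1,000 persistent desktops
    array.write(("desktop", desktop, 0), windows_block)

print(array.logical_writes)    # 1000 incoming writes
print(array.physical_writes)   # 1 write that actually hit flash
```

One thousand identical incoming blocks result in a single physical write; the other 999 are satisfied with a metadata update, which is why performance climbs while write wear drops.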

So what does this REALLY mean to my VDI design?

Think of all of the time and effort that you, your teams, and your partners (and vendors) have put into trying to figure out how to do non-persistent VDI.  Think of the testing of Application Virtualization, Roaming Profiles, Anti-Virus Updates, and Patching that has been done to try and figure out how to make non-persistent VDI work in your environment.

Now, consider that with EMC XtremIO, there’s no cost or performance difference between running a Persistent or Non-Persistent desktop.  In fact, you probably spend more on infrastructure and licensing to get a Non-Persistent desktop running than you would a Persistent desktop.

The fallacy of Non-Persistent desktop models was that they were going to save us money by reducing the management cost of the desktop.  In some cases, where they can be deployed in large scale, that is still true.  Think of a call center with hundreds or thousands of desktops that are identical.  That’s a great fit for a non-persistent desktop.

Enterprise-Class Feature Set.  While this one might not be as interesting or compelling to the desktop team, it’s going to really mean something to the storage and enterprise risk management folks.  Persistent desktops are persistent for a reason.  Users need to save data and applications to their desktops.  This data must be protected.

Just like any other data set within the data center, a priority should be assigned to this data based on your organization’s requirements.  EMC XtremIO provides the ability to perform Snapshots and Clones, built primarily on that same deduplication mechanism (a toy sketch of the idea follows).
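Conceptually, a snapshot on a deduplicated volume can be little more than a copy of the address-to-fingerprint metadata, since the physical blocks are already shared.  The sketch below illustrates that idea under those assumptions; it is not XtremIO’s actual snapshot design:

```python
# Toy illustration: a "snapshot" of a deduplicated volume is just a copy of
# its logical-address -> fingerprint map; no physical blocks are duplicated.
# Conceptual only, not XtremIO's actual metadata layout.
volume_metadata = {0: "fp-aaa", 1: "fp-bbb", 2: "fp-ccc"}  # addr -> fingerprint

snapshot_metadata = dict(volume_metadata)  # near-instant, metadata-only copy

# Later writes remap the live volume to new fingerprints,
# while the snapshot keeps pointing at the original blocks.
volume_metadata[1] = "fp-ddd"
print(snapshot_metadata[1])  # still "fp-bbb"
```

Because no physical data is copied, the snapshot is near-instant and consumes almost no additional capacity until the live volume diverges.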

From there, SLAs are defined and a data protection plan is built; both replication to a DR site and longer-term backup of this data should also be considered.  These are enterprise-class features that are not available on many of the all-flash/high-performance alternatives.

Think about the desktop management solutions and processes that you have for your physical desktops today.  They aren’t going away.  No organization can completely remove the requirement for non-persistent desktops.  In the same way…no organization can completely remove the requirement for physical desktops.  There will always be users who need physical and offline access to their desktops.

So, what does this mean?  If you could have managed your VDI desktops the same way you manage your physical desktops, and keep the cost to a minimum, you would have done that already.  While many orgs have deployed large numbers of persistent VDI desktops out of necessity, there is always the realization that this was the most expensive way to deploy a desktop.

I’ve worked with a number of enterprise customers who fit that description.  And each one of those customers KNOWS that non-persistent desktops would lower the cost.  Each customer is ALSO aware, that their end users could not be satisfied or productive given the constraints of a non-persistent desktop.  The cost, complexity, and risk of deploying a non-persistent desktop is just too high.

With EMC XtremIO, much of that goes away.  Desktop engineers can take a second look at how they optimize their desktops.  The fear of swapping, and the negative performance impact that swapping is known for, is also reduced.  The old trade-off of stripping functionality out of a Windows desktop in exchange for disk performance changes, too.

Now, there will always be some differences between a VDI desktop and a physical desktop.  They aren’t the same.  Running a desktop over a network has limitations.  Sharing CPUs on ESXi servers has limitations.  Doing a P2V of a physical desktop into a virtual desktop is still not good practice.

If you thought that was cool…consider that in the near future you’ll be able to add more EMC goodness to the solution to make it even better…

  • PowerPath/VE…optimize the path to all of that performance for the best possible results, removing another layer of the onion.
  • RecoverPoint…critical to protecting your precious persistent desktops, watch for native RecoverPoint integration…totally integrated with VMware Site Recovery Manager.

Oh, and if you’ve come to love the EMC and VMware integration story…well, you KNOW that XtremIO is the next chapter in that story.

More to come on this great topic…and I’m sure I won’t be the only one making noise.