Atlantis ILIO – RAM-based storage made for VDI

I personally am very fond of solutions that handle IO close to the source and therefore give more performance to your virtual machine workload while minimizing (or preferably skipping) the storage footprint downstream. I previously wrote a blog post about solutions you can use at the host. One of these solutions is Atlantis ILIO.
As the company I currently work for (Qwise) is also a partner for consulting on and delivering Atlantis ILIO solutions, I thought one plus one is… three. 

If you’re not familiar with Atlantis ILIO: it works by running an Atlantis appliance (VSA) on each of your hypervisor hosts (dedicated to VDI, for example) and presenting an NFS or iSCSI datastore that all the VMs on that host use. For this datastore it uses a configured part of the host’s RAM, so all reads and writes are handled directly from that RAM (provided you deploy the VMs there and have reserved this RAM for this kind of usage). The IO traffic is first analyzed by Atlantis to reduce the amount of IO, then the data is de-duplicated and compressed before being written to server RAM. When needed, Atlantis ILIO converts small random IO blocks into larger blocks of sequential IO before sending them to storage, increasing storage and desktop performance. This counters the IO blender effect.
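As a sketch of what this looks like from the hypervisor side, mounting the appliance-presented NFS datastore on an ESXi host could be done as below. The appliance IP, export path and datastore name are placeholders of my own, not Atlantis defaults.

```shell
# Placeholder values: adjust to the IP and export your ILIO appliance presents.
esxcli storage nfs add --host= --share=/exports/ILIO_Store --volume-name=ILIO-DS01
# Verify the datastore is mounted:
esxcli storage nfs list
```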
The OS footprint in RAM is minimized to a rather small one; reductions of up to 90% can be reached, depending on the type of workload. Any data that is written to external storage (outside of RAM) also undergoes write coalescing first. 

Since Atlantis will only store each data block once, regardless of how many VMs on that host use that block, you can run dozens or hundreds of VMs off just a small amount of memory.

And what does RAM give you? A warp-speed user experience and faster deployment.

Atlantis ILIO can be used for stateless VDI (completely in RAM), persistent VDI (out of server memory or backed by shared storage) and XenApp, and can also be used with virtual server infrastructures.

Atlantis ILIO Architecture


As written before, Atlantis ILIO is deployed as an appliance on each host, or on a host that serves a complete rack. This appliance is an Atlantis ILIO (or ILIO for short) controller or instance. The Atlantis ILIO appliance uses a defined part of the host’s RAM to present an NFS or iSCSI datastore via the hypervisor. Here you can place the VDs, XenApp servers or other VMs you need to accelerate. ILIO sits in the IO stream between your VM, hypervisor and storage. You need the correct Atlantis product for the optimized features of the wanted solution workload, currently VDI and XenApp. Keep an eye out for other server solutions; they are bound to come out in this half of 2014.
In the model above the hypervisor is VMware vSphere with a stateless VDI deployment, but this can also be Citrix XenServer or Microsoft Hyper-V, as Atlantis supports these as well. The Atlantis-presented storage can easily be used to accelerate PVS or MCS when using XenDesktop provisioning, or in combination with some form of local or shared storage for the unique user data of persistent desktops.

Atlantis ILIO Management Center

The Atlantis ILIO Management Center will set up, discover and manage one or more Atlantis ILIO instances. ILIO Center is a virtual appliance that is registered with a VMware vCenter cluster. Once registered with a vCenter, ILIO Center can discover Atlantis ILIO instances that are in the same vCenter management cluster and selectively install a Management Agent on them. If additional vCenter clusters with Atlantis ILIO instances exist, an ILIO Center virtual machine can be created and registered for each cluster.
ILIO Center can be used for provisioning of ILIO instances, monitoring and alerting, maintenance (patching and updates) and (probably the most important part) reporting on the status of the ILIO process and the handled IO offload (for example, what amount of blocks is de-duplicated). ILIO Center can also be used to fast clone a VD image. This clones full desktop VMs in as little as 5 seconds, without network or storage traffic.

Hosts and High Availability (mainly for persistent deployments)

Atlantis supports creating a synchronous FT cluster of Atlantis ILIO virtual machines on different hosts to provide resiliency and “zero downtime” during hardware failures. Atlantis supports using HA across multiple hosts or automatically restarting virtual machines on the same host. 

A host that offers resources to a specific workload, for example the VDs, is called a session host. This session host can use local or shared storage for its unique data. With shared storage, when a failure happens you can use the hypervisor’s HA (together with a DRS VM rule to keep the appliance and VDs together). When using local storage in vSphere this is not an option, as HA requires a form of shared storage. For this you can use ILIO clustering with replication.

To keep unique data on local host storage available, a replication host and a standby host come into the picture.

In an Atlantis ILIO persistent VD solution, a replication host is a centrally placed Atlantis ILIO instance that maintains the master copy of each user’s unique data. The desktop reads and writes its IO in the RAM of the session host. The session host (after handling the IO) then replicates any unique compressed data over to the replication host. This replication is called Fast Replication and is handled over the internal out-of-band Atlantis ILIO network. The replication host is backed by shared storage, where the unique user data is written. There is also a standby host, which stands by for the replication host and has access to the same shared storage location. In case the replication host fails, the standby host takes over and has access to the same unique user data on the shared storage. Keep in mind that, depending on your workload, between 5 and 8 session hosts can share a single replication host. 
Disk-backed configurations that leverage external shared storage do not need a replication host, as ILIO Fast Replication mirrors the desktop data directly to this shared storage.

For non-persistent stateless VDs the data stays purely in RAM. VMware Horizon View or Citrix XenDesktop will notice that the VDs are down when the host fails, and will make new VDs available on another host. Users will temporarily experience a disconnect, but their workspaces will reconnect when available again.


With interesting RAM pricing and reduced infrastructure complexity, Atlantis ILIO is a perfect solution for (re)building a VD infrastructure with already-in-place solutions or solution components. You can provision lightning-fast VDs and engage your workforce at warp-speed productivity, at a cost of around 150-200 euro per user. You can run hundreds of VDs with just a small amount of configured RAM on one host. Next to this you will have a much smaller unique IO data footprint on your shared storage. No need to go with expensive accelerated storage infrastructure controllers; you can easily go with a cheaper SAN/NAS/JBOD or a no-SAN solution.

– Happy Atlantis ILIO’ing!

Reducing IO @ the host

Workloads need IO resources to operate correctly and respond well in an information chain. No performance means many unhappy users and business processes grinding to a halt.

IO can mean networking and storage in this case. As networking IO is mostly less of a problem (although the main line of this story goes for networking IO as well), the main focus of this blog post is storage IO.

In my opinion it is important to handle or reduce the IO closest to the source before letting it get down to other infrastructure components. The effect should be that the IO downstream can be handled by a smaller simpler storage device or preferably none.
The first side effect of this is that it cuts the complexity of the infrastructure and the cost of handling IO at the storage layer, and simplifies project tasks, as they won’t need an over-tasked storage engineer or have to wait for parts to come in (and take the thing partly down, only to figure out that the controllers were not failover tested that much). To put it simply: it cuts cost and time.
The second side effect is that it improves VM-to-host consolidation ratios. Depending on the sort of VM workload this can be an advantage. With virtual desktops (VDs), as they share much of the same IO workload (Windows 7 images), the effectiveness of some of the host-based solutions will increase.

Most important (and unfortunately not seen that often) is knowing your organization’s workloads and their IO requirements. VDI is commonly write intensive, so trying to reduce reads isn’t going to give you that much and your storage will still be hit hard. Other application parts are more read heavy. Et cetera.

What are the options that I have come across?

  1. Reduce guest IO footprint – OS. Optimize your guest OS so that it needs less CPU, memory and IO; less to a certain amount, that is. If the workload is letting your OS swap to disk because of low memory, the IO on storage will be heavier than when sizing the memory to fit the workload (and stop the swapping). Know the correct sizing of your organization’s image. Optimization of Windows can be done via the several optimization guides available (e.g. Windows 7 via VMware or Citrix) or with tools like the VMware OS Optimization Tool Fling.
  2. Reducing guest IO footprint – virus protection. Move the virus protection from the guest to the hypervisor layer by using McAfee MOVE or Trend Micro Deep Security. These products use a host-based virtual appliance that plugs directly into the host’s hypervisor (by using vShield, for example), seriously reducing the CPU cycles, memory consumption and storage IO coming from the guest.
  3. Reducing VM IO by offloading swap. The benefits of offloading swap files to host-local (flash) storage are a smaller space footprint on shared storage and offloading read and write IOs from shared storage onto local storage (which in turn must be able to handle the amount of IO, preferably via SSD).
  4. Reducing VD storage requirements by using Composer linked clones or PVS vDisks. With View Composer you create desktop images that share virtual disks with a base image, so you can reduce the required storage capacity.
    View Composer uses a base image, or parent virtual machine, and creates a pool of up to 1,000 linked-clone virtual machines. Each linked clone acts like an independent desktop, yet requires significantly less storage. When placing the clones’ disks on local accelerated storage, even more responsiveness can be offered to the desktop.
  5. Host caching and deduplication. Several so-called accelerators are available that cache IO in flash or RAM and are able to do inline data deduplication. They work at the hypervisor level by introducing a virtual appliance or hypervisor module, which can be clustered for fault tolerance. These solutions give your workloads more IOPS at lower latency (milliseconds down to microseconds). They can be backed by shared storage, but with lower requirements and mainly for capacity. I’m thinking about Atlantis ILIO, Infinio and PernixData: cost-effective solutions for better response.
  6. IO handling at the host cluster – Virtual SAN or VSAN. Radically simple storage from pools (clusters) of vSphere hosts. VSAN uses flash disks (SSD) as a read cache and write buffer. The read cache keeps a list of commonly accessed disk blocks to reduce IO read latency in the event of a cache hit (that is, the disk block is in cache). The write cache behaves as a non-volatile write buffer, reducing latency for write operations as well.
    In the event of a host failure, a replicated copy of the cache as well as the replicated disk data are available on one or more VSAN cluster hosts. Unfortunately still in beta, so not production worthy yet.
  7. IO accelerator cards – Fusion-io or HP IO Accelerator cards. Accelerating flash with PCI Express cards that integrate with servers at the system bus and kernel level. Lots of IO performance with minimal power consumption, and more IO than SSD. Great for large data centers. Costly solutions, but they offer a great many IOPS at the host level.
  8. (Hyper)converged architectures. All your infrastructure or data center resources in a box with a single management layer. No need for complexity; easily scalable (it is just like Lego with those blocks), great for starting small and growing incrementally when needed. Includes storage features (depending on the system, that is) such as flash acceleration on several layers (SSD, PCIe and such), deduplication, compression and replication, all at the host/block level. Thinking about Nutanix or SimpliVity here.
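For option 3 above, a minimal sketch of per-VM swap placement in vSphere: the swap file location can be steered with the sched.swap.dir parameter in the VM’s configuration. The datastore path below is a placeholder for your local flash datastore; verify the option against your vSphere version.

```
# Hypothetical .vmx fragment: place this VM's swap file on host-local SSD
# instead of shared storage (path is a placeholder).
sched.swap.dir = "/vmfs/volumes/local-ssd-01/swap"
```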

Of course there will be other solutions out there. Like written before, these are just the ones I have come across. Know of some that are certainly bound to be in this post? Drop a comment and I will take a peek to include them.

– Happy reducing IO @ the host.

vSphere performance monitoring tools available as standard

I am currently working on a project where we are optimizing a virtual infrastructure which consists of vSphere and XenServer hypervisors. In this project we want to measure and confirm some of the performance-related counters. We have several standard tools at the infrastructure components to see what the environment is capable of and to check if there are bottlenecks regarding IO flow and processing. 

With any kind of analyzing it is important to plan (or know) what to measure at what layer, so it is repeatable when you want to check what certain changes do to your environment. This check can also be done with some of the tools available, such as VMware View Planner (written about in an earlier blog post), or it is a repeat of your plan (which can then be automated/orchestrated). Your measuring tools need to have similar counters/metrics throughout the chain, or at least show what you are putting in/requesting at the start and at the end (but if there is an offset you have little grey spots in the chain).
A correctly working time service (NTP) is, next to being required for the correct working of for example clustering and logging, also necessary for monitoring: to get the right values at the right intervals. Being slightly off will in some cases give you negative values, or values that are off at some components.

Some basics about measuring

You will have to know what the measured metrics are at a given point. Some are integers, some are floating point, some are averages over periods or amounts used, and some need an algorithm to convert them to a human-readable or comparable metric (KB at one level and bytes at the other; some of them are not that easy). A value that looks high at first sight, but consists of several components and is an average over a certain period, could be normal when divided by the number of worlds.
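As a trivial sketch of such conversions (the numbers are made up): an aggregate counter only becomes comparable after dividing by the number of worlds, and a KB value on one layer must be normalized to bytes before comparing it with the other layer.

```shell
# Made-up numbers: an aggregate latency counter of 120 ms spread over 8 worlds.
awk -v total_ms=120 -v worlds=8 'BEGIN { printf "%.1f ms per world\n", total_ms / worlds }'
# 64 KB on one layer expressed in bytes for comparison with the other layer:
awk 'BEGIN { printf "%d bytes\n", 64 * 1024 }'
```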

Next up: know or decide on your measuring period and data collection intervals. If you are measuring every second you will probably get a lot of information and be a busy man (or woman) trying to analyze all that data. Measuring in December gives a less representative workload than measuring in a company’s peak February period (and for Santa it is the other way around ;-)). And measure the complete process cycle: try to get a 4-week/month period, so the month opening and closing processes are in there (well, depending on the workload of course).

Most important is that you know what your workloads are, what their need for IO is, and what your facilitating networking and storage components are capable of. If you don’t know what your VD image is built of for a certain group of users, and what is required for them, how will you know whether a VD from this group requesting 45 IOPS is good or bad? On the other hand, if you put all your management, infrastructure and VDs on the same storage, how are you going to separate the cumulative counters from the specific workload?

Hey, you said something about vSphere in the title. Let’s see what is available as standard at the vSphere level.

– VM monitoring; in-guest Windows Perfmon counters or Linux guest statistics. The latter depends highly on what you put in your distribution, but think of top, htop, atop, vmstat, mpstat et al.
Windows Perfmon counters are supplemented with some VM insights by the VMware Tools. There are a lot of counters available, so know what you want to measure. And use data collector sets to group them and have them as reference/repeatable sets (with scheduling of the data collection). 

– Host level; esxtop or vscsiStats. Esxtop is a great tool for performance analysis of all types. Duncan Epping has an excellent post about esxtop metrics and usage. Esxtop can be used in interactive or batch mode. With batch mode you can load your output file into Windows Perfmon or into esxplot. Use VisualEsxtop for enhancements to the esxtop command line and a nice GUI. On the vMA you can use resxtop to get the esxtop stats remotely. vscsiStats is used when you want SCSI collections or storage information that esxtop is not capable of showing. And of course PowerCLI can be an enormous help here.
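As a sketch of a batch capture (the hostname and world group ID are placeholders; esxtop runs on the ESXi host itself, resxtop from the vMA):

```shell
# 15 minutes of stats in batch mode: all counters (-a), 5 s interval (-d),
# 180 samples (-n), to a CSV you can load into Windows Perfmon or esxplot.
esxtop -b -a -d 5 -n 180 > esxtop-capture.csv

# Remote variant from the vMA:
resxtop --server esxi01.example.local -b -a -d 5 -n 180 > esxtop-capture.csv

# vscsiStats: list world group IDs, start a collection, print the latency histogram.
vscsiStats -l
vscsiStats -s -w <worldgroup-id>
vscsiStats -p latency -w <worldgroup-id>
```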

– vCenter level; statistics collection, which depends on your configured statistics level. Graphs can be shown for several components in the vSphere Web Client and can be read via the vSphere API, or again use PowerCLI to extract the wanted counters. To get an overview of the metrics at the different levels, please check the documentation center for your version.

– vCenter™ Operations Management Suite (vCOps). Well, standard… you still have the option not to include Operations in your environment. But then you are missing out on some of the automated (interactive/proactive) performance monitoring, reporting and environment insight options. Root cause analysis is part of the suite, and no longer down to your own understanding and analytic skills. If you are working at the previous levels, your life could have been simpler with the vCOps suite.

Next up

These standard tools need to be supplemented with specific application, networking (hops and other passed components) and storage counters (what are the storage processors up to, and is there latency building up in the device itself?).

– Happy measuring!

Virtual Infrastructure WAN Benchmarking with Dummynet

Ever had a customer environment where either a deployment is done over WAN links or the customer wants to change their WAN links, and they (or you, in a planning or testing phase) are interested to see in advance what happens? Or you just want to know how latency or ping drops on a WAN link influence the user experience in VDI or application deployments (well, if the users aren’t complaining first ;-)…)?

Want to introduce some real-world networking examples to your lab setup?

You have some options in the virtual infrastructure itself; vSphere, for example, lets you throttle a vSwitch/portgroup to a certain peak bandwidth (traffic shaping). But bandwidth is just part of the deal: what is latency doing to your desktop or application experience, and what will ping drops do to the communication?

For this sort of testing/benchmarking I often use a Dummynet setup to influence the traffic between infrastructure components. So how does this work? Time to find out in this blog post.

What is Dummynet?

Start at the start, you dummy… Dummynet is a traffic shaper, bandwidth manager and delay emulator, and is included in FreeBSD. There are several ways to use it: deploying a VM with a FreeBSD image, or with some of the live ISOs out there (Frenzy, for example). Dummynet was initially implemented as a testing tool for TCP congestion control by Luigi Rizzo <[email protected]>; see Luigi’s site. Dummynet uses ipfw to influence the traffic flowing through the created network tunnel.

How to use Dummynet in your virtual infrastructure?

Build a Dummynet VM with two networks connected. Connect a client VM and a server VM, one on each of the networks (client in network X and server in network Y). Let the Dummynet VM route and pipe the network traffic between those VMs, and shape the network according to your testing needs.

A model to help clarify.


When installing FreeBSD, be sure to include src for the ability to include the Dummynet firewall in the kernel. You can either load a kernel module or recompile the kernel with the dummynet option included.


I’m putting a client VM on the same network as one of the Dummynet interfaces. The second Dummynet interface is connected to another VM network, where the server VM is connected. The VMs are given IP addresses in their respective IP subnets, and the Dummynet VM is given an IP in each of these as well (on em0 for the client VM subnet and on em1 for the server VM subnet). I configure the client and server VMs to use the Dummynet VM as their IP gateway (just a route add for the other subnet pointing to the Dummynet interface).

On the Dummynet VM you can use the following commands, or include them in the rc.conf and /etc/sysctl.conf files.

ifconfig em0 <ip-in-client-subnet> netmask <netmask>
ifconfig em1 <ip-in-server-subnet> netmask <netmask>
sysctl net.inet.ip.forwarding=1 (tell FreeBSD to forward packets between the two interfaces)
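To make the same setup survive a reboot, it could look like this in the files mentioned above (a sketch; the addresses are placeholders for your two subnets):

```shell
# /etc/rc.conf (placeholder addresses)
ifconfig_em0="inet <ip-in-client-subnet> netmask <netmask>"
ifconfig_em1="inet <ip-in-server-subnet> netmask <netmask>"
gateway_enable="YES"   # enables net.inet.ip.forwarding=1 at boot
```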

Check if you can reach both VMs by pinging their IP addresses from the Dummynet host.


Yes? Okay, move on. Check if you can reach one VM from the other via the IP forwarding option. The freebsd-server is on the other subnet, so first add the route like said before.


This works. Now introduce some Dummynet.

kldload dummynet
ipfw flush

Add a firewall rule to allow all traffic from the first VM to the second and back.

ipfw add 1000 allow all from any to any

Add a Dummynet pipe to check if ipfw works:
ipfw add 100 pipe 1 ip from any to any

And put some delay on that:

ipfw pipe 1 config delay 10ms

And check with ping:


You will see the delay multiplied by the number of times the traffic passes the pipe: from 2.5/3 ms in the previous shot to 39/38 ms in the current shot.
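A quick sanity check of those numbers (a sketch; the exact pass count depends on your rule set and the net.inet.ip.fw.one_pass sysctl): with the ping crossing the firewall on both interfaces in both directions, the 10 ms pipe is traversed four times, so the expected RTT is roughly the base RTT plus 4 × 10 ms.

```shell
# Made-up base RTT of 3 ms, 10 ms pipe delay, 4 pipe traversals per ping.
awk -v base=3 -v delay=10 -v passes=4 'BEGIN { printf "expected RTT ~%d ms\n", base + delay * passes }'
```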

When this works you can add some other tests, for example:

Add a rule to delay the selected packets by 50 ms, randomly drop 3% of the packets (0 is no loss and 1 is 100% packet loss, so .03 is 3%) and limit the bandwidth to 1 Mbit/s (the bandwidth limit can be checked with a file copy or such).

ipfw pipe 1 config delay 50ms plr .03 bw 1024Kbits/s


Sequence number 3 is dropped from the ping packets.

To see what is configured on the pipe, use:

ipfw pipe 1 show

and destroy the pipe with:

ipfw pipe 1 delete

This concludes this blog post.

– Happy Dummynet network testing!

VMware IO Analyzer – lab Flings

Flings at VMware Labs is a great place for (very) useful tools and applications. This time I want to blog about a Fling I often use in the test phase of implementation projects or in health assessments, to see what synthetic load an environment can handle and whether your vSphere design is up to the right IO characteristics and capacity.

Important in these kinds of tests is your test methodology and plan: assess, filter, test, plan, collect, analyze and report. In several of these steps IO Analyzer can be the player.

IO Analyzer can configure, schedule and run several IOmeter workloads, or replay vSCSI traces.

Download at:

Import the OVF into your environment. Start with more than one, so you have some dedicated workers throughout your environment.

After deployment, change the second vmdk from the default “small” configuration to approximately 4 GB plus (and thick eager zeroed). Why? Because this small disk is used as the test disk and fits in most storage caches. We need to get out of the cache and hit some real scenarios.
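Growing that second disk can also be done from the ESXi shell with vmkfstools; the vmdk path below is a placeholder for your environment, and I’m assuming your ESXi version supports the eagerzeroedthick format option on extend.

```shell
# Placeholder path: grow the appliance's second (test) disk to 4 GB, eager zeroed.
vmkfstools -X 4G -d eagerzeroedthick /vmfs/volumes/datastore1/IOAnalyzer/IOAnalyzer_1.vmdk
```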

One (or, yes, two) more things: log on to the consoles of all the appliances. Open a console, choose the first option or press Enter, and log in with root and password vmware. *ssst, a very secret VMware user* Another use of the console is checking or monitoring the IOmeter tests while they are running.


Write down one of the IPs or hostnames of the appliances; we will use that one as the controller.

Open a browser (Chrome or Firefox) and type http://<controller-ip> and you will reach IO Analyzer in your environment.

There we have the following options


For this post I will use the workload configuration to add two tests to two appliances and check the results. The test scheduler is not used; the tests will run immediately.

In this screen we first add the hosts where our test machines are; use the root password to connect to the hosts. When a connection is established, the VMs on that host become visible in the Add Workload Entry section. Here you can find all kinds of IOmeter tests.


I have created two workloads: Exchange 2007 on our first appliance and SQL 64K blocks on the other appliance. The duration is changed from the default 120 seconds to a 5-minute (300 seconds) schedule. This configuration is saved as “Demo config”.

Click Run Now to let the tests run. After an initialization you can see the progress in the console of one of the appliances.


After completion of the tests you can view the results in View Test Results (so it is not just a clever name :)).

Here you can check the two tests and the different VM and host metrics saved from IOmeter and esxtop (and, if there are any, you get the metrics of other VMs on the hosts as a bonus). For more detailed information about these metrics: Duncan Epping also has a good article about esxtop metrics and more. Go see his site while waiting for the test to finish.

Here are the results of our demo tests (I’m not going into detail in this post).


Enjoy stress testing.