Disaggregation for Less Aggravation

NVMe SSDs ushered in an era of standards-based access to solid-state storage.  Previous protocols like SCSI and SAS were designed for spinning disk media and carried severe overhead that throttled the performance NAND can deliver.  It’s no coincidence that IDC forecasts NVMe will make up more than 50% of all enterprise SSD shipments by 2020[1].

One step forward, two steps back

While today’s NVMe SSDs deliver significant increases in IOPS and throughput, the deployment model is arcane.  By implementing NVMe inside of the server, we are back to the problems of direct-attached storage that spawned the creation of the Storage Area Network (SAN) in the late 1990s.

Server-based SSDs require capacity dedicated to a specific server, leaving excess capacity stranded.  As you scale out numerous SSDs and servers, management becomes a significant hassle.  As performance needs increase (millions of IOPS for a small amount of capacity), the only logical approach is to deploy lots of servers and NVMe drives, making ROI difficult to justify.
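To make the stranded-capacity problem concrete, here is a minimal back-of-the-envelope sketch.  The fleet size, the 4 TB installed per server, and the per-server usage figures are all hypothetical numbers chosen for illustration; they are not measurements from any real deployment.

```python
def stranded_capacity_tb(installed_tb_per_server, used_tb_per_server):
    """Total capacity (TB) that is installed in servers but not used."""
    return sum(
        max(installed_tb_per_server - used, 0.0)
        for used in used_tb_per_server
    )


# Hypothetical fleet: four servers, each with 4 TB of local NVMe,
# each using a different share of it.
installed = 4.0
used = [1.0, 0.5, 3.5, 2.0]

stranded = stranded_capacity_tb(installed, used)
total = installed * len(used)
print(f"{stranded:.1f} TB of {total:.1f} TB is stranded ({stranded / total:.0%})")
```

With direct-attached drives, the unused capacity in one server cannot be lent to a neighbor that needs it, so the stranded total only grows as the fleet scales out.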

While All-Flash Arrays (AFAs) breathe new life into today’s SANs, their design is fundamentally flawed when it comes to harnessing the power of NVMe.  Most AFAs simply replaced the media in a traditional HDD-based architecture with SSDs and added a modest amount of system tuning.  These systems are limited by legacy storage controllers designed for SCSI-based protocols, and thus cannot get the most out of today’s low-latency NVMe-based flash storage.  The evidence: even among ‘NVMe-Capable’ AFAs, where NVMe SSDs have replaced SATA or SAS SSDs in the enclosure, most deliver 1 or 2 million IOPS and perhaps 10 GB/s of bandwidth, while the media and the available storage interconnects and protocols can deliver orders of magnitude more performance.

Disaggregation!

Leveraging standard NVMe-Over-Fabrics technology using InfiniBand, TCP or Ethernet, Pavilion releases NVMe storage from servers, allowing it to be allocated on demand to performance applications.  Through disaggregation, administrators can dramatically improve operational efficiency.  Unlike direct-attached server-side SSDs, which are assigned to a set of server CPU cores whether storage capacity is needed or not, the Pavilion array can assign the exact amount of NVMe storage and bandwidth needed per server, maximizing the number of parallel operations while reducing the wasted capacity that comes with dedicated NVMe SSDs.

Perhaps even more powerful is the ability to fine-tune the Pavilion system for bandwidth, IOPS and storage capacity.  If 40 GB/s of throughput is required for your video streaming application, yet only 4 TB of storage is needed, just change the system through an intuitive user interface.  If you are using “server-side” NVMe SSDs, you would need to install additional drives in each server, whether CPU horsepower is required or not.  Try finding any AFA that can work at such an efficient granularity.
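The arithmetic behind the video streaming example can be sketched as follows.  The 40 GB/s and 4 TB targets come from the paragraph above; the per-drive figures (3 GB/s of throughput and 2 TB of capacity) are assumptions picked purely for illustration, so substitute the numbers for your own drives.

```python
import math


def drives_needed(target_gbps, target_tb, drive_gbps, drive_tb):
    """Drives required to satisfy both the bandwidth and capacity targets."""
    by_bandwidth = math.ceil(target_gbps / drive_gbps)
    by_capacity = math.ceil(target_tb / drive_tb)
    return max(by_bandwidth, by_capacity)


# Targets from the text: 40 GB/s of throughput, 4 TB of capacity.
# Per-drive figures are hypothetical: 3 GB/s and 2 TB per drive.
drive_gbps = 3.0
drive_tb = 2.0

count = drives_needed(40, 4, drive_gbps, drive_tb)
purchased_tb = count * drive_tb
excess_tb = purchased_tb - 4

print(f"{count} drives, {purchased_tb:.0f} TB purchased, {excess_tb:.0f} TB beyond the requirement")
```

Under these assumptions, bandwidth rather than capacity dictates the drive count, so with server-side drives most of the purchased capacity exists only to reach the throughput target.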

Less Aggravation

In a dynamic public or private cloud, workloads are constantly changing.  During ETL, high bandwidth is needed and only minimal SSD capacity is required for ingest staging.  However, as we move to analytics, the boss expects results the instant the request is made.  This means millions of IOPS applied to small or large data warehouses or data lakes.  If your budget is infinite, just build a separate infrastructure for each type of workload or task.  But since we know it is not, the time is now for disaggregation and less aggravation.

[1] IDC Worldwide Solid State Drive Forecast 2018-2022, May 2018.