Decoupling Storage Capacity & Performance

Written by Lynn Orlando


Published on January 26, 2020

If buying a car were like buying a traditional storage solution, things would be weird. The reason is that size and performance would always come together. If you wanted one, you’d have to get the other, whether you liked it or not.

Say you were shopping for a cute hatchback. Every car you looked at would come with a little 4-cylinder engine. But what if you wanted more horsepower? “Great,” the dealer would say, “We also have V6 engines. But they only come in these minivans.” Don’t want a minivan? Too bad, that’s just how cars are made.

Sounds crazy, right? And yet, that’s still how a lot of storage solutions work. Compute and storage are linked tightly together in a single physical node with no way to scale one or the other independently. Want higher performance? Then you’re also paying for more capacity, whether you need it or not. The reverse is true, as well. 

Fortunately, a new generation of storage systems is breaking that mold. Organizations finally have the freedom to scale in a way that aligns with their actual growth, instead of their vendors’ business models.


Solving The Performance Paradox

Tightly coupling compute with storage isn’t really a nefarious plot. There are good reasons why storage historically came that way in Big Data and high-performance computing (HPC) environments—namely performance.

Back when people first started analyzing huge amounts of data stored in distributed file systems like Hadoop's HDFS, network interconnects were fairly slow (sometimes as slow as 100 Mbps). If you wanted to analyze a large data store and had to move that data across the network to wherever the compute lived, it would take forever. By putting compute in the same physical host as the data, you could eliminate those delays and zip through your analysis much more quickly.
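To put a rough number on "forever," here is a back-of-the-envelope sketch in Python. The 10 TB data set and the 10 Gbps comparison link are illustrative assumptions, not figures from any real deployment, and the math ignores protocol overhead and disk speed.

```python
def transfer_time_hours(data_terabytes: float, link_megabits_per_sec: float) -> float:
    """Idealized time to move data over a network link (no protocol overhead)."""
    data_bits = data_terabytes * 1e12 * 8            # decimal terabytes -> bits
    link_bits_per_sec = link_megabits_per_sec * 1e6  # megabits/s -> bits/s
    return data_bits / link_bits_per_sec / 3600      # seconds -> hours


# Hypothetical 10 TB data set over a slow link and a faster one.
for label, mbps in [("100 Mbps", 100), ("10 Gbps", 10_000)]:
    hours = transfer_time_hours(10, mbps)
    print(f"{label}: about {hours:.1f} hours to move 10 TB")
```

Under these assumptions the slow link needs more than nine days just to move the data, which is why keeping compute next to the data made so much sense at the time.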

In solving that problem, though, the tightly coupled compute/storage model created a new one: inflexible storage systems. The only way to scale up was to buy an entire new node, adding both compute and storage. But in the real world, it's common to need only one or the other. It's no surprise that expanding your system almost always meant spending CapEx on extra resources you didn't need.
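A small Python sketch shows how that over-buying plays out. Every number here is a made-up assumption for illustration: a hypothetical node that bundles 100 TB and 2 GB/s, and a workload that needs 200 TB and 20 GB/s.

```python
import math


def coupled_nodes(need_tb: int, need_gbps: int, node_tb: int, node_gbps: int) -> int:
    """Nodes required when every node bundles fixed capacity and throughput."""
    for_capacity = math.ceil(need_tb / node_tb)
    for_performance = math.ceil(need_gbps / node_gbps)
    # Whichever requirement is larger dictates the purchase.
    return max(for_capacity, for_performance)


def decoupled_units(need_tb: int, need_gbps: int, unit_tb: int, unit_gbps: int) -> tuple[int, int]:
    """(capacity units, performance units) when each scales on its own."""
    return math.ceil(need_tb / unit_tb), math.ceil(need_gbps / unit_gbps)


# Hypothetical workload: modest capacity, high throughput.
need_tb, need_gbps = 200, 20
node_tb, node_gbps = 100, 2

nodes = coupled_nodes(need_tb, need_gbps, node_tb, node_gbps)
unused_tb = nodes * node_tb - need_tb
print(f"Coupled: {nodes} nodes, {unused_tb} TB of capacity bought but not needed")

cap_units, perf_units = decoupled_units(need_tb, need_gbps, node_tb, node_gbps)
print(f"Decoupled: {cap_units} capacity units, {perf_units} performance units")
```

With these assumed figures, the coupled model forces a purchase of 10 nodes to hit the throughput target, leaving 800 TB of capacity nobody asked for.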


The Decoupling Revolution

In recent years, this trend has finally begun to change. The public cloud ushered in a model where you can break up computing needs into more granular categories: I want this type of processor, with this amount of memory, with this much storage, for this many hours per month. In this way, the cloud companies made it possible to align computing budgets more closely with actual needs. At the same time, they drove businesses to demand the same flexibility from their traditional vendors.

Today, even in HPC environments where performance is the top priority, storage systems are starting to follow the same trend. It’s now possible to get high-performance storage arrays that can scale capacity and performance independently. So, if you want the storage equivalent of a Honda Civic with a racecar engine, you can get it. Or, if you’re happy with your little Civic engine but want to add capacity for another 6 (or 8, or 20) passengers, you can do that too.

For the first time in high-performance storage, you can expand as your requirements change and pay only for what you actually need. It doesn't sound all that revolutionary, does it? It's more like common sense.



