Stellus Helps Drive Actionable Insight in Life Sciences Research

Written by Jaideep Joshi

Published on April 7, 2020

Solving Problems

In the scientific community, problems in bioinformatics, computational biology, and structural biology are notoriously hard to solve. The core scientific difficulties are compounded by compute and data processing complexities, extremely long timelines, and accuracy concerns, not to mention high costs. Yet solving these problems is critical to identifying and curing diseases through the development of new drugs and medical treatments.

Until recently, Genomic Sequencing & Analysis and Structural Biology workflows such as Cryo-EM were limited to a handful of organizations.

Improvements in the speed and accuracy of front-end laboratory instruments, coupled with falling costs, are quickly making these scientific endeavors mainstream in many research, pharmaceutical, and clinical organizations. Advances in automation are also eliminating the delays caused by multiple rounds of manual trial-and-error in these workflows.

Today, advancements in high-throughput sequencing have made it possible to sequence a whole human genome for under $600. Many organizations are now routinely sequencing hundreds of samples per day.

Innovations in high-resolution image capture for Cryo-EM have enabled researchers to clearly resolve the shapes of individual proteins thousands of times smaller than the width of a human hair.

New techniques in mass spectrometry-based proteomics are increasingly being applied to biological and biomedical research.

All of these scientific breakthroughs are being amplified by widely available ML/AI and related data science techniques, uncovering actionable insights that were previously out of reach.

The Changing Landscape

While the digital creation and capture of raw data has indeed become fast and economical, with many environments generating hundreds of terabytes (TB) to multiple petabytes (PB) of data per day, the efforts to quickly derive actionable insight from this data are pushing the boundaries of existing IT environments.

Traditional HPC is changing. Much of the computation is shifting from CPU-centric tools to GPUs, FPGAs, and ASICs. The versatility, simplicity, and cost-effectiveness of Ethernet have made it possible to deploy 25, 40, and 100 Gigabit Ethernet as alternatives to InfiniBand in many HPC environments. Ethernet is quickly becoming the de facto choice for delivering the throughput and low latency that instruments and compute clusters require to store and access the large volumes of scientific data generated every day. Software-based parallelism with modern frameworks like Spark is also making it possible to solve these large, data-intensive problems efficiently in new ways, as sketched below.
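As a rough illustration of that software-based parallelism, the sketch below uses PySpark to distribute a simple quality filter across a large set of sequencing reads. The input path, file format, and quality threshold are hypothetical placeholders, not details from any Stellus deployment:

```python
# Minimal PySpark sketch (illustrative only): distribute a simple
# quality filter across millions of sequencing reads. The input path,
# tab-separated layout, and threshold of 30 are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-quality-filter").getOrCreate()

# Assume each line is "read_id<TAB>sequence<TAB>mean_quality".
reads = (spark.read.csv("hdfs:///data/run_2020_04/reads.tsv", sep="\t")
         .toDF("read_id", "sequence", "mean_quality"))

# Keep only reads whose mean base quality clears the threshold.
# Spark parallelizes the filter and count across the whole cluster.
passing = reads.filter(reads.mean_quality.cast("double") >= 30.0)
print(f"Reads passing QC: {passing.count()}")

spark.stop()
```

The point is not the filter itself but that the same few lines scale from a laptop to hundreds of nodes without rewriting the analysis.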

The net result of these changes in compute and networking is that workloads like Genomic Analysis and Cryo-EM Modeling are shifting from being compute bound to being I/O bound.

The Data Access Problem

As an example, a single new camera working in conjunction with a Cryo-EM microscope can produce 5 PB of data per day. Such equipment needs an extremely reliable, high-speed data platform to sustain these ingest rates. In the absence of one, researchers face constant storage bottlenecks: data migrations and movements cause lengthy (and expensive) microscope downtime and wasted research cycles, all before the data analysis phase can even begin. The problem is compounded by the iterative nature of research; it is normal (and often necessary) for researchers to repeat their experiments and analyses multiple times.
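To put that rate in perspective, a quick back-of-the-envelope calculation (a sketch, assuming decimal petabytes and uniform capture over 24 hours) shows the sustained write bandwidth such a camera demands:

```python
# Back-of-the-envelope: sustained write bandwidth needed to absorb
# 5 PB/day, assuming decimal units and uniform capture over 24 hours.
petabytes_per_day = 5
bytes_per_day = petabytes_per_day * 10**15
seconds_per_day = 24 * 60 * 60          # 86,400 s

gb_per_second = bytes_per_day / seconds_per_day / 10**9
print(f"Sustained ingest: {gb_per_second:.1f} GB/s")  # ~57.9 GB/s
```

That is tens of gigabytes per second of writes, hour after hour, which is exactly the regime where read-optimized legacy systems fall over.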

Storage systems built in the HDD era have delivered only incremental performance gains from the inclusion of SSDs, mostly in the form of faster reads. Legacy block-based storage architectures simply cannot handle the sustained write rates that are now required.

The introduction of innovative new memory and NAND devices has made it evident that decades-old file systems and software stacks cannot truly exploit the capabilities of next-generation storage media.

Because most of these datasets are created on premises in research facilities, cloud-based options are impractical for most scientists today. Even in the rare cases where data can be transferred to the cloud, cloud-based alternatives can be 5x to 9x as expensive as on-premises environments, mostly due to high throughput and data-movement costs.

The Solution

We at Stellus Technologies have taken up this data storage challenge. First-hand knowledge of the latest memory and flash devices, combined with a deep understanding of data storage and access requirements, led to the creation of the Stellus Data Platform (SDP). SDP is unique in its use of Key-Value over Fabric (KVoF) techniques, coupled with NVMe and RDMA, to deliver unmatched sustained read/write performance at scale.

SDP uses strongly consistent, scalable, and reliable Key-Value Stores as the underlying mechanism to ingest and access exabyte-scale unstructured data. By eliminating age-old, compute-intensive data maps, data lookups, and cache-coherence tasks, the resulting architecture consistently delivers 4x to 5x the low-latency throughput (GB/s) of comparable industry offerings in a much smaller footprint. As an example, SDP can deliver 40 GB/s of sustained reads and writes in 5U.
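To make the key-value idea concrete, here is a minimal sketch of how a file system can map file extents directly onto a key-value store, sidestepping traditional block-allocation maps. This is an illustration only, not Stellus's actual on-disk format; the key layout, chunk size, and dict-backed store are all assumptions:

```python
# Illustrative sketch: mapping file extents onto a key-value store.
# NOT the Stellus format; key layout and chunk size are assumptions.
CHUNK_SIZE = 1 << 20  # 1 MiB extents

class KVFile:
    def __init__(self, store, inode):
        self.store = store    # any KV store; a dict stands in here
        self.inode = inode

    def _key(self, chunk_index):
        # The key encodes (inode, chunk) directly: no block map,
        # no indirection tree, no allocation bitmap to maintain.
        return f"{self.inode}:{chunk_index}"

    def write(self, offset, data):
        while data:
            index, skip = divmod(offset, CHUNK_SIZE)
            take = min(CHUNK_SIZE - skip, len(data))
            chunk = bytearray(self.store.get(self._key(index),
                                             b"\x00" * CHUNK_SIZE))
            chunk[skip:skip + take] = data[:take]
            self.store[self._key(index)] = bytes(chunk)
            offset, data = offset + take, data[take:]

    def read(self, offset, length):
        out = bytearray()
        while length:
            index, skip = divmod(offset, CHUNK_SIZE)
            take = min(CHUNK_SIZE - skip, length)
            chunk = self.store.get(self._key(index), b"\x00" * CHUNK_SIZE)
            out += chunk[skip:skip + take]
            offset, length = offset + take, length - take
        return bytes(out)

store = {}
f = KVFile(store, inode=42)
f.write(0, b"cryo-em frame data")
assert f.read(0, 7) == b"cryo-em"
```

Because every extent is addressed by a self-describing key, lookups become direct gets against the store rather than walks through metadata structures, which is where the latency and CPU savings come from.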

This performance can be scaled predictably and independently of the underlying storage capacity. The disaggregated architecture of SDP lets you scale the platform for throughput (GB/s) or capacity (TB) as your requirements change, without artificial limitations. Designed to run on industry-standard x86 hardware, SDP is truly software defined: no bespoke hardware or custom client code is needed to take advantage of these performance and access capabilities. A POSIX-compliant interface ensures that all the familiar platform services remain easy to use.
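Because the interface is POSIX-compliant, existing tools and code paths work unchanged. Here is a small sketch of ordinary file I/O that would run as-is against such a platform; the mount point and file name are hypothetical:

```python
# Ordinary POSIX file I/O; nothing SDP-specific is required.
# The mount point and file name below are hypothetical examples.
import os

path = "/mnt/sdp/experiments/frame_0001.mrc"   # assumed mount point
os.makedirs(os.path.dirname(path), exist_ok=True)

# Standard open/write/fsync/read calls, exactly as on any POSIX file system.
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)   # placeholder detector frame
    f.flush()
    os.fsync(f.fileno())      # force data to stable storage

with open(path, "rb") as f:
    frame = f.read()
print(f"Read back {len(frame)} bytes")
```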

The Outcome

The scientific community is well versed in delivering, and in turn expecting, tangible results. SDP has on several occasions quickly and measurably delivered on its core value proposition: high-performance systems save precious time. Notably, SDP delivered a solution to a leading scientific organization for its microscopy needs; deploying SDP in the data analysis pipeline eliminated multiple days of instrument downtime and weeks' worth of delays in lab activities. Please visit here to learn more.

Subsequent posts will dive into the details of how the Stellus Data Platform provides value in Genome Analysis, Cryo-EM, and ML/AI workflows.
