
How Igneous Solves the Problem of Data Movement for Large Datasets

by Jeff Hughes – December 5, 2017

Getting data from where it is to where it needs to be sounds simple in concept, but it becomes a big problem when your datasets are very large. The aspect that most often comes to mind is moving data across geographies, but different formats and the impact on primary systems are equally challenging. Yet moving data well is a key function required for backup, archive, and cloud tiering.


Why is Data Movement for Large Datasets So Hard?

Many of the ways we move data were designed when enterprise data was measured in gigabytes. Now that it's measured in petabytes, those old techniques often no longer work.

For example, one common way to move data off legacy file systems was NDMP, a single-threaded protocol designed to move data linearly to tape. Those constraints don't apply today, but the protocol is still widely used.

How Does Igneous Solve this Problem?

Igneous moves data from primary storage in highly parallel streams. Rather than using legacy protocols like NDMP, we come in through front-end protocols such as NFS and SMB and open many parallel streams, the way many users would. In addition, both the way we scan and the way we move data are designed around how the filers are built.
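
As a rough illustration of the parallel-streams idea (not Igneous's implementation), here is a minimal sketch that copies a file tree from an already-mounted NFS or SMB export using many concurrent streams. The mount points and stream count are hypothetical.

```python
# Illustrative sketch only -- not Igneous's data mover. Assumes the filer
# export is already mounted at SOURCE_ROOT and TARGET_ROOT is writable.
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SOURCE_ROOT = Path("/mnt/filer_export")   # hypothetical NFS/SMB mount point
TARGET_ROOT = Path("/mnt/backup_target")  # hypothetical destination
PARALLEL_STREAMS = 32                     # many streams, the way many users would

def copy_one(src: Path) -> None:
    """Copy a single file, preserving its relative path under the target."""
    dest = TARGET_ROOT / src.relative_to(SOURCE_ROOT)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)

def copy_tree_in_parallel() -> None:
    files = (p for p in SOURCE_ROOT.rglob("*") if p.is_file())
    with ThreadPoolExecutor(max_workers=PARALLEL_STREAMS) as pool:
        # Each worker opens its own stream against the filer's front-end
        # protocol (NFS/SMB), rather than a single NDMP-style linear stream.
        list(pool.map(copy_one, files))

if __name__ == "__main__":
    copy_tree_in_parallel()
```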

Impact on Filers

Igneous is latency aware. We move data as fast as we can while the filers are quiescent, and as we detect load from users or applications, we back off intelligently.

This lets backups run continuously, without the "backup windows" in which backup administrators tell users and application owners that data is unavailable from, say, 11pm to 4am. In our case, backups run all the time.
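
As a minimal sketch of what latency-aware backoff can look like (the thresholds and backoff factors here are made up, and this is not Igneous's actual throttling algorithm):

```python
# Illustrative sketch only -- a simple latency-aware throttle.
# LATENCY_BUDGET_MS and the stream limits are hypothetical values.
import time

LATENCY_BUDGET_MS = 20.0   # hypothetical "filer is busy" threshold
MIN_STREAMS, MAX_STREAMS = 1, 32

def adjust_streams(current_streams: int, observed_latency_ms: float) -> int:
    """Back off when user/application load pushes latency up; ramp up when idle."""
    if observed_latency_ms > LATENCY_BUDGET_MS:
        return max(MIN_STREAMS, current_streams // 2)   # back off quickly
    return min(MAX_STREAMS, current_streams + 1)        # ramp back up gently

def probe_latency_ms(probe) -> float:
    """Time a small operation against the filer to estimate its current load."""
    start = time.perf_counter()
    probe()
    return (time.perf_counter() - start) * 1000.0
```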

Read Consistency

When read consistency is an issue, we integrate with the filers' APIs to take a snapshot, move the data, and release the snapshot when we're done. To date, we've integrated with NetApp, Dell EMC Isilon, and Pure FlashBlade.
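
For illustration only, here is a sketch of that snapshot pattern. The filer client and its create_snapshot/delete_snapshot calls are placeholders, not the real NetApp, Isilon, or FlashBlade APIs.

```python
# Illustrative sketch of the snapshot-based read-consistency pattern.
# filer_client and its methods are hypothetical placeholders.
from contextlib import contextmanager

@contextmanager
def consistent_view(filer_client, volume: str):
    """Take a snapshot, yield its path for reading, and release it afterwards."""
    snap = filer_client.create_snapshot(volume)       # hypothetical API call
    try:
        yield snap.mount_path                          # read-consistent copy source
    finally:
        filer_client.delete_snapshot(volume, snap.id)  # hypothetical API call

# Usage: copy from the snapshot rather than the live file system, e.g.
# with consistent_view(client, "projects") as src_root:
#     run_backup(src_root)
```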

Moving Data to Other Locations or Cloud

The key element here is to understand where you need low latency. You want a low-latency connection between the filers and the data movement software, because the POSIX semantics involved in NFS and SMB transactions require it.

However, the communication between our data mover software and our storage layers uses RESTful protocols, designed to work over WAN and Internet connections just like the Web. In fact, all of that RESTful traffic runs over HTTPS.

As a result, we can move data between Igneous systems, or between Igneous systems and public clouds, efficiently and reliably, without the retries and timeouts that typically come with running POSIX semantics over the network.
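
As a rough sketch of why REST over HTTPS travels well across a WAN, here is an illustrative uploader that handles retries at the HTTP layer. The endpoint URL and retry settings are hypothetical, and this is not Igneous's client.

```python
# Illustrative sketch only -- a RESTful PUT over HTTPS with retry/backoff,
# standing in for data-mover-to-object-store traffic. Endpoint is hypothetical.
import requests
from requests.adapters import HTTPAdapter, Retry

ENDPOINT = "https://example-object-endpoint/objects/"  # hypothetical URL

def make_session() -> requests.Session:
    session = requests.Session()
    # Retries with exponential backoff happen at the HTTP layer, rather than
    # relying on POSIX/NFS semantics across the WAN.
    retries = Retry(total=5, backoff_factor=0.5,
                    status_forcelist=[429, 500, 502, 503, 504])
    session.mount("https://", HTTPAdapter(max_retries=retries))
    return session

def put_object(session: requests.Session, key: str, data: bytes) -> None:
    """Upload one object over HTTPS and raise if the server reports an error."""
    response = session.put(ENDPOINT + key, data=data, timeout=30)
    response.raise_for_status()
```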

Learn more about our data movement engine on our newly launched Technology page.
