How Igneous Solves the Problem of Data Movement for Large Datasets

by Jeff Hughes – December 5, 2017

Getting data from where it is to where it needs to be sounds simple in concept, but it becomes a big problem when datasets are very large. The aspect that most often comes to mind is moving data across geographies, but different formats and the impact on primary systems are equally challenging. And moving data well is essential for backup, archive, and cloud tiering.


Why is Data Movement for Large Datasets So Hard?

Many data movement techniques were designed when enterprise data was measured in gigabytes. Now that it’s measured in petabytes, many of them no longer work.

For example, one way to move data off legacy file systems was NDMP, a single-threaded protocol designed to move data linearly to tape. Those constraints no longer apply, but the protocol is still often in use.

How Does Igneous Solve This Problem?

Igneous moves data from primary storage in highly parallel streams. Rather than using legacy protocols like NDMP, we connect through front-end protocols such as NFS and SMB and open many parallel streams, the way many users would. In addition, both scanning and data movement are designed around how each filer is built.
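To make the parallel-stream idea concrete, here is a minimal Python sketch, assuming the filer share is already mounted locally over NFS or SMB. It copies a directory tree with a pool of concurrent workers, each behaving like one user reading files. The function names and the default of 16 streams are illustrative assumptions, not Igneous’s engine.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def list_files(src_root: Path) -> list[Path]:
    """Walk the source tree and return every regular file, relative to src_root."""
    return [p.relative_to(src_root) for p in src_root.rglob("*") if p.is_file()]


def copy_one(src_root: Path, dst_root: Path, rel: Path) -> int:
    """Copy one file, creating parent directories as needed; return bytes copied."""
    dst = dst_root / rel
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src_root / rel, dst)
    return dst.stat().st_size


def parallel_copy(src_root: Path, dst_root: Path, streams: int = 16) -> int:
    """Copy a tree using `streams` concurrent workers (hypothetical default of 16)."""
    files = list_files(src_root)
    with ThreadPoolExecutor(max_workers=streams) as pool:
        # Each worker opens its own file handles, like an independent client.
        sizes = pool.map(lambda rel: copy_one(src_root, dst_root, rel), files)
        return sum(sizes)
```

A production mover would also split very large files into byte ranges and adjust the stream count to filer load rather than using a fixed number.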

Impact on Filers

Igneous is latency aware. We move data as fast as we can while the filers are quiet, and when we detect load from users or applications, we back off.
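One common way to implement this kind of backoff is additive-increase/multiplicative-decrease (AIMD), the approach TCP congestion control uses. The sketch below is hypothetical: the post does not describe Igneous’s actual algorithm, and the 5 ms latency target and stream limits are assumed values.

```python
from dataclasses import dataclass


@dataclass
class StreamController:
    """Hypothetical AIMD controller: grow parallelism while the filer is quiet,
    cut it back when observed latency suggests users or applications are active."""
    min_streams: int = 1
    max_streams: int = 64
    latency_target_ms: float = 5.0  # assumed threshold, not an Igneous value
    streams: int = 1

    def observe(self, latency_ms: float) -> int:
        """Update the stream count from one latency sample and return it."""
        if latency_ms > self.latency_target_ms:
            # Filer looks busy: halve parallelism, but keep at least min_streams.
            self.streams = max(self.min_streams, self.streams // 2)
        else:
            # Filer looks quiet: probe for more throughput one stream at a time.
            self.streams = min(self.max_streams, self.streams + 1)
        return self.streams
```

For example, a controller running 8 streams grows to 9 after a fast read and drops to 4 after a slow one, so it yields quickly to user load and recovers gradually.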

This lets backups run continuously, with no “backup windows” in which administrators tell users and application owners that data is unavailable from, say, 11 p.m. to 4 a.m. In our case, backups run all the time.

Read Consistency

When read consistency matters, we integrate with the filers’ APIs to take a snapshot, move data from it, and release the snapshot when we’re done. To date, we’ve integrated with NetApp, Dell EMC Isilon, and Pure FlashBlade.
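The take-snapshot, move, release sequence maps naturally onto a Python context manager, which guarantees the release step runs even if the copy fails. `SnapshotAPI`, `create_snapshot`, and `delete_snapshot` are hypothetical stand-ins; the real NetApp, Isilon, and FlashBlade APIs each have their own calls.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol


class SnapshotAPI(Protocol):
    """Hypothetical filer snapshot interface; real vendor APIs differ."""

    def create_snapshot(self, volume: str, name: str) -> str: ...

    def delete_snapshot(self, volume: str, snapshot_id: str) -> None: ...


@contextmanager
def consistent_view(api: SnapshotAPI, volume: str, name: str) -> Iterator[str]:
    """Take a snapshot, yield its ID for reading, and always release it afterwards."""
    snap_id = api.create_snapshot(volume, name)
    try:
        yield snap_id
    finally:
        # Release even if the copy fails, so snapshots don't pile up on the filer.
        api.delete_snapshot(volume, snap_id)
```

Inside the `with` block, a mover would read from the snapshot identified by `snap_id` rather than the live file system, so files changing under users do not produce an inconsistent copy.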

Moving Data to Other Locations or Cloud

The key is understanding where you need low latency. You want a low-latency connection between the filers and the data mover software, because the POSIX semantics behind NFS and SMB transactions involve frequent round trips between client and server.
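A quick back-of-the-envelope calculation shows why latency matters so much for POSIX-style access. The numbers are assumptions chosen for illustration: one million files, four round trips per file, 0.2 ms round-trip time on a LAN and 50 ms across a WAN.

```python
def round_trip_wait_s(files: int, round_trips_per_file: int, rtt_ms: float) -> float:
    """Serial time spent waiting on network round trips alone, ignoring bandwidth."""
    return files * round_trips_per_file * rtt_ms / 1000.0


FILES = 1_000_000  # assumed dataset size
ROUND_TRIPS = 4    # assumed per-file round trips (e.g. open, getattr, read, close)

for label, rtt in [("LAN", 0.2), ("WAN", 50.0)]:
    print(f"{label}: {round_trip_wait_s(FILES, ROUND_TRIPS, rtt):,.0f} s")
```

Under those assumptions, serial round-trip waits alone cost 800 seconds on the LAN but 200,000 seconds (about 55.6 hours) over the WAN, before any bandwidth limit is counted.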

However, our data mover software communicates with our storage layers over RESTful protocols, which are designed to work over WAN and Internet connections just like the Web. In fact, all of these protocols run over HTTPS.

As a result, we can move data between Igneous systems, or between Igneous systems and public clouds, efficiently and reliably, without the retries and timeouts typical of running POSIX semantics over a wide-area network.
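The sketch below illustrates why stateless requests suit WAN links: an object is split into parts, each sent as a self-contained request carrying its part number, so a dropped connection costs one part rather than breaking a stateful file session. The `send` callable stands in for an HTTPS PUT; it, the 8 MiB part size, and the retry settings are assumptions, not Igneous’s protocol.

```python
import time
from typing import Callable

# Hypothetical transport: sends one self-contained request (e.g. an HTTPS PUT of one part).
SendPart = Callable[[int, bytes], None]


def split_parts(data: bytes, part_size: int) -> list[bytes]:
    """Cut an object into fixed-size parts; the last part may be shorter."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def upload(data: bytes, send: SendPart, part_size: int = 8 * 1024 * 1024,
           attempts: int = 3, backoff_s: float = 0.5) -> int:
    """Send every part; a failed part is resent on its own without restarting the object.

    Resending a part with the same number and bytes is idempotent, so no
    session state has to be rebuilt after a dropped connection."""
    parts = split_parts(data, part_size)
    for number, part in enumerate(parts, start=1):
        for attempt in range(attempts):
            try:
                send(number, part)
                break
            except ConnectionError:
                if attempt == attempts - 1:
                    raise  # give up on this part after the last attempt
                time.sleep(backoff_s * 2 ** attempt)  # exponential backoff
    return len(parts)
```

The design choice is that recovery stays local to one request: there are no open file handles, locks, or cached attributes to re-establish, which is what makes POSIX sessions fragile over long-latency links.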

Learn more about our data movement engine on our newly launched Technology page.