Multimedia data. Electronic health records. Cryo-electron microscope data. Machine images. Sensor data. Participants in the April 12 “Overcoming Backup Challenges” CrowdChat conversation say they deal with many different types of unstructured data.
What do these IT professionals and other participants share? An appreciation for how fast backup datasets are growing and the potential complexities of backing up files.
Host John Furrier of SiliconANGLE and @theCUBE led the CrowdChat on data backups. Explore the entire conversation on the CrowdChat website, and check out our blog posts on "Managing Billions of Files", "Spawning a New Hybrid Cloud", and "When Data Can’t Move Offsite" to read insights from previous CrowdChat conversations.
“I hear all the time on @theCUBE that backup is broken, especially as data is stored in a zillion places,” John said.
Why do people consider backup challenging at best and “broken” at worst? A few common threads ran through the CrowdChat conversation.
Backup Struggle #1 - Data growth
Backup sets keep growing because organizations’ data keeps growing. Chris Dwan, a senior technologist in the life sciences, told the CrowdChat group that he works with one customer that doubles its primary data every 12 months.
“I was in a session yesterday where the speaker referred to the ‘coffee break moment’ when you type ‘ls’ and go get a coffee,” Chris said.
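To put the doubling rate Chris mentioned in perspective, here is a minimal sketch of the arithmetic. The 100 TB starting size is a hypothetical figure chosen for illustration, not a number from the conversation, and the function name is ours.

```python
def projected_size_tb(initial_tb: float, years: float,
                      doubling_period_years: float = 1.0) -> float:
    """Size after `years` if the data doubles every `doubling_period_years`."""
    return initial_tb * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical starting point: 100 TB of primary data, doubling every 12 months.
    for year in range(6):
        print(f"year {year}: {projected_size_tb(100, year):,.0f} TB")
```

Under these assumptions, five years of annual doubling turns 100 TB into 3,200 TB, and any full backup copy grows with it.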
Data is growing so fast that it’s causing some organizations to question whether to back up data at all, according to Jeff DiNisco, P1 Technologies’ Vice President of Solutions Architecture.
“Customers are finally asking different questions: ‘Does this really need to be backed up?’ ‘Can it be recreated?’ ‘What’s the real cost of losing this data?’ ” Jeff said.
For some organizations, the answer to that last question is “a lot,” according to Igneous Systems Chief Marketing Officer Steve Pao.
“We’re seeing backups grow at the rate of data or faster as organizations are finding more value out of the ‘past’ data as machine learning takes off,” Steve said. “The data is even more worth protecting now.”
Backup Struggle #2 - Data bandwidth limitations
In the 1971 book “Computing: A Second Course,” author Fred Gruenberger wrote: “There’s a lot of bandwidth in a station wagon.” In the 1980s, in his textbook “Computer Networks,” computer scientist Andrew Stuart “Andy” Tanenbaum tweaked this quote and wrote, “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.”
A few years ago, xkcd “What If?” explored the notion that it’s faster to ship a few hundred gigabytes of data via FedEx than via the Internet. This idea has been dubbed SneakerNet. During the CrowdChat, Rubrik Technical Marketing Manager Andrew Miller and CIO-Sherpa Managing Director Christopher Jones alluded to these findings.
“It’s crazy that FedEx is still sometimes the highest bandwidth option out there,” Andrew said.
“Even Amazon has gotten into the FedEx movement of data,” added Christopher.
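The station wagon argument comes down to simple arithmetic, sketched below. The payload size, transit time, and link speed are hypothetical values, and the sketch ignores the time needed to copy data onto and off the shipped media, so treat it as a rough comparison rather than a benchmark.

```python
def shipment_throughput_gbps(payload_tb: float, transit_hours: float) -> float:
    """Effective throughput, in gigabits per second, of shipping storage media."""
    bits = payload_tb * 1e12 * 8  # decimal terabytes -> bits
    return bits / (transit_hours * 3600) / 1e9


def network_transfer_hours(payload_tb: float, link_gbps: float) -> float:
    """Hours needed to send the same payload over a fully utilized link."""
    bits = payload_tb * 1e12 * 8
    return bits / (link_gbps * 1e9) / 3600


if __name__ == "__main__":
    # Hypothetical scenario: 100 TB shipped overnight vs. sent over a 1 Gbps link.
    print(f"shipping: {shipment_throughput_gbps(100, 24):.2f} Gbps effective")
    print(f"network:  {network_transfer_hours(100, 1.0):.0f} hours")
```

With these numbers, an overnight shipment works out to roughly 9 Gbps of effective throughput, while the 1 Gbps link would need more than nine days to move the same data.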
Backup Struggle #3 - Traditional storage headaches
Traditional storage approaches to data backups – such as tape – result in large backup catalogs, complex architecture and high license costs, according to Jeff.
These catalog problems can mean segmenting data into backup silos and worrying about backing up your backup catalog.
“There’s some circularity there,” Steve said.
Chris Dwan agreed, sharing the story of a customer who’s maintaining seven copies of the same information in three different formats.
But change is happening. P1 Technologies’ Jeff says he’s moving away from the traditional approach to backup. And Steve says there’s a trend of scale-out architecture because “backup infrastructure shouldn’t be something you must spend a lot of time architecting and managing.”
“It’s a huge market out there,” Rubrik’s Andrew said. “It does seem like recently innovation in this area has accelerated, which is cool to see.”
Backup Struggle #4 - Data retention policies
In some industries, regulations may dictate how long an organization retains its data. Chris Dwan, for instance, shared that in his line of work, the U.S. Food and Drug Administration mandates 7-year data retention for certain files while clinical rules dictate retention for “the life of the patient” for others.
“Coupled with exponential data growth, that means that we rarely delete anything. Ever,” Chris said.
“If you never delete anything, I’m surprised you have any IT budget left for anything other than backup,” said James Kobielus, SiliconANGLE Wikibon Lead Analyst for Data Science, Deep Learning, and Application Development.
Expedient customers follow different retention policies, though most run daily backups held for four weeks, monthly clones held for 12 months and annual clones held for three years, according to John White, a product strategy executive at Expedient.
“Very few test restores,” John said.
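The schedule John described can be written down as a small retention check. This is only a sketch: the conversation did not say which backups become the monthly and annual clones, so the code assumes the backup from the 1st of each month and from January 1, and it approximates 12 months and three years as 365 and 1,095 days. All names here are hypothetical.

```python
from datetime import date, timedelta

# Retention windows from the schedule above; the day counts are approximations.
DAILY_KEEP = timedelta(weeks=4)
MONTHLY_KEEP = timedelta(days=365)
ANNUAL_KEEP = timedelta(days=3 * 365)


def is_retained(backup_day: date, today: date) -> bool:
    """Return True if a backup taken on backup_day should still exist today."""
    age = today - backup_day
    if age < timedelta(0):
        return False  # a backup dated in the future cannot exist yet
    if age <= DAILY_KEEP:
        return True
    # Assumption: the monthly clone is the backup taken on the 1st of the month.
    if backup_day.day == 1 and age <= MONTHLY_KEEP:
        return True
    # Assumption: the annual clone is the backup taken on January 1.
    if backup_day.month == 1 and backup_day.day == 1 and age <= ANNUAL_KEEP:
        return True
    return False


def retained_count(today: date) -> int:
    """Count the restore points held today, assuming one backup every day."""
    return sum(
        is_retained(today - timedelta(days=n), today)
        for n in range(ANNUAL_KEEP.days + 1)
    )
```

Under these assumptions the policy keeps roughly 40 restore points at any time, which gives a sense of how many copies a test-restore plan would need to sample from.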
But data retention isn’t always spelled out. Too often, organizations don’t prepare service-level agreements that document how to restore data from backups, according to CrowdChat participants. Sometimes, that’s because IT can’t get signoff from the business. Instead of SLAs guiding them, they follow a philosophy of this-is-the-way-backups-have-always-been-done, according to Andrew.
“I do see most customers just doing daily backups because it’s what they know and what the business is used to,” he said. “In a perfect world, the focus starts with how often to backup, how long to keep it, when to archive, and when to replicate. And you match tech to that.”
Even when organizations use SLAs, they may be written to focus on the technology in place rather than on what the business actually needs, according to Jeff.
Backup Struggle #5 - Challenges with segmenting your data backup domains
Host John Furrier asked how participants segment their backup domains. By technology? Department? Compliance category?
“I recommend segmenting by the intended downstream use,” Chris Dwan said. “Segmenting by department or technology seems to lead to fragmentation down the road. The compliance category is a great place to start because at least you have some specific rules to follow.”
Politics and budget control sometimes drive segmentation by department, according to Andrew. But without C-suite oversight, that sort of departmental split “makes it very hard” to take advantage of economies of scale or enterprise features of a data storage product, Chris Dwan said.
One of Jeff’s large customers segments its backups because its catalogs won’t scale to support its datasets. As a result, the customer creates backup pods to manage them.
“Catalogs won, they lost,” said technologist Nick Kirsch. “The only one who benefits from manual sharding is the catalog vendor.”
Download the Igneous whitepaper “Secondary Storage for the Cloud Era” to learn more about the challenges of the growth in unstructured data and how to consolidate secondary storage in the cloud era. Visit our Igneous Backup page for details on our fast, easy backup and restore solution for unstructured data.
Hope you can join us for a future CrowdChat conversation!