From Cave Walls to Billions of Files: The History of Data

by Tammy Batey – March 30, 2017

Remember that old philosophical question: If a tree falls in a forest and no one is around to hear it, does it make a sound? A modern version of that question might be: If data exists but no one is able to find it again or make sense of it, does it have value?

In the blog post “3 Challenges of Storing Billions of Files Beyond Big Data,” I explored the amount of data that enterprises generate and I shared insights from our CrowdChat conversation about the challenges all that data creates for organizations.

With this post, I’ll look at how we got to where we are now. Check out our How Did We Get Here? infographic for an exploration of the history of information storage. What are the key milestones in the history of storing data? How has information storage evolved to handle all that data? And how have unstructured data, machine learning and artificial intelligence shaken up everything? Read on for details.

 

Big milestones in the history of data

As Igneous illustrates in our How Did We Get Here? infographic, the history of data is also the history of the many ways people have recorded that data. In 160,000 BC, when the first Homo sapiens appeared, the brain served that purpose, acting as the “first storage medium.”

Around 30,000 BC, long before the emergence of what we recognize as written language, people drew in caves. While we may never know what motivated them and researchers continue to speculate about the purpose of cave drawings, ancient people captured information via one of the only means available to them.

The medium for information storage has changed frequently. In 3350 BC, people wrote information in Sumerian on clay tablets that scholars are still translating. For the 1890 U.S. Census, the Hollerith Tabulating Machine recorded census data on punch cards, enabling the Census Bureau to access the results before population growth rendered them obsolete.

Over the years, information storage continued to evolve to handle the massive growth in data. The Lawton Constitution newspaper coined the term “information explosion” in 1941. But this was all structured data: data organized into database fields. By today’s standards, structured data consumes small amounts of storage.

The need to make sense of data

Igneous Chief Technology Officer Jeff Hughes says today’s wealth of data creates new challenges for the enterprises that generate it.

“We're constantly in this rich environment of this data being produced,” said Jeff, who co-founded Igneous. “How do we know what's in it and utilize it? How do we enable the next level of true insights into somebody's data?”

The Census Bureau’s struggles in the late 1880s to capture and analyze data seem almost quaint these days. Today, a single organization can struggle to capture the information contained in billions of files.

All this data represents an exciting opportunity for tech companies like Igneous to bring value to people’s data, according to Jeff.

“We enable our customers to do things they weren’t capable of doing because they were managing infrastructure,” he said. “Everyone feels that pressure of: I need to do something with the data and not manage the infrastructure. There’s never an end to work. People want to know: How do I do more?”

The emerging role of data science

One way that enterprises use their data is to market more effectively to customers. In “A Very Short History of Data Science,” Forbes magazine walks through a few highlights of data science. The article references a September 1994 Business Week cover story on database marketing.

“Companies are collecting mountains of information about you, crunching it to predict how likely you are to buy a product, and using that knowledge to craft a marketing message precisely calibrated to do so,” according to the Business Week article.

The applications are much broader, though, than gaining intelligence about customers and prospects so you can better tailor your marketing messaging to them. Companies also use data to improve their customers’ user experience and to gain insights into inefficient processes, cost-saving opportunities and employee use of internal tools.

The Forbes magazine article includes this nugget from a February 2010 special report in The Economist, “Data, Data Everywhere.” Author Kenneth Cukier writes that “a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data.”

 

The challenge of unstructured data

Unstructured data takes “information explosion” – that term from 1941 – to a whole new level. A Word document doesn’t go into a database. Neither does a photo, a video or a variety of other data that people and enterprises create.

It’s more complicated to store unstructured data because of the sheer size and because it doesn’t feed neatly into database fields that basic code can detect and analyze. We used to measure data in kilobytes and megabytes. Now, we measure data in terabytes and petabytes.

“Every company is a tech company now. There’s that innovate or die aspect. If you're not leveraging your data, someone will leverage that data and eat your lunch,” Jeff says. “Every CEO of every enterprise is asking, ‘How am I leveraging my data to not become obsolete?’ ”

The Port of Hamburg uses a combination of telematics data, map visualizations, traffic information and hub sensor data to gain a more accurate picture of traffic. As a result, truck drivers no longer spend 70 percent of their time waiting to pick up shipments, and the port doubled its container handling capacity without increasing its shipment space, according to D!gitalist magazine’s “Two Reasons You Should Expand Operations Like Europe’s Second-Largest Port.”

“The 99% of business data you aren’t using is a goldmine of optimization potential,” writes author Shelley Dutton.

 

AI and machine learning

Throw machine learning and artificial intelligence (AI) into the mix, and it’s easier for companies to capture and gain insights from data than ever before. Combine the correct algorithms with the right amount of data and computers can process mountains of data faster than humans – and even predict people’s behavior.

People sometimes confuse machine learning with AI. In “Machine learning versus AI: what's the difference?” in Wired magazine, Lee Bell defines AI as the science of programming computers to act with human intelligence.

A sub-field of data science, machine learning involves the computational methods that make AI possible. Or as Stanford University defines it, machine learning is “the science of getting computers to act without being explicitly programmed.” It involves designing algorithms that predict behavior based on data.

When you request a ride via mobile app Uber, for instance, machine learning behind the app will suggest shortcuts to your home, work and other frequently visited locations, according to Venture Beat’s “Uber is rolling out a big redesign powered by machine learning.” It “learns” from your ride history.
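To make “learning from your ride history” concrete, here is a minimal sketch in Python. It is only a toy frequency count over past destinations, far simpler than a production machine learning model, and it is not how Uber’s app works. The function name `suggest_destinations` and the sample ride history are hypothetical, made up for illustration.

```python
from collections import Counter


def suggest_destinations(ride_history, top_n=2):
    """Return the top_n most frequent past destinations, most frequent first."""
    counts = Counter(ride_history)
    return [place for place, _ in counts.most_common(top_n)]


# Hypothetical ride history for one rider (illustrative data only).
ride_history = ["home", "work", "gym", "work", "home", "work", "airport"]

shortcuts = suggest_destinations(ride_history)
print(shortcuts)  # → ['work', 'home']
```

The point of the sketch is that nobody hard-codes “work” and “home” as shortcuts. The suggestions come from the data, and they change as the history changes.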

Using machine learning, social media site Pinterest recommends related content for users to “pin” based on content that they’ve already liked or pinned to their pages, according to Fast Company, which named Pinterest one of the world’s most innovative companies.


What does Igneous contribute to the history of data?

Using AI and machine learning to process large volumes of unstructured data can pay off for enterprises. But that is hard for organizations with legacy infrastructure, which wasn’t designed to support today’s data volumes. As a result, these organizations struggle to protect data via backups and to store it cost-effectively. Because the infrastructure is very expensive, that often means tiering storage across high-performance and high-capacity tiers.

Igneous CTO Jeff says he’s excited to help customers make sense of their data with our product family and the “opportunity to bring value to people's data.” Igneous offers storage, backup and archive services built to handle massive file systems.

For instance, File Insights – a product feature of Igneous Archive – analyzes files and recommends data that could be moved off primary storage. That information can save companies money by helping them free up their valuable primary storage.
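As a rough illustration of the general idea of flagging data that could move off primary storage, the sketch below walks a directory tree and lists files that haven’t been modified in a long time. This is not how File Insights works; its internals aren’t described here. The function name `find_cold_files`, the one-year threshold and the use of modification time as the signal are all assumptions made for this example.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_cold_files(root, min_idle_days=365):
    """Return (path, size_bytes, idle_days) for files under root that
    have not been modified for at least min_idle_days.

    Assumption: modification time stands in for "is this data still in use?"
    """
    now = time.time()
    cold = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            stat = path.stat()
        except OSError:
            continue  # file vanished or is unreadable; skip it
        idle_days = int((now - stat.st_mtime) // SECONDS_PER_DAY)
        if idle_days >= min_idle_days:
            cold.append((path, stat.st_size, idle_days))
    return cold


def reclaimable_bytes(cold_files):
    """Total size of the candidate files, i.e. primary storage that could be freed."""
    return sum(size for _, size, _ in cold_files)
```

Calling `find_cold_files("/some/share")` on a hypothetical file share returns the candidate list, and `reclaimable_bytes` totals how much primary storage those candidates occupy.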

Igneous Backup provides data analytics and insights into system performance and the types of data that end users most frequently update. These analytics help you decide what data is most dynamic – and perhaps most important – so you can make more informed decisions about what to back up.

“Ultimately, it’s about helping end users to be more productive,” Jeff said, “and teaching them something about the data they didn't know.”

 
