Cloud-Managed, On-Premises Data With Zero-Touch Infrastructure™

by Christian Smith – October 10, 2016

We coined the term Zero-Touch Infrastructure™ in the earliest days of the company. Looking back at my notes, the earliest mention of this term was in an internal blog I wrote on December 4, 2013, less than 40 days from when we got started. Our vision was that, even with infrastructure in their own datacenters, our customers should not have to “rack and stack” it, power it, or service it. Basically, we wanted them to never have to touch it—that is, we wanted to deliver a Zero-Touch Infrastructure. (We were enamored enough by the term that we trademarked it!)

This desire was driven by three trends that we saw:

  1. Enterprise data sets continue to grow explosively, as does the infrastructure needed to ingest, process, and store this ever more valuable data.
  2. At these scales (hundreds of terabytes to hundreds of petabytes of data, with hundreds to thousands of servers, and hundreds to thousands of network ports) “something” is happening on a daily, even hourly, basis. This “something” could be normal maintenance activity, like monitoring for anomalies, updating software, applying security patches, or handling the expected failures of components like hard drives, memory modules, fans, etc.
  3. Traditional infrastructures in enterprise datacenters are treated like pets, not cattle, and managed by humans. Given the growing scale of infrastructure to manage data-centric workloads, the normal operational burdens quickly become untenable and a detriment to scalability, flexibility, and agility.

Unlike traditional infrastructure and management stacks from legacy vendors, systems such as FBAR at Facebook (described in an HBR article) and BorgMon at Google allow for the automated monitoring, management, and remediation of hyperscale infrastructure using software and machine intelligence.
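
To make that contrast concrete, here is a minimal sketch of the monitor, classify, and remediate loop that this class of systems runs, paging a human only when no automated playbook applies. This is written in Go with made-up node names, fault labels, and playbooks; it is not FBAR, BorgMon, or our actual implementation.

```go
package main

import (
	"fmt"
	"time"
)

// NodeHealth is a simplified, hypothetical health report for one server.
type NodeHealth struct {
	NodeID       string
	DiskFailed   bool
	MemoryErrors int
	LastSeen     time.Time
}

// classify returns a fault label for a report, or "" if the node looks healthy.
func classify(h NodeHealth) string {
	switch {
	case h.DiskFailed:
		return "disk-failure"
	case time.Since(h.LastSeen) > 5*time.Minute:
		return "unresponsive"
	case h.MemoryErrors > 100:
		return "memory-errors"
	}
	return ""
}

// remediate runs the automated playbook for a fault, if one exists.
// It returns false when no playbook applies and a human must be paged.
func remediate(nodeID, fault string) bool {
	switch fault {
	case "disk-failure":
		fmt.Printf("[auto] %s: migrating data off the failed drive, marking it fail-in-place\n", nodeID)
		return true
	case "unresponsive":
		fmt.Printf("[auto] %s: restarting its services on a healthy peer\n", nodeID)
		return true
	}
	return false
}

func main() {
	// In a real fleet these reports stream in continuously; here we fake a small batch.
	reports := []NodeHealth{
		{NodeID: "node-017", DiskFailed: true, LastSeen: time.Now()},
		{NodeID: "node-042", LastSeen: time.Now().Add(-10 * time.Minute)},
		{NodeID: "node-108", MemoryErrors: 412, LastSeen: time.Now()},
	}
	for _, r := range reports {
		fault := classify(r)
		if fault == "" {
			continue
		}
		if !remediate(r.NodeID, fault) {
			// Only the exception case reaches an operator.
			fmt.Printf("[escalate] paging a human: %s reports %s\n", r.NodeID, fault)
		}
	}
}
```

The point is not the specific playbooks but the shape of the loop: software handles the routine failures, and people see only what the software cannot fix.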

We set out to build for data-centric workloads that could not or would not be moved to a public cloud. As such, the data had to stay within the customer’s network perimeter. Yet we also wanted to deliver a Zero-Touch Infrastructure to them.

The Igneous architecture has two key components that enable us to deliver on the promise of True Cloud for Local Data:
  1. Provisioning, monitoring, and remediation software (i.e., what we call the control plane) that runs in the Igneous cloud.
  2. On-premises appliances that are, from the ground up, built with the intention of being truly fail-in-place.

Our CTO, Jeff Hughes, will expand on our unique approach to on-premises appliance architecture in a future blog article. For now, let’s dig a bit deeper into the control plane in our cloud.

We have abstracted and separated the data plane from the control plane. (Our CEO, Kiran Bhageshpur, previously wrote about Why Software Defined Storage Matters and how this separation is at the heart of Software-Defined Storage/Networking/Datacenter.)
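
One way to picture the separation is as two distinct surfaces: a control plane that deals only in metadata and a data plane whose bytes never leave the customer's perimeter. The Go interfaces below are purely an illustration, not our actual APIs.

```go
package main

import "fmt"

// HealthReport carries metadata about an appliance (counts, versions),
// never the contents of customer data. Hypothetical, for illustration only.
type HealthReport struct {
	ApplianceID  string
	DrivesOK     int
	DrivesFailed int
	Version      string
}

// ControlPlane is what runs in the vendor's cloud: provisioning, telemetry,
// and updates, all expressed over metadata.
type ControlPlane interface {
	ProvisionAppliance(serial string) error
	RecordHealth(report HealthReport) error
	PushUpdate(applianceID, version string) error
}

// DataPlane is what runs on the appliance, entirely inside the customer's
// network perimeter; the bytes it handles stay within that perimeter.
type DataPlane interface {
	PutObject(bucket, key string, data []byte) error
	GetObject(bucket, key string) ([]byte, error)
}

func main() {
	// The only thing that crosses the perimeter is a HealthReport.
	fmt.Printf("%+v\n", HealthReport{ApplianceID: "appliance-0001", DrivesOK: 59, DrivesFailed: 1, Version: "1.4.2"})
}
```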

In summary:
  1. The scalable software tier that runs in the Igneous Cloud does not have access to customer data, which remains secure within the customer’s own network perimeter. 
  2. This tier is responsible for:
    1. Physically provisioning on-premises appliances
    2. Continuously receiving health and status information from on-premises appliances. (Note that this outbound reporting of status is key: we do not reach into a customer’s network, but instead passively receive telemetry that the appliances transmit securely; a sketch of this pattern follows the list.)
    3. Sending alerts and automatically migrating services in the face of component failures
    4. Keeping on-premises appliances up to date with software and security updates in a non-disruptive manner
  3. Additionally, this layer acts as the single pane of glass through which our customers can logically manage multiple systems in multiple locations.
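
To illustrate the outbound-only telemetry pattern mentioned above, here is a minimal sketch of an appliance-side agent that periodically pushes a health report to the cloud over HTTPS, so no inbound ports ever need to be opened in the customer's firewall. The endpoint, payload, and schedule are made up for the example; this is not our actual agent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// HealthReport is the hypothetical telemetry payload the appliance pushes out.
// It describes the appliance itself, not the customer data it holds.
type HealthReport struct {
	ApplianceID  string    `json:"appliance_id"`
	Timestamp    time.Time `json:"timestamp"`
	DrivesOK     int       `json:"drives_ok"`
	DrivesFailed int       `json:"drives_failed"`
	Version      string    `json:"version"`
}

func main() {
	// The appliance initiates every connection; the cloud never dials in.
	// The endpoint URL is illustrative, not a real Igneous address.
	const endpoint = "https://control-plane.example.com/v1/telemetry"

	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()

	for range ticker.C {
		report := HealthReport{
			ApplianceID:  "appliance-0001",
			Timestamp:    time.Now().UTC(),
			DrivesOK:     59,
			DrivesFailed: 1,
			Version:      "1.4.2",
		}
		body, err := json.Marshal(report)
		if err != nil {
			log.Printf("marshal failed: %v", err)
			continue
		}
		// Outbound HTTPS POST: works through the customer's firewall without
		// opening any inbound ports.
		resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
		if err != nil {
			log.Printf("telemetry push failed (will retry): %v", err)
			continue
		}
		resp.Body.Close()
	}
}
```

Because every connection is initiated from inside the customer's network, the control plane can stay current on appliance health without ever holding a path back into that network.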

There you have it! The Igneous Cloud is a reimagining and repackaging of the principles behind Facebook’s FBAR and Google’s BorgMon for our infrastructure in tens of thousands of customer locations (as opposed to hundreds of hyperscale datacenters). In effect, we run an “NOC in the cloud,” except that it is powered by software, with humans involved only in the exception case.