Plumbing the Depths of the Data Management Iceberg

Exploiting the exponential growth of data to generate business value depends on how well companies can adapt their strategies to take advantage of evolving storage technologies and new data management tools. The biggest challenge lies below the surface in secondary storage systems.

Between the dawn of civilization and 2003, humans created about five exabytes of data. Today, we’re creating that amount every two days, according to Google’s chief economist. By 2020, that figure is forecast to increase to about 53 zettabytes (53 trillion gigabytes). The vast majority will be unstructured data, such as e-mails, documents, social media posts, videos, photos, audio files, and web pages. The rapid growth of internet of things (IoT) applications and the rollout of 5G will be major contributors to this unprecedented acceleration.

Businesses understand the inherent value of this data and are using tools such as artificial intelligence (AI) and machine learning (ML) to leverage it to improve productivity and generate revenue from new digital products and services. They are also keenly aware of data’s ephemeral shelf life, especially for unstructured data such as video and audio.

A critical challenge companies face today is figuring out how to maximize the data’s value before it becomes worthless. This requires adopting robust data storage solutions for managing the explosion of different types of data to ensure it is secure and accessible when needed.

Increasingly, how data is handled and where it is stored are determined not by the type of storage technology, but by the workload, that is, the amount of processing the data requires at a given time.
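As a toy illustration of workload-driven placement, the sketch below picks the cheapest tier that meets a workload’s latency budget. The tier names, latency figures, and relative costs are illustrative assumptions, not real product data:

```python
# Hypothetical sketch: choosing a storage tier by workload latency needs
# rather than by media type. All numbers below are illustrative only.

TIERS = [
    # (name, typical access latency in microseconds, relative cost per GB)
    ("nvme_flash", 100, 10.0),
    ("sata_ssd", 500, 4.0),
    ("hdd", 10_000, 1.0),
    ("tape_archive", 10_000_000, 0.1),
]

def pick_tier(max_latency_us: float) -> str:
    """Return the cheapest tier that satisfies the workload's latency budget."""
    candidates = [t for t in TIERS if t[1] <= max_latency_us]
    if not candidates:
        return TIERS[0][0]  # nothing is fast enough; fall back to the fastest tier
    return min(candidates, key=lambda t: t[2])[0]

print(pick_tier(200))      # latency-sensitive transactional workload -> nvme_flash
print(pick_tier(60_000))   # batch analytics workload -> hdd
print(pick_tier(1e8))      # cold archive -> tape_archive
```

The point of the sketch is the inversion the article describes: the workload’s requirement drives the choice, and the media type falls out of it.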

At the same time, storage technology is moving at breakneck speed to keep pace with the explosive growth and new workload requirements across different storage media. Magnetic storage is improving storage densities. Each new generation of flash storage technology leapfrogs the previous generation in terms of both density and interface speed. Until recently, system memory was much faster than storage devices, but the gap between DRAM and storage is closing rapidly.

Still, faster (lower latency) storage is expensive and needs more power, new interfaces, and new software. These trade-offs impact workload data models, tiering, and data provisioning.

In the past, large storage capacities typically came with high latency: the fastest tiers in the hierarchy, such as L1/L2 cache and DRAM, were also the smallest and the most expensive per byte. Flash storage has changed that calculus, delivering low latency at a density and cost that now make it viable as primary storage.

This paradigm shift in storage technologies has given rise to a new structure of primary, secondary, and archival storage with no strict hierarchy for data migration and de-migration. Indeed, data across the entire hierarchy could be part of active-application workload execution without the need for staging or migration.

Primary storage represents only the tip of the data storage iceberg. About 80% of enterprise data resides in secondary storage on high-latency media, which will also continue to dominate the long-term tertiary archival storage sector.

There are a number of challenges ahead in optimizing secondary storage. One is that secondary data is often treated as dark data because of its sheer size, usually on the order of petabytes. Because its proliferation is not controlled by an administrator, very little can be gleaned about usage patterns, relevance, or application-specific metadata, which makes secondary storage very difficult to manage.

Because the vast majority of enterprise data resides in secondary storage, analytics and insight will become ever more important. Secondary storage management utilizing metadata is becoming a key differentiating factor. With the integration of primary and secondary storage in the workload execution process, real-time management of secondary storage for faster access and capacity provisioning will become the standard for data centers.
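A minimal sketch of what metadata-driven insight into secondary storage can look like: walk a directory tree, collect per-file metadata (size and last-access time), and report how much capacity is tied up in rarely accessed, effectively dark data. The 180-day coldness threshold is an illustrative assumption, and access-time accuracy depends on how the filesystem is mounted:

```python
# Hypothetical sketch: mining filesystem metadata to surface "dark" data.
# The 180-day threshold is an assumption; atime may be coarse on some mounts.
import os
import time

COLD_AFTER_DAYS = 180

def scan_metadata(root: str):
    """Yield (path, size_bytes, days_since_last_access) for files under root."""
    now = time.time()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            yield path, st.st_size, (now - st.st_atime) / 86_400

def cold_data_report(root: str):
    """Return (cold_bytes, total_bytes) for capacity planning."""
    cold_bytes = total_bytes = 0
    for _path, size, idle_days in scan_metadata(root):
        total_bytes += size
        if idle_days > COLD_AFTER_DAYS:
            cold_bytes += size
    return cold_bytes, total_bytes
```

A production system would enrich this with application-specific metadata and feed it into tiering and provisioning decisions in real time, which is the direction the article describes.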

It is imperative that companies upgrade their storage management capabilities—in particular, secondary-storage management—to keep pace with the explosive growth in data and new workload requirements. Storage management is critical for enabling high-quality data analytics and insight, which will become an even more critical competitive differentiator in the future.

With over two decades of experience and broad expertise in storage technology and data management, Capgemini Engineering is well-positioned to help companies develop their storage management strategies. Our Converged Systems R&D group is currently developing a reference architecture for secondary storage with an emphasis on providing insight into archived data with infrastructure and application-specific metadata.

Insight is the currency that will power the zettabyte era.

By DineshKumar Bhaskaran, Atul Kulkarni, Anantha Prakash T

Capgemini Engineering Insights