The new logic of computing: the distributed data cloud

To further strengthen our commitment to providing industry-leading data technology coverage, VentureBeat is pleased to welcome Andrew Brust and Tony Baer as regular contributors. Keep an eye out for their articles on the data pipeline.

A common pattern in analytics ecosystems today is that data produced in different areas of the business is sent to a central location. Data flows into data lakes and is cordoned off in data stores managed by IT staff. The original producers of the data, often subject matter experts within the business domain, effectively lose control or become several layers removed from the data that matters to their work. This separation decreases the value of the data over time and diverts it away from its business consumers. Imagine a new model that flips this ecosystem on its head by breaking down barriers and applying common standards everywhere.

Consider an analytics stack that could be deployed within a business domain; it stays there, owned by team members in that business domain, but centrally operated and supported by IT. What if all the data products generated there were managed entirely within that domain? What if other business teams could simply subscribe to those data products or get API access to them? Much attention has recently been paid to an organizational pattern (the data mesh) that promotes this decentralization of data product ownership. However, which ecosystem architectures are well suited to provide the technical backbone for a data mesh and can cope with emerging patterns of data growth?

As data volumes grow, moving data to a centralized location for processing becomes more expensive and time-consuming, especially if that data is generated outside of a traditional data center or public cloud. Instead, companies will increasingly prefer to run analytics processing where the data is generated. The ability to easily place data in a specific location for latency, compliance or security reasons will transform the way we compute into a more sustainable, efficient and logical reality: that is the territory of the distributed data cloud. Managing data seamlessly anywhere is how businesses will take advantage of the incredible data growth ahead.

The distributed data cloud is not a single tool or platform, but an ecosystem pattern that gets data to the right place and the right person at the right time in a secure, governed and trusted way. It includes a federated collection of data management and analytics services spanning public clouds, private clouds, and the edge.

Managed from a single control plane, a distributed data cloud enables analytics applications to be provisioned at the point of need on a right-sized mix of physical and virtualized infrastructure, based on data gravity, data sovereignty, data governance and latency requirements.
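To make the placement idea concrete, here is a minimal Python sketch of how a control plane might choose where to provision an analytics workload. Everything in it is an illustrative assumption rather than part of any real product: the `Site` and `Workload` types, their fields, the site names and the numbers are all hypothetical. Sovereignty and latency are treated as hard constraints, and data gravity (how much data already lives at a site) decides among the sites that remain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    country: str          # jurisdiction where the site is located
    latency_ms: float     # round-trip latency to the data consumers
    local_data_tb: float  # data already resident here (data gravity)


@dataclass(frozen=True)
class Workload:
    name: str
    allowed_countries: frozenset  # sovereignty / governance constraint
    max_latency_ms: float         # latency requirement


def choose_site(workload, sites):
    """Return the eligible site holding the most data, or None if none qualifies."""
    eligible = [
        s for s in sites
        if s.country in workload.allowed_countries
        and s.latency_ms <= workload.max_latency_ms
    ]
    if not eligible:
        return None
    # Prefer the site with the most resident data; break ties by lower latency.
    return max(eligible, key=lambda s: (s.local_data_tb, -s.latency_ms))


# Hypothetical sites and workload, for illustration only.
sites = [
    Site("public-cloud-eu", "DE", 40.0, 500.0),
    Site("factory-edge", "DE", 5.0, 120.0),
    Site("public-cloud-us", "US", 110.0, 900.0),
]
job = Workload("quality-analytics", frozenset({"DE"}), 50.0)
print(choose_site(job, sites).name)
```

In this sketch the US site holds the most data but is ruled out by the sovereignty constraint, so the choice falls to the in-country site with the most resident data. A real control plane would weigh many more factors, such as cost and capacity.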

Several important trends will drive companies to unlock the full value of their data with this model, where the infrastructure works to democratize data, not imprison it.

Edge Computing Puts Pressure on Internet Capacity

It is credibly predicted that by 2025, 75% of enterprise-generated data will be created and processed outside of the traditional centralized data center or cloud, up from less than 10% in 2019. The explosion of data and devices at the edge, together with the deployment of 5G and planning for 6G (100 Gbps networks within the next 10 years), has accelerated the realization that the Internet backbone does not have enough capacity to transport all data activity at the edge to centralized data centers for analysis.

Distributed Cloud Addresses Hybrid Disadvantages

The Gartner Top Strategic Technology Trends Report 2021 suggests that the distributed cloud, the infrastructure required as a precursor to the distributed data cloud platform service discussed in this article, is emerging to address location-affected latency. Distributed cloud means deploying cloud software and hardware stacks outside of a public cloud provider’s data center to provide a mesh of interconnected cloud resources. These stacks enable enterprises to run applications built for the public cloud in a company’s own data center and in other locations, such as multi-access edge computing centers connected to clusters of 5G cell towers, or on the factory floor in support of IoT applications in manufacturing. Businesses still benefit from the public cloud’s value proposition and guaranteed SLAs.

Both hybrid cloud and hybrid IT break the fundamental value propositions of cloud. That is, hybrid makes it very difficult to operate efficiently while taking full advantage of the scale and elasticity of the services offered by the public cloud. Hybrid does not deliver the efficiencies in cloud operations, governance, and upgrades that public cloud offers, nor do hybrid systems keep pace with public cloud innovation. Distributed cloud means the same seamless cloud experience everywhere.

Mobile enterprise hyper-personalization and multi-experience

Ultimately, companies want to put interactive and predictive analytics in the hands of the end consumer. To that end, instead of serving a community of thousands of users, data stores will ultimately serve a community of millions of end consumers. The current ubiquity of mobile device usage gives an idea of where multi-sensory, multi-device, multi-touch business experiences with data are headed. The computer is rapidly becoming the environment that surrounds the user.

An increasingly API-driven culture everywhere, seamless UX/UI, and democratized data access across businesses will drive the shift toward real-time, hyper-personalized interactions between people, places, and things.

Among the first use cases

With these trends driving the advent of the distributed data cloud, several use cases are on the immediate horizon.

First, there is a widespread need to simplify hybrid and multicloud operations by providing a consistent environment across the public cloud, on premises, and at the edge. A compelling reason for this, particularly in regulated industries like banking, is to help reduce cloud concentration risk by distributing data and analytics across more than one cloud provider or data center. To accomplish this using a distributed data cloud, an enterprise can provision data management and analytics applications in containers and run them anywhere Kubernetes is deployed: in a public cloud, on-premises, or at the edge. It all happens through the same UX, management and DevOps processes, and from the same web console and API.
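As a rough illustration of "run them anywhere Kubernetes is deployed", the Python sketch below builds one Kubernetes Deployment manifest as a plain dictionary and reuses it unchanged for three targets. The application name, the image URL and the cluster names are hypothetical placeholders, and the sketch only renders the manifest; it does not talk to any cluster.

```python
import json


def analytics_deployment(name, image, replicas=1):
    """Build a Kubernetes Deployment manifest (apps/v1) as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical cluster names and image, for illustration only.
targets = ["public-cloud", "on-premises", "edge"]
manifest = analytics_deployment(
    "analytics-engine", "registry.example.com/analytics:1.0"
)

# The same serialized manifest is prepared for every target cluster.
rollout = {target: json.dumps(manifest, sort_keys=True) for target in targets}
for target, spec in rollout.items():
    print(target, len(spec), "bytes")
```

The point of the sketch is that nothing in the application definition depends on where the cluster runs; only the target the manifest is applied to changes.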

Second, the processing of personally identifiable information (PII) in a country of residence is a scenario where localized access and regulatory compliance make moving compute to the data the best solution. For example, running a cloud-optimized instance of a distributed data cloud for each hospital, on a public cloud stack located next to that hospital, allows patient data to remain at its source.
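The "move compute to data" idea can be sketched in a few lines of Python. This is an assumption-laden illustration, not a compliance mechanism: the record fields, the site names and the `summarize_locally` helper are all hypothetical. Each record is routed to a processing site in its country of residence, and only a PII-free aggregate is returned from each site.

```python
from collections import defaultdict


def route_by_residence(records, local_sites):
    """Group records by the in-country site that is allowed to process them."""
    batches = defaultdict(list)
    for record in records:
        country = record["country_of_residence"]
        if country not in local_sites:
            # Refuse to process rather than send PII out of the country.
            raise ValueError(f"no in-country site for {country}")
        batches[local_sites[country]].append(record)
    return dict(batches)


def summarize_locally(batch):
    """Runs at the local site; only this aggregate leaves, not the records."""
    return {"patients": len(batch)}


# Hypothetical sites and records, for illustration only.
local_sites = {"DE": "hospital-berlin-edge", "FR": "hospital-lyon-edge"}
records = [
    {"patient_id": "a1", "country_of_residence": "DE"},
    {"patient_id": "b2", "country_of_residence": "FR"},
    {"patient_id": "c3", "country_of_residence": "DE"},
]

batches = route_by_residence(records, local_sites)
summaries = {site: summarize_locally(batch) for site, batch in batches.items()}
print(summaries)
```

Raising an error when no in-country site exists is a deliberate design choice in this sketch: failing loudly is safer than silently falling back to a site in another jurisdiction.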

A third use case where the need is already skyrocketing involves IoT analytics. The ability to perform secure analytics at the edge of the network and close to consumers through a distributed data cloud means real-time answers for connected cars, smart cities, energy grids and more. Running optimized analytics on AWS Wavelength, for example, in a multi-access edge environment to monitor network quality in real time will be entirely feasible.

Bringing to life a distributed data cloud, where data anywhere is easily managed and put to work, is not a one-vendor game and probably never will be. Rather, a consortium of companies rallying around this idea and working in symbiosis will bring the party to data and success to companies ready to grasp a more logical future.

Mark Cusack is the CTO at Yellowbrick
