Collective Data Set and Equal Data Position between B2B Partners in Data Sharing Ecosystems – enter: Blockchain

Many organizations collaborate in one or more ecosystems: groups of organizations that work together in a vertical chain or share common interests, sometimes despite being competitors. Examples are research environments (health, climate, agriculture, environment), supply chains and logistics, the insurance industry, government agencies, pension funds and traffic management. Note that within large, loosely organized companies, an ecosystem could consist of various divisions, sites or departments.

Within the ecosystem, some data is considered open: equally accessible to and co-owned by the members. To non-members (the rest of the world) this data is not accessible. All ecosystem partners have an interest in the collective data set and all of them contribute to it.

The term Industrial Data Space has been coined and has generated some interesting activity (see https://www.internationaldataspaces.org for examples of and suggestions for implementing an industrial data space).

Organizing and realizing a collective data set comes with several challenges: establishing the rules around membership, agreeing on the collective data model (which business elements and attributes, which integrity rules on the data), specifying and realizing security measures and, of course, designing the architecture and implementing the underlying mechanism for actually recording and sharing or exchanging data.

Architecture and Implementation

The establishment of a collective data set can be regarded as an integration challenge, met in a traditional or more modern way.

First, a traditional way: create a central database, managed by a central authority, and create a services layer around that central database. Define the canonical model for the services that support both directions of data flow: submit data (changes) and retrieve the current data state. Each member of the ecosystem creates their own implementation of the service consumers. They all commit to submitting data changes to the central data hub, and they are free to retrieve through the services whatever data from the collective data set they have an interest in.
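To make the two directions of data flow concrete, here is a minimal sketch of such a hub in plain Python. The class name `CentralHub`, the member names and the record layout are all assumptions made for illustration; a real hub would expose these operations as remote services on top of a database.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CentralHub:
    """Hypothetical central hub: one authoritative store behind two services."""

    members: set[str]
    _records: dict[str, dict[str, Any]] = field(default_factory=dict)

    def submit_change(self, member: str, key: str, record: dict[str, Any]) -> None:
        """Service 1: a member submits a new or changed record."""
        self._check_member(member)
        self._records[key] = {**record, "submitted_by": member}

    def retrieve(self, member: str, key: str) -> dict[str, Any]:
        """Service 2: a member retrieves the current state of a record."""
        self._check_member(member)
        return dict(self._records[key])  # copy, so callers cannot alter hub state

    def _check_member(self, member: str) -> None:
        # Non-members (the rest of the world) get no access at all.
        if member not in self.members:
            raise PermissionError(f"{member} is not an ecosystem member")


hub = CentralHub(members={"carrier-a", "shipper-b"})
hub.submit_change("carrier-a", "shipment-17", {"status": "in transit"})
print(hub.retrieve("shipper-b", "shipment-17"))
```

Note that every read and every write goes through the single `hub` object, which is exactly the dependency discussed next.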

This approach, not uncommon in recent years and still in use today, requires some sort of central hub to be established. This hub needs to be paid for by the ecosystem members (or through third-party funding). The approach makes it hard for members to work locally with an up-to-date data set. And everyone depends on the availability of the central hub.

An approach that has become fashionable in recent years is event driven. In this architecture, each company has their own local copy of the collective data set. At any moment in time, they have fresh data, in the type of data store (technology) they like best. For performance and availability, they have no dependency on a central hub. This architecture is founded on a central event platform and canonical event definitions. All parties settle on a set of event types that all will publish and all can consume. An event represents a fact of which one company has become aware and that it shares with the ecosystem. Because all companies can subscribe to all events, they can become aware of the fact as soon as it has been published, or at their convenience at a later time.
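The sketch below illustrates the idea with an in-memory stand-in for the event platform. Everything in it is assumed for the sake of the example: the `EventPlatform` and `Member` classes, the `ShipmentStatusChanged` event type and the event shape. A real setup would use a durable event broker rather than a Python list.

```python
from typing import Callable

Event = dict  # assumed canonical shape: {"type": str, "key": str, "data": dict}


class EventPlatform:
    """Toy stand-in for the central event platform: retains and fans out events."""

    def __init__(self) -> None:
        self.log: list[Event] = []
        self.handlers: list[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        # Replay retained events first, so late subscribers can catch up.
        for event in self.log:
            handler(event)
        self.handlers.append(handler)

    def publish(self, event: Event) -> None:
        self.log.append(event)
        for handler in self.handlers:
            handler(event)


class Member:
    """An ecosystem member that maintains its own local copy of the data set."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.local_copy: dict[str, dict] = {}

    def on_event(self, event: Event) -> None:
        if event["type"] == "ShipmentStatusChanged":
            self.local_copy[event["key"]] = event["data"]


platform = EventPlatform()
carrier, shipper = Member("carrier-a"), Member("shipper-b")
platform.subscribe(carrier.on_event)
platform.subscribe(shipper.on_event)

# carrier-a learns a fact and shares it with the ecosystem
platform.publish({"type": "ShipmentStatusChanged", "key": "shipment-17",
                  "data": {"status": "in transit"}})

# a member that subscribes later still ends up with the same data
insurer = Member("insurer-c")
platform.subscribe(insurer.on_event)
```

Each member reads from its own `local_copy`, so queries never touch the platform; only publishing and catching up do.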

A challenge in this approach is the definition of the payload of all events: what is the common data model or the common language? There is also still the need for a central infrastructure: the event platform. Someone has to manage it (this could be a cloud provider) and it has to be paid for. And of course each participating party has to implement their own connection between their local applications and platforms and the central event platform.

Blockchain is a controlled distributed data store

Blockchain can be regarded and described in various ways. One thing a private blockchain is: “a distributed database with strictly enforced integrity rules shared among equal partners or peers”. Blockchain implementations such as Hyperledger Fabric provide an out-of-the-box mechanism for managing a collective data set. Under the hood, the blockchain implementation takes care of distributing transactions across all peers, validating transactions before accepting them and synchronizing local ledgers (aka event stores) across the network of peers.
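As a rough illustration of these three responsibilities, the following toy model keeps a hash-chained ledger on every peer. It is deliberately not Hyperledger Fabric's API or protocol (Fabric has its own endorsement and ordering steps); the `Peer` class, the `submit` function and the validation rule are assumptions chosen to keep the example small.

```python
import hashlib
import json

GENESIS = "genesis"


def chain_hash(prev_hash: str, tx: dict) -> str:
    """Hash a transaction together with the hash of its predecessor."""
    payload = json.dumps({"prev": prev_hash, "tx": tx}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Peer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.ledger: list[dict] = []  # append-only list of {"tx", "hash"}

    def last_hash(self) -> str:
        return self.ledger[-1]["hash"] if self.ledger else GENESIS

    def validate(self, tx: dict) -> bool:
        # Assumed shared rule: every transaction names a key and a submitter.
        return isinstance(tx.get("key"), str) and isinstance(tx.get("by"), str)

    def append(self, tx: dict) -> None:
        self.ledger.append({"tx": tx, "hash": chain_hash(self.last_hash(), tx)})


def submit(peers: list[Peer], tx: dict) -> bool:
    """Append tx to every ledger, but only if every peer accepts it."""
    if not all(peer.validate(tx) for peer in peers):
        return False
    for peer in peers:
        peer.append(tx)
    return True


peers = [Peer("carrier-a"), Peer("shipper-b"), Peer("insurer-c")]
valid_ok = submit(peers, {"key": "shipment-17", "by": "carrier-a", "status": "created"})
invalid_ok = submit(peers, {"status": "created"})  # violates the shared rule
in_sync = len({peer.last_hash() for peer in peers}) == 1
print(valid_ok, invalid_ok, in_sync)  # True False True
```

Because each hash covers the previous one, two peers with the same last hash hold the same history, which gives a cheap way to check that ledgers are synchronized.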

Each peer can have a World State database: a data store (for example implemented in CouchDB or LevelDB) that reflects the current state as the aggregation of all events.
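The phrase "aggregation of all events" can be read as a fold over the ledger. The snippet below is a sketch under assumed conventions: transactions are plain dictionaries with a hypothetical `op` of `put` or `delete`, and the world state is an ordinary dictionary rather than a CouchDB or LevelDB store.

```python
from functools import reduce

# Assumed transaction shape; a real ledger entry carries much more metadata.
ledger = [
    {"op": "put", "key": "shipment-17", "value": {"status": "created"}},
    {"op": "put", "key": "shipment-18", "value": {"status": "created"}},
    {"op": "put", "key": "shipment-17", "value": {"status": "delivered"}},
    {"op": "delete", "key": "shipment-18"},
]


def apply_tx(state: dict, tx: dict) -> dict:
    """Return the state that results from applying one transaction."""
    new_state = dict(state)
    if tx["op"] == "put":
        new_state[tx["key"]] = tx["value"]
    elif tx["op"] == "delete":
        new_state.pop(tx["key"], None)
    return new_state


def world_state(entries: list[dict]) -> dict:
    """Replay the whole ledger, in order, starting from an empty state."""
    return reduce(apply_tx, entries, {})


print(world_state(ledger))  # {'shipment-17': {'status': 'delivered'}}
```

Since the state is fully derived from the ledger, a peer can discard and rebuild its world state at any time by replaying its local ledger.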

The implementation of a collective data set in an ecosystem of partners has now become fairly straightforward: each party becomes a member of the blockchain by creating their own Hyperledger Fabric peer and adding it to the network. This peer can be installed and configured on premises or, much more easily, provisioned through one of the cloud providers offering a managed Hyperledger Fabric blockchain service.

This approach does not require any central authority or infrastructure. There is no dependency on a shared infrastructure that needs to be managed and paid for. The only real challenge lies (still) in defining the data structures and integrity rules. With blockchain, we call this defining the smart contracts.
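To give a feel for what "data structures and integrity rules" might look like, here is a small sketch in plain Python. It does not use the Hyperledger Fabric chaincode API; the shipment lifecycle, the `ALLOWED_TRANSITIONS` table and the `set_status` function are hypothetical and only show the kind of rule the partners would have to agree on.

```python
# Assumed lifecycle for a shipment; None means "not yet recorded".
ALLOWED_TRANSITIONS = {
    None: {"created"},
    "created": {"in transit"},
    "in transit": {"delivered"},
    "delivered": set(),
}


def set_status(state: dict, shipment_id: str, new_status: str) -> dict:
    """Return the new state, or raise if the change breaks an integrity rule."""
    current = state.get(shipment_id)
    if new_status not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"{shipment_id}: {current!r} -> {new_status!r} not allowed")
    return {**state, shipment_id: new_status}


state = set_status({}, "shipment-17", "created")
state = set_status(state, "shipment-17", "in transit")

try:
    set_status(state, "shipment-17", "created")  # going backwards is rejected
except ValueError as err:
    print("rejected:", err)
```

The hard part is not the code but the agreement: every member has to accept the same lifecycle and the same rules before they can be enforced on all peers.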

Conclusion

Sharing data (making sure that data is at the right place at the right time in the right format to allow value to be derived from it) is a common objective. It is what integration is all about. Sharing data across organizations, and across IT landscapes, is often highly desirable: it allows for end-to-end value chains, and for rapid responses across an ecosystem of partners to events as soon as they are known to at least one member of the ecosystem.

Various architectures can be designed for sharing data between members of an ecosystem. Most traditional approaches rely on central infrastructures that have to be managed and paid for. With the rapid advent of blockchain technology, we now have the opportunity to implement a shared data set across a flexible number of peers with little implementation effort on any peer, without any central infrastructure and with guarantees for equal data positions for all members. New members can be admitted into the ecosystem very easily, and temporary absences (technical or business-wise) can easily be absorbed.