On the Integrity of Data

To be useful, data held and used in information systems has to live up to a number of expectations. The data should be an accurate representation of its source. It should be reliable. It should be internally consistent. It should adhere to rules based on the logic of the real world. This accuracy, internal quality and reliability of data is frequently referred to as data integrity.

Safeguarding the integrity of data is a challenge, and one that grows in complexity when multiple users access and manipulate the data simultaneously, which is obviously a common situation. The challenge reaches new heights when the data is managed in multiple independent data stores rather than in a single database.

Figure 12

Earlier this month, the Oracle Technology Network published an article I recently wrote on this subject: http://www.oracle.com/technetwork/articles/soa/jellema-data-integrity-1932181.html. Two recent experiences prompted me to write it.

The first was at a customer where we are designing a service-oriented architecture based on a number of distinct and independent data domains. These domains are exposed through elementary (entity) services. A second tier of composite (or business) services exposes functionality that may involve multiple data domains. We have had, and are still having, discussions about how to implement data integrity constraints and how to manage transactions that span data domains. To make sure we all shared the same understanding of exactly what the challenges are, I decided to write down my understanding of integrity, constraints and transactions.

The second situation was at a different customer. There they stated that data quality and absolute robustness of the enterprise database were essential. They went on to explain how they had implemented their integrity enforcement in Java-based logic. Their implementation was elaborate and impressive, but not robust. They enforced attribute-level and record-level constraints just fine, but no constraint involving multiple records was enforced with the rigor they needed and believed they had achieved. They had not properly taken into account the multi-user, multi-session environment in which their logic would run (nor the other entry point into the database that bypassed their business logic completely). Here again I felt compelled to write down what enforcing integrity entails, specifically the need for locking in a multi-session environment.
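To make the locking point concrete, here is a minimal Java sketch. All class, method and department names are hypothetical, and an in-memory map stands in for the database table. It enforces an attribute-level rule (a mandatory name) and a multi-record rule (at most a fixed number of employees per department). The attribute check needs no locking. For the multi-record rule, however, the count and the insert must happen while holding a lock that every session touching the same department has to acquire; without it, two sessions can each see room for one more employee and both insert. Note the assumption: a JVM lock only serializes threads within one process, so against a real shared database the lock would have to be a database lock held until commit (for example SELECT ... FOR UPDATE on the parent department row), and it only helps if every entry point goes through this logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical in-memory stand-in for an EMP table, illustrating how a
 * multi-record rule ("at most maxPerDept employees per department") can be
 * enforced from application code when many sessions write concurrently.
 */
public class DepartmentStaffing {
    private final int maxPerDept;
    // "Committed" data: department -> employee names (only accessed under the department lock)
    private final Map<String, List<String>> employees = new ConcurrentHashMap<>();
    // One lock per department. Assumption: a JVM-local lock stands in for a
    // database lock such as a locked parent DEPT row.
    private final Map<String, ReentrantLock> deptLocks = new ConcurrentHashMap<>();

    public DepartmentStaffing(int maxPerDept) {
        this.maxPerDept = maxPerDept;
    }

    /** Returns true if the hire was committed, false if it would violate the rule. */
    public boolean hire(String dept, String name) {
        // Attribute-level rule: depends only on the new value, so no lock is needed.
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("name is mandatory");
        }
        ReentrantLock lock = deptLocks.computeIfAbsent(dept, d -> new ReentrantLock());
        lock.lock(); // serialize all sessions that change this department's staff
        try {
            List<String> staff = employees.computeIfAbsent(dept, d -> new ArrayList<>());
            // Multi-record rule: check and write under the same lock, otherwise two
            // sessions can both see size == max - 1 and both insert.
            if (staff.size() >= maxPerDept) {
                return false;
            }
            staff.add(name); // "commit" while still holding the lock
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Current number of employees in a department. */
    public int headcount(String dept) {
        ReentrantLock lock = deptLocks.computeIfAbsent(dept, d -> new ReentrantLock());
        lock.lock();
        try {
            List<String> staff = employees.get(dept);
            return staff == null ? 0 : staff.size();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        DepartmentStaffing db = new DepartmentStaffing(3);
        List<Thread> sessions = new ArrayList<>();
        // 20 concurrent "sessions" all try to hire into the same department.
        for (int i = 0; i < 20; i++) {
            String name = "emp" + i;
            Thread t = new Thread(() -> db.hire("SALES", name));
            sessions.add(t);
            t.start();
        }
        for (Thread t : sessions) {
            t.join();
        }
        System.out.println("SALES headcount: " + db.headcount("SALES"));
    }
}
```

Running main prints "SALES headcount: 3": because each check-and-insert is serialized per department, exactly three of the twenty concurrent hires succeed. Take the lock away and the rule is no longer guaranteed to hold.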

I hope the article, prompted by these two cases (and many before them), will help other organizations and teams as well in understanding what data integrity entails and what it takes to enforce constraints in a truly robust way. Frequently, they will find that their current implementation is not in fact robust. For example, many organizations using PL/SQL and trigger-based ‘business rule enforcement’ have not implemented a proper locking mechanism and are therefore less well protected against data corruption than they typically think.

The article introduces some of the basics of data integrity enforcement. It then discusses how that integrity is realized in a multi-user environment with a single [Oracle] database. Finally, it extends that discussion to an environment with multiple data stores that jointly present unified data services to consumers and that are together responsible for data integrity: not just for the internal, encapsulated integrity within each store, but also for integrity across those stores.

Find the article at OTN: http://www.oracle.com/technetwork/articles/soa/jellema-data-integrity-1932181.html.

Other related stories that may be worth a look:

How Oracle Database uses internal locks to make statement level constraint validation robust at the transaction level – https://technology.amis.nl/2013/02/28/how-oracle-database-uses-internal-locks-to-make-statement-level-constraint-validation-robust-at-the-transaction-level/

RuleGen 3.0 – the latest, leanest and most robust solution for complex data constraints in an Oracle Database – https://technology.amis.nl/2011/07/06/rulegen-3-0-the-latest-leanest-and-most-robust-solution-for-complex-data-constraints-in-an-oracle-database/

The Hunt for the Oracle Database On Commit trigger – https://technology.amis.nl/2006/08/07/the-hunt-for-the-oracle-database-on-commit-trigger/

The future of CDM RuleFrame – the Oracle framework for server-side business rules – https://technology.amis.nl/2004/09/11/the-future-of-cdm-ruleframe-the-oracle-framework-for-server-side-business-rules/