“Data contracts are like bridges, connecting disparate islands of information; their strength lies not in their existence, but in their precision, clarity, and mutual understanding.”
In today's data-driven business environment, the discipline of data management keeps evolving, and organizations are actively looking for ways to improve the integrity, availability, and reliability of their data. In this context, the concept of 'data contracts' emerges as a powerful tool. A Data Contract is an agreement between a Data Producer and a Data Consumer about the data provided. It can cover various properties of the data, such as its schema, the values it contains, or its timeliness. This series explores how data contracts can be implemented using dbt (data build tool), Google Cloud, and Great Expectations.
This series is split into three parts:
Part 1 provides a high-level overview of the Data Contract subsystems and a gentle introduction to the technologies used throughout this series.
Part 2 focuses on the implementation of the Data Contract repository and on how Data Producers and Consumers collaborate to make sure the data is of the highest quality.
Part 3 takes an in-depth look at executing Data Contract checks at runtime and explores their outcomes.
Data Contracts build on various technology stacks and architectures developed in recent years. This means that talking about Data Contracts inevitably brings in a lot of new concepts and frameworks. Later in this article, you will find a glossary of the terms used, with links for further exploration of the concepts.
This series focuses on Data Contracts that are external to any one system and can be extended with additional functionality depending on the needs of your organization. It is another iteration of the data contract system, so many of the infrastructure components may look familiar if you have already read our previous article about Data Contracts. That article describes in detail some of the aspects that are covered only briefly here, so make sure to take a look!
High-Level Architecture of the Data Contract System
Data Contracts are still being fleshed out in the industry, and there are different approaches to building a system around them. Whatever the details, though, from a high-level viewpoint any system capable of supporting Data Contracts needs two parts:
Data Contracts Management Layer
Data Contracts Execution Layer
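To make the split between the two layers more concrete, here is a minimal sketch in plain Python. It is only an illustration and not the implementation used in this series, which relies on dbt and Great Expectations. The `DataContract` class, the `check_contract` function, and the `orders` dataset are all hypothetical names chosen for this example. The contract object stands in for what a management layer would store (schema, permitted values, timeliness), and the check function stands in for what an execution layer would run against actual data.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataContract:
    """Hypothetical management-layer artifact: what producer and consumer agreed on."""
    dataset: str
    schema: dict[str, type]          # column name -> expected Python type
    allowed_values: dict[str, set]   # column name -> permitted values
    max_staleness: timedelta         # timeliness: maximum age of the data


def check_contract(contract: DataContract, rows: list[dict],
                   last_updated: datetime, now: datetime) -> list[str]:
    """Hypothetical execution-layer step: return human-readable violations."""
    violations = []
    for i, row in enumerate(rows):
        # Schema check: every agreed column is present with the agreed type.
        for column, expected_type in contract.schema.items():
            if column not in row:
                violations.append(f"row {i}: missing column '{column}'")
            elif not isinstance(row[column], expected_type):
                violations.append(
                    f"row {i}: column '{column}' is not {expected_type.__name__}")
        # Value check: constrained columns only hold permitted values.
        for column, allowed in contract.allowed_values.items():
            if column in row and row[column] not in allowed:
                violations.append(
                    f"row {i}: value {row[column]!r} not allowed in '{column}'")
    # Timeliness check: the data must be fresher than the agreed limit.
    if now - last_updated > contract.max_staleness:
        violations.append(f"dataset '{contract.dataset}' is stale")
    return violations


# Example data is invented purely for illustration.
orders_contract = DataContract(
    dataset="orders",
    schema={"order_id": int, "status": str},
    allowed_values={"status": {"new", "shipped", "cancelled"}},
    max_staleness=timedelta(hours=24),
)
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    {"order_id": 1, "status": "new"},
    {"order_id": "2", "status": "lost"},  # wrong type and disallowed value
]
violations = check_contract(
    orders_contract, rows, last_updated=now - timedelta(hours=30), now=now)
for violation in violations:
    print(violation)
```

In this sketch the first row passes every check, while the second row and the 30-hour-old refresh time each produce a violation. Keeping the contract definition separate from the code that checks it mirrors the separation between managing contracts and executing them.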