The glue that holds all your data components together.
DataOps maps the best practices of Agile methodology, DevOps, and statistical process control (SPC) to data. Whereas DevOps aims to improve the release and quality of software products, DataOps does the same thing for data products.
– Fundamentals of Data Engineering.
DataOps can be split into three main pillars:
Infrastructure
The foundations for your data resources. This pillar covers the deployment of your cloud resources and of the different data tools you use. Terraform has become the de facto tool for Infrastructure as Code (IaC), and Kubernetes the leading open-source system for deploying applications. At Astrafy, we always favour open-source applications and deploy them on Kubernetes using Helm.
Automation
It enables reliability and consistency in the DataOps process and allows data engineers to quickly deploy new product features and improvements to existing workflows. As a rule of thumb, “if something can be automated, then it must be automated”. DataOps engineers should constantly look for automation improvements that speed up the work of data engineers, so that data engineers can focus on delivering value to the business.
Observability and monitoring
“Data is a silent killer.” We have seen countless examples where data is inaccurate at the datamart level and remains undetected for a long time. Observability and monitoring are about taking control of what happens inside your data products. Companies should move towards Data Observability Driven Development, which can be compared to test-driven development (TDD) in software engineering: just as TDD writes tests before the code, checks on the data are built in while the data product is developed rather than added after an issue surfaces.
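To make the comparison with test-driven development more concrete, here is a minimal Python sketch of the kind of checks a data product could ship with: a row count check, a not-null check and a freshness check on a datamart table. It is an illustration under stated assumptions, not Astrafy's implementation: the table is modelled as a list of dicts, and the column names (`customer_id`, `loaded_at`), the 24-hour freshness threshold and all helper functions are hypothetical. In practice such checks usually live in a dedicated tool (for example dbt tests) rather than in hand-written code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_row_count(rows, minimum):
    # An empty or truncated table is often the first symptom of a broken load.
    return CheckResult("row_count", len(rows) >= minimum, f"{len(rows)} row(s)")


def check_not_null(rows, column):
    # Count rows where the (hypothetical) key column is missing.
    nulls = sum(1 for r in rows if r.get(column) is None)
    return CheckResult(f"not_null:{column}", nulls == 0, f"{nulls} null value(s)")


def check_freshness(rows, column, max_age, now):
    # The newest timestamp must be no older than max_age.
    latest = max((r[column] for r in rows if r.get(column) is not None), default=None)
    if latest is None:
        return CheckResult(f"freshness:{column}", False, "no timestamps")
    age = now - latest
    return CheckResult(f"freshness:{column}", age <= max_age, f"latest record is {age} old")


def run_checks(rows, now):
    # Hypothetical suite for one datamart table; thresholds are assumptions.
    return [
        check_row_count(rows, minimum=1),
        check_not_null(rows, "customer_id"),
        check_freshness(rows, "loaded_at", timedelta(hours=24), now),
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, tzinfo=timezone.utc)
    datamart = [
        {"customer_id": 1, "loaded_at": datetime(2024, 1, 2, 6, tzinfo=timezone.utc)},
        {"customer_id": None, "loaded_at": datetime(2024, 1, 2, 7, tzinfo=timezone.utc)},
    ]
    for result in run_checks(datamart, now):
        status = "PASS" if result.passed else "FAIL"
        print(f"{status} {result.name}: {result.detail}")
```

Running the checks on every load, and alerting on any failure, is what turns a silent data issue into a visible one.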
When well glued together, these pillars of DataOps let your data and analytics engineers iterate quickly during development while giving them full control over any data issues that arise.
Automation requires some basics that are not going anywhere:
- GitOps is the foundation of your codebase, and each codebase must have a well-defined GitOps strategy. Each push to the remote repository is the starting point of the DataOps automation pipeline.
- Shell scripting has been around for decades and is still widely used in DevOps and DataOps as the default, universally supported language for simple operations.
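As a rough illustration of the principle that each push starts the automation pipeline, the Python sketch below runs a list of stages in order for one pushed commit and stops at the first failure, which mirrors the default behaviour of CI systems where later stages are skipped once a job fails. The stage names, the `run_pipeline` function and the commit SHA are hypothetical; a real setup would express this in the CI tool's own configuration (for example a GitLab CI pipeline) rather than in Python.

```python
from typing import Callable, List, Tuple

# A stage is a (name, callable) pair; the callable returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(commit_sha: str, stages: List[Stage]) -> List[str]:
    """Run each stage in order for one pushed commit, stopping at the first failure."""
    log: List[str] = []
    for name, step in stages:
        ok = step()
        log.append(f"{commit_sha[:7]} {name}: {'ok' if ok else 'failed'}")
        if not ok:
            # Later stages (e.g. deployment) never run once a check has failed.
            break
    return log


if __name__ == "__main__":
    # Hypothetical stages for a data repository; each would normally call real tooling.
    stages = [
        ("lint_sql", lambda: True),
        ("test_data_models", lambda: False),
        ("deploy", lambda: True),
    ]
    for line in run_pipeline("3f2c9a1e8b7d", stages):
        print(line)
```

Stopping at the first failure is the point of the design: a change that breaks a data test never reaches the deployment stage.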
Why Astrafy?
Astrafy engineers are hands-on DataOps experts, as we consider DataOps an inseparable part of data engineering. You cannot have scalable data products without automation all along the data journey.
Most of us are software engineers with a computer science background and have always seen DataOps as a mandatory part of any project. We don’t ship data code without tests that can be monitored on the fly through convenient tools.
Your main advantages in working with us on your DataOps journey:
- We have many Terraform modules and GitLab CI templates designed for data use cases, which lets us jump-start your DataOps journey.
- We have our own Kubernetes cluster hosted on Google Kubernetes Engine (GKE), where we have deployed all the Modern Data Stack applications we advocate (Airbyte, Airflow, Lightdash, dbt, etc.). This entire deployment is defined in Terraform, so we can quickly deploy these applications.
- We see the whole picture, including the other pillars of data engineering described on the other pages of our stack. DataOps requires an understanding of all aspects of data engineering in order to integrate the different tools and automate them to work together at the right time.