The glue that integrates all your data components together.

DataOps can be split into three main pillars:

Data Infrastructure

The foundation for your data resources. It covers the deployment of your cloud resources and of the different data tools you use. Terraform has become the de facto tool for Infrastructure as Code, and Kubernetes the leading open-source system for deploying containerized applications. At Astrafy we always favour open-source applications and deploy them on Kubernetes using Helm.

Automation

Automation brings reliability and consistency to the DataOps process and allows data engineers to quickly deploy new product features and improvements to existing workflows. As a rule of thumb, “if something can be automated then it must be automated”. DataOps engineers should constantly look for automation improvements that speed up the work of data engineers, so that they can focus on delivering value to the business.

Observability and monitoring

“Data is a silent killer”. We have seen countless examples where data is inaccurate at the data mart level and the error remains undetected for a while. Observability and monitoring are all about taking control of what happens within your data products. Companies should move towards data-observability-driven development, which is comparable to test-driven development (TDD) in software engineering.
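To make the comparison with TDD concrete, here is a minimal sketch of checks that could run against a data mart table after each load. It assumes rows are available as plain dictionaries; the function names, the column names (`amount`, `loaded_at`) and the 24-hour threshold are hypothetical and chosen only for illustration. A real setup would more likely rely on the testing features of the tools in your stack.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_not_null(rows, column):
    """Fail if any row has a missing value in `column`."""
    missing = sum(1 for row in rows if row.get(column) is None)
    return CheckResult(f"not_null:{column}", missing == 0, f"{missing} null value(s)")


def check_freshness(rows, column, max_age, now=None):
    """Fail if the newest timestamp in `column` is older than `max_age`."""
    now = now or datetime.now(timezone.utc)
    timestamps = [row[column] for row in rows if row.get(column) is not None]
    if not timestamps:
        return CheckResult(f"freshness:{column}", False, "no timestamps found")
    age = now - max(timestamps)
    return CheckResult(f"freshness:{column}", age <= max_age, f"newest row is {age} old")


def failed_checks(results):
    """Return only the checks that should raise an alert."""
    return [result for result in results if not result.passed]


if __name__ == "__main__":
    # Hypothetical sample rows standing in for a data mart table.
    now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    rows = [
        {"order_id": 1, "amount": 10.0, "loaded_at": now - timedelta(hours=1)},
        {"order_id": 2, "amount": None, "loaded_at": now - timedelta(hours=2)},
    ]
    results = [
        check_not_null(rows, "amount"),
        check_freshness(rows, "loaded_at", timedelta(hours=24), now=now),
    ]
    for result in failed_checks(results):
        print(f"ALERT {result.name}: {result.detail}")
```

The point of the sketch is that every check returns a structured result, so failures can be routed to whatever alerting channel you already use instead of staying silent.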

Automation requires some basics that are not going anywhere:

GitOps is the foundation of your codebase, and each codebase must have a well-defined GitOps strategy. Each push to the remote repository is the starting point of the DataOps automation pipeline.
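As a rough illustration of a push acting as the pipeline trigger, the sketch below maps the files changed in a push to the pipeline stages that should run. The directory layout (`terraform/`, `dbt/`, `dags/`) and the stage names are hypothetical; in practice a CI system would normally express these rules in its own configuration rather than in Python.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from repository paths to pipeline stages.
STAGE_RULES = [
    ("terraform/*", "infrastructure-plan"),
    ("dbt/*", "dbt-build-and-test"),
    ("dags/*", "airflow-dag-validation"),
]


def stages_for_push(changed_files):
    """Return the stages triggered by a push, in rule order, without duplicates."""
    stages = []
    for pattern, stage in STAGE_RULES:
        # fnmatchcase's "*" also matches "/" so nested paths are covered.
        if any(fnmatchcase(path, pattern) for path in changed_files):
            if stage not in stages:
                stages.append(stage)
    return stages


if __name__ == "__main__":
    pushed = ["dbt/models/orders.sql", "README.md"]
    print(stages_for_push(pushed))
```

Keeping the rules in one declarative list makes it easy to review in a merge request which changes trigger which parts of the pipeline.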

Shell scripting has been around for decades and is still widely used in DevOps and DataOps as a language supported by default for simple operations.

Why Astrafy?

Astrafy engineers are hands-on DataOps experts, as we consider DataOps to be an inseparable part of data engineering. You cannot have scalable data products if automation is not present all along the data journey.

Most of us are software engineers with a computer science background and have always seen DataOps as a mandatory part of any project. We don’t ship data code without tests that can be monitored on the fly through convenient tools.

Your main advantages when working with us on your DataOps journey:

  • We have many Terraform modules and GitLab CI templates designed for data use cases. This allows us to jump-start your DataOps journey.
  • We have our own Kubernetes cluster hosted on Google Kubernetes Engine, where we have deployed all the Modern Data Stack applications we advocate (Airbyte, Airflow, Lightdash, dbt, etc.). The whole deployment is defined in Terraform, so we can quickly deploy those applications.
  • We see the whole picture, including the other pillars of data engineering described in the other pages of our stack. DataOps requires understanding all the different aspects of data engineering in order to integrate and automate the different tools so that they work together at the right time.

Start automating everything around your data.