Get your data engineering right before plumbing your data into activation.
Data engineering is the development, implementation, and maintenance of systems and processes that take in raw data and produce high-quality, consistent information that supports downstream use cases, such as analysis and machine learning. Data Engineering is the intersection of security, data management, DataOps, Data Architecture, orchestration and software engineering. A data engineer manages the data engineering lifecycle, beginning with getting data from source systems and ending with serving data for use cases, such as analysis or machine learning. – Fundamentals of Data Engineering.
At Astrafy we focus on the following four activities when talking about data engineering. We have dedicated sections for DataOps and data governance.
It all starts with a holistic view of the best architecture for your data ecosystem. We start by analysing your current stack, perform a gap analysis and materialise all of that into an optimal “TO BE” architecture. Data tools, batch versus streaming, and data team structure are some of the aspects discussed at this high-level architecture stage.
Data ingestion is the process of bringing data into a system for processing or storage. We are experts in ingesting data from any kind of source (database or application) and syncing it into many different sinks (BigQuery, Snowflake, etc.). When ingesting internal data, we make sure to develop data contracts between your software engineering team and your data team.
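To make the idea of a data contract concrete, here is a minimal sketch in Python. It assumes a contract is simply an agreed mapping of field names to types; the `ORDER_CONTRACT` fields and the `contract_violations` helper are hypothetical names chosen for illustration, not part of any specific tool.

```python
from typing import Any

# Hypothetical contract for an "orders" feed; field names are illustrative only.
ORDER_CONTRACT: dict[str, type] = {
    "order_id": str,
    "customer_id": str,
    "amount": float,
}


def contract_violations(record: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """Return human-readable violations; an empty list means the record conforms."""
    problems = []
    for field, expected in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems
```

A check of this kind can run at the ingestion boundary, so that a breaking change made by the software engineering team is caught before it reaches the data team's models.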
Data modelling refers to structuring data into something that makes sense for the business. It is one of the most underrated activities in data engineering nowadays. Instead of writing quick SQL “on the fly”, one should take the time to model data with appropriate data modelling techniques. At Astrafy we are experts in Data Vault 2.0, dimensional modelling and activity schema (full denormalization).
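As a rough illustration of dimensional modelling, the sketch below splits flat order rows into a customer dimension and an order fact table. The column names (`customer_email`, `customer_country`, `order_id`, `amount`) and the `to_star_schema` helper are assumptions made for this example; in practice this step would typically be written in SQL rather than Python.

```python
def to_star_schema(rows):
    """Split flat order rows into a customer dimension and an order fact table."""
    dim_customer = {}  # natural key (email) -> dimension row
    fact_orders = []
    for row in rows:
        key = row["customer_email"]
        if key not in dim_customer:
            # Assign a surrogate key the first time a customer is seen.
            dim_customer[key] = {
                "customer_sk": len(dim_customer) + 1,
                "customer_email": key,
                "customer_country": row["customer_country"],
            }
        # The fact row keeps the measure and a reference to the dimension.
        fact_orders.append({
            "order_id": row["order_id"],
            "customer_sk": dim_customer[key]["customer_sk"],
            "amount": row["amount"],
        })
    return list(dim_customer.values()), fact_orders
```

Customer attributes are stored once in the dimension, while each fact row carries only the measure and the surrogate key.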
Data transformation is the cornerstone of data engineering. Data transformations translate raw data from source systems into a well-defined data model that then serves downstream applications (BI dashboards, ML models, etc.). Our team excels at using dbt to transform your data in a structured way. For streaming analytics, where real-time processing is mandatory for your use case, we leverage Apache Beam.
Orchestration is the process of coordinating many jobs to run as quickly and efficiently as possible on a scheduled cadence. Orchestration sits at the intersection of data engineering and DataOps, as it automates your data transformations at well-defined intervals. Airflow is the most widely adopted orchestration tool and allows advanced DAG definitions that orchestrate diverse operations based on customised logic.
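The core idea behind a DAG can be sketched with Python's standard library alone. The example below is not Airflow code: it uses `graphlib.TopologicalSorter` (Python 3.9+) to derive a valid run order, and the task names in `PIPELINE` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "transform_staging": {"ingest_orders", "ingest_customers"},
    "build_marts": {"transform_staging"},
    "refresh_dashboard": {"build_marts"},
}


def run_order(dag):
    """Return one valid execution order where every task follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())
```

An orchestrator such as Airflow builds on the same dependency graph and adds scheduling, retries and monitoring.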
Astrafy engineers are hands-on experts in data engineering, with years of accumulated experience on large-scale projects. We are experts in, and partners of, the most widely used data engineering tools on the market.
We never reinvent the wheel and leverage frameworks as much as possible. That being said, we are great coders and can develop from scratch or contribute to OSS libraries in the following languages:
We are all about the Modern Data Stack and the new data paradigms. In that sense, we favour a Data Mesh approach when recommending a new data architecture. We are convinced that data is better managed when it is treated as a product.
- Data Engineering is what defines us.
- We founded Astrafy because we noticed that data engineering was most of the time rushed in order to quickly serve downstream applications. In the long term, this shortcut leads to unscalable and unmaintainable data platforms where transformations get added on top of each other with no specific logic.
- We proudly present ourselves as expert data engineers who take the time to define a holistic view of your data ecosystem, followed by a thorough design of the most efficient “TO BE” architecture, before getting our hands dirty with implementation.
- We don’t consider data engineering an isolated activity. We see it in the context of our multiple areas of data expertise (intertwined with DataOps, DevSecOps, data governance, etc.), and exponential added value comes from this intersection of disciplines.