Data Science & Engineering

Data engineering refers to the process of collecting, transforming, and organizing raw data into a usable format for data analysis, business intelligence (BI), or machine learning (ML). The primary goal of data engineering is to ensure that data is accessible, clean, and reliable so that it can be used to extract valuable insights or support decision-making processes.

This process involves several steps:


Data Ingestion: Data ingestion is the process of bringing data from one or more data sources into a data platform. These data sources can be files stored on-premises or on cloud storage services, databases, applications, and, increasingly, data streams that produce real-time events.
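
As an illustration, here is a minimal ingestion sketch in PySpark (an assumed engine; the text does not prescribe one). The storage path, broker address, and topic name are hypothetical placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ingestion").getOrCreate()

# Batch ingestion: CSV files on cloud storage (hypothetical bucket/path).
orders = spark.read.option("header", "true").csv("s3://raw-bucket/orders/")

# Streaming ingestion: real-time events from Kafka (hypothetical broker
# and topic; requires the spark-sql-kafka connector package).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "order-events")
    .load()
)
```

Both reads land data on the platform in a form later steps can work with; batch sources arrive as static DataFrames, while streaming sources arrive as continuously updating ones.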


Data Transformation: Data transformation takes raw ingested data and applies a series of steps (referred to as “transformations”) to filter, standardize, clean, and finally aggregate it so it is stored in a usable way. A popular pattern is the medallion architecture, which defines three stages in the process: Bronze (raw data as ingested), Silver (filtered, cleaned, and standardized data), and Gold (aggregated, business-ready data).
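
The following sketch shows one way the Bronze-to-Silver-to-Gold flow could look in PySpark (again an assumed engine; the table and column names are hypothetical):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transform").getOrCreate()

# Bronze: raw ingested data, stored as-is.
bronze = spark.read.table("bronze.orders")

# Silver: filter, standardize, and clean the raw records.
silver = (
    bronze
    .filter(F.col("order_id").isNotNull())             # drop malformed rows
    .withColumn("country", F.upper(F.col("country")))  # standardize values
    .dropDuplicates(["order_id"])                      # de-duplicate
)
silver.write.mode("overwrite").saveAsTable("silver.orders")

# Gold: aggregate into a business-ready table.
gold = silver.groupBy("country").agg(F.sum("amount").alias("total_revenue"))
gold.write.mode("overwrite").saveAsTable("gold.revenue_by_country")
```

Each stage is persisted as its own table, so downstream consumers can pick the level of refinement they need: analysts typically read Gold, while debugging and reprocessing start from Bronze.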


Data Orchestration and Pipelines: Data orchestration refers to the way a data pipeline that performs ingestion and transformation is scheduled and monitored, as well as how the individual pipeline steps are controlled and failures are handled (e.g., by executing a retry run).
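
A minimal orchestration sketch, using Apache Airflow as the orchestrator (an assumption; the text names no specific tool), might look like this. It schedules the pipeline daily, runs ingestion before transformation, and retries a failed step automatically; the DAG id, task names, and callables are hypothetical:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest():
    ...  # trigger the ingestion job (hypothetical)


def transform():
    ...  # trigger the transformation job (hypothetical)


with DAG(
    dag_id="daily_orders_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # scheduling (Airflow 2.4+ syntax)
    catchup=False,
) as dag:
    ingest_task = PythonOperator(
        task_id="ingest",
        python_callable=ingest,
        retries=2,                        # handle failures with retry runs
        retry_delay=timedelta(minutes=5),
    )
    transform_task = PythonOperator(
        task_id="transform",
        python_callable=transform,
    )
    ingest_task >> transform_task         # control step ordering
```

The orchestrator also records each run's status, which is what makes the monitoring side of orchestration possible: a failed task shows up in the run history and, after exhausting its retries, can alert an operator.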