MOC - Data Engineering

Overview

About

This note indexes all notes related to Data Engineering: building systems that collect, store, and analyze data at scale.

Core Areas

Data Pipelines

  • ETL/ELT processes
  • Workflow orchestration (Airflow, Prefect)
  • Stream processing

Data Storage

  • Data warehouses and lakes
  • Distributed storage systems
  • Data formats (Parquet, Avro)

Infrastructure

  • Container orchestration
  • Infrastructure as Code
  • Monitoring and observability

Data Quality

  • Data validation
  • Testing pipelines
  • Documentation and lineage

Parent/Broader MOCs

Sibling MOCs (Same Level)

Child/Specialized MOCs

Language-Specific MOCs

  • MOC - Python - Primary data engineering language (Airflow, dbt, Polars)
  • MOC - R - Analytics and pipeline integration (targets, arrow)

Notes

NOTE

Individual notes on this topic currently carry the #Topic/Data Engineering tag.


Appendix

Note created on 2025-12-31 and last modified on 2025-12-31.


(c) No Clocks, LLC | 2025