Data Systems
Data Engineering
Focus Areas
- Spark architecture and execution
- MapReduce fundamentals
- Data Mesh and domain-owned data products
- Real-time analytics and event-driven decision systems
- Batch pipelines
- Streaming pipelines
- Partitioning, shuffling, joins, and skew
- File formats and table formats
- Orchestration and reliability
Key Questions
- Where does data enter and leave the system?
- What are the latency, correctness, and cost requirements?
- How are failures retried and observed?
- What happens when data is late, duplicated, or malformed?
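The last question can be made concrete with a small triage step at the point where records enter a pipeline. The sketch below is illustrative only: the field names `event_id` and `event_time`, the `triage` function, the `TriageResult` container, and the five-minute `allowed_lateness` default are all assumptions, not part of any particular framework. It sorts each record into one of four buckets so that late, duplicated, and malformed data can be counted and routed instead of silently dropped.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class TriageResult:
    """Hypothetical container for the four outcomes of triage."""
    accepted: list = field(default_factory=list)
    late: list = field(default_factory=list)
    duplicates: list = field(default_factory=list)
    malformed: list = field(default_factory=list)


def triage(records, watermark, allowed_lateness=timedelta(minutes=5)):
    """Sort raw records into accepted, late, duplicate, and malformed buckets.

    Assumes each valid record is a dict with a string "event_id" and an
    ISO 8601 "event_time"; both field names are illustrative.
    """
    result = TriageResult()
    seen_ids = set()
    cutoff = watermark - allowed_lateness

    for record in records:
        # Malformed: wrong shape, missing keys, or an unparseable timestamp.
        try:
            event_id = record["event_id"]
            if not isinstance(event_id, str):
                raise TypeError("event_id must be a string")
            event_time = datetime.fromisoformat(record["event_time"])
        except (KeyError, TypeError, ValueError):
            result.malformed.append(record)
            continue

        # Assumption: timestamps without an offset are treated as UTC.
        if event_time.tzinfo is None:
            event_time = event_time.replace(tzinfo=timezone.utc)

        # Duplicate: an event_id already seen in this batch.
        if event_id in seen_ids:
            result.duplicates.append(record)
            continue
        seen_ids.add(event_id)

        # Late: event time is older than the watermark minus the allowance.
        if event_time < cutoff:
            result.late.append(record)
        else:
            result.accepted.append(record)

    return result


watermark = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    {"event_id": "a", "event_time": "2024-01-01T11:59:00+00:00"},  # on time
    {"event_id": "a", "event_time": "2024-01-01T11:59:00+00:00"},  # duplicate
    {"event_id": "b", "event_time": "2024-01-01T11:00:00+00:00"},  # late
    {"event_id": "c", "event_time": "not-a-time"},                 # bad timestamp
    {"event_time": "2024-01-01T11:59:00+00:00"},                   # missing id
    "garbage",                                                     # not a dict
]
result = triage(records, watermark)
```

Keeping the four buckets separate lets each one be handled by its own policy: malformed records might go to a dead-letter location, late records to a correction job, and duplicate counts to a metric. A real pipeline would also need deduplication state that outlives a single batch, which this in-memory set does not provide.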