Role: Data Engineer
Focus: Cloud Architecture
Tools: Spark, Airflow, SQL
Passion: Big Data & Pipelines
Customer data pipeline implementing Bronze, Silver, and Gold medallion architecture in Snowflake with DML operations for incremental loads and historical tracking.
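A minimal sketch of the incremental-load step, assuming a Bronze-to-Silver customer merge in Snowflake; the table, column, and connection names (BRONZE_CUSTOMERS, SILVER_CUSTOMERS, CUSTOMER_ID, ETL_WH) are placeholders, not the project's real schema:

```python
# Incremental Bronze -> Silver upsert via Snowflake MERGE (illustrative names).
import snowflake.connector

MERGE_SQL = """
MERGE INTO SILVER_CUSTOMERS tgt
USING (
    SELECT CUSTOMER_ID, NAME, EMAIL, UPDATED_AT
    FROM BRONZE_CUSTOMERS
    WHERE LOAD_DATE = CURRENT_DATE
) src
ON tgt.CUSTOMER_ID = src.CUSTOMER_ID
WHEN MATCHED AND src.UPDATED_AT > tgt.UPDATED_AT THEN UPDATE SET
    NAME = src.NAME,
    EMAIL = src.EMAIL,
    UPDATED_AT = src.UPDATED_AT
WHEN NOT MATCHED THEN INSERT (CUSTOMER_ID, NAME, EMAIL, UPDATED_AT)
    VALUES (src.CUSTOMER_ID, src.NAME, src.EMAIL, src.UPDATED_AT)
"""

def run_incremental_merge() -> None:
    # Credentials should come from a secrets store or environment, never code.
    conn = snowflake.connector.connect(
        account="my_account",   # placeholder
        user="etl_user",        # placeholder
        password="***",         # placeholder
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="SILVER",
    )
    try:
        conn.cursor().execute(MERGE_SQL)
    finally:
        conn.close()
```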
End-to-end pipeline for ingesting and transforming news data in Snowflake, with reporting and analytics layers built on top.
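A minimal sketch of the ingestion step, assuming raw news files land in an external stage and are loaded with COPY INTO; the stage, table, and connection names are illustrative only:

```python
# Load raw JSON news files from a stage into a Snowflake landing table.
import snowflake.connector

COPY_SQL = """
COPY INTO RAW_NEWS_ARTICLES           -- landing table with a VARIANT column
FROM @NEWS_STAGE/articles/
FILE_FORMAT = (TYPE = 'JSON')
ON_ERROR = 'CONTINUE'
"""

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",  # placeholders
    warehouse="ETL_WH", database="NEWS", schema="RAW",
)
try:
    conn.cursor().execute(COPY_SQL)
finally:
    conn.close()
```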
Batch ETL pipeline that loads car rental domain data into Snowflake, with ingestion, staging, and warehouse layers for analytics and reporting.
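A minimal sketch of how the batch flow could be orchestrated as an Airflow DAG, one task per layer; the DAG id, task names, and schedule are assumptions, and the task bodies are stubs:

```python
# Ingest -> stage -> warehouse as a daily Airflow DAG (illustrative names).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_raw_files():
    ...  # pull source extracts (bookings, vehicles, customers) into cloud storage


def load_staging():
    ...  # COPY INTO Snowflake staging tables


def load_warehouse():
    ...  # transform staging data into facts and dimensions


with DAG(
    dag_id="car_rental_batch_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_raw_files", python_callable=ingest_raw_files)
    stage = PythonOperator(task_id="load_staging", python_callable=load_staging)
    warehouse = PythonOperator(task_id="load_warehouse", python_callable=load_warehouse)

    ingest >> stage >> warehouse
```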
Delta Live Tables (DLT) pipeline implementing medallion architecture for healthcare data—Bronze to Gold with quality checks and lineage.
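A minimal sketch of one Bronze-to-Silver hop in DLT with expectations attached; the source path, table names, and rules (patient_id, event_ts) are placeholders rather than the real healthcare schema:

```python
# Delta Live Tables: Bronze ingest plus a Silver table gated by expectations.
import dlt
from pyspark.sql import functions as F

# `spark` is provided by the DLT runtime in a Databricks pipeline.


@dlt.table(name="bronze_patient_events", comment="Raw healthcare events as landed")
def bronze_patient_events():
    return (
        spark.readStream.format("cloudFiles")        # Databricks Auto Loader
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/patient_events")             # placeholder path
    )


@dlt.table(name="silver_patient_events", comment="Validated, typed events")
@dlt.expect_or_drop("valid_patient_id", "patient_id IS NOT NULL")
@dlt.expect("recent_event", "event_ts >= '2020-01-01'")
def silver_patient_events():
    return (
        dlt.read_stream("bronze_patient_events")
        .withColumn("event_ts", F.to_timestamp("event_ts"))
    )
```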
Data warehouse design for travel booking with Type 2 Slowly Changing Dimensions (SCD2) for historical tracking and point-in-time reporting.
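A minimal sketch of the SCD2 apply, assuming a customer dimension with effective-date columns; step 1 expires changed rows, step 2 inserts the new current versions, and the table and column names are examples only:

```python
# Two-step SCD2 apply in Snowflake SQL (run expire first, then insert).

EXPIRE_CHANGED_ROWS = """
UPDATE DIM_CUSTOMER
SET EFFECTIVE_TO = CURRENT_TIMESTAMP(), IS_CURRENT = FALSE
FROM STG_CUSTOMER s
WHERE DIM_CUSTOMER.CUSTOMER_ID = s.CUSTOMER_ID
  AND DIM_CUSTOMER.IS_CURRENT = TRUE
  AND (DIM_CUSTOMER.EMAIL <> s.EMAIL OR DIM_CUSTOMER.TIER <> s.TIER)
"""

# After the expire step, both new and changed customers have no current row,
# so a single anti-join picks up everything that needs a fresh version.
INSERT_NEW_VERSIONS = """
INSERT INTO DIM_CUSTOMER
    (CUSTOMER_ID, EMAIL, TIER, EFFECTIVE_FROM, EFFECTIVE_TO, IS_CURRENT)
SELECT s.CUSTOMER_ID, s.EMAIL, s.TIER, CURRENT_TIMESTAMP(), NULL, TRUE
FROM STG_CUSTOMER s
LEFT JOIN DIM_CUSTOMER d
  ON d.CUSTOMER_ID = s.CUSTOMER_ID AND d.IS_CURRENT = TRUE
WHERE d.CUSTOMER_ID IS NULL
"""
```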
Event-driven data pipeline on Databricks for e-commerce events—streaming ingestion, processing, and analytics with scalable architecture.
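A minimal sketch of the streaming ingestion leg, assuming events arrive on a Kafka topic and land in a Bronze Delta table; broker, topic, path, and table names are placeholders:

```python
# Structured Streaming on Databricks: Kafka topic -> Bronze Delta table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

raw_events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "ecommerce-events")
    .option("startingOffsets", "latest")
    .load()
)

bronze = raw_events.select(
    F.col("key").cast("string").alias("event_key"),
    F.col("value").cast("string").alias("payload"),
    F.col("timestamp").alias("ingested_at"),
)

(
    bronze.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/ecom_bronze")
    .outputMode("append")
    .toTable("bronze_ecommerce_events")
)
```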
Change Data Capture (CDC) and streaming analytics pipeline for UPI transactions—real-time ingestion, processing, and analytics.
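A minimal sketch of applying CDC records to a Delta table with foreachBatch, so each micro-batch of change events is upserted by transaction id; the table, topic, and key names are assumptions:

```python
# CDC apply: upsert each micro-batch into the Silver transactions table.
from delta.tables import DeltaTable
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.getOrCreate()


def upsert_batch(changes: DataFrame, batch_id: int) -> None:
    # A fuller version would first keep only the latest change per txn_id.
    target = DeltaTable.forName(spark, "silver_upi_transactions")
    (
        target.alias("t")
        .merge(changes.alias("s"), "t.txn_id = s.txn_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )


# Wiring it into the stream (cdc_stream = parsed change events, e.g. from Kafka):
# cdc_stream.writeStream.foreachBatch(upsert_batch) \
#     .option("checkpointLocation", "/mnt/checkpoints/upi_cdc").start()
```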
Cross-platform desktop app for real-time system resource monitoring (CPU, RAM, Storage) built with Electron, React, and TypeScript—featuring interactive Recharts, system tray, and builds for macOS, Windows, and Linux.
How partition skew can turn a 45-minute job into a 4-hour one, how to spot it, and how to fix it. Plus when to cache (and when not to), and how to read the Spark UI like a pro.
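A minimal sketch of one common fix, salting a hot join key so no single task carries the whole key; the table and column names (orders, customers, customer_id) are illustrative:

```python
# Skew mitigation: AQE first, manual salting for the stubborn cases.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# On Spark 3+, adaptive skew-join handling is the first knob to try.
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

N = 16                                 # salt buckets, sized to the skew
orders = spark.table("orders")         # large side, skewed on customer_id
customers = spark.table("customers")   # smaller dimension side

# Spread each order across N buckets, replicate the dimension N times.
salted_orders = orders.withColumn("salt", (F.rand() * N).cast("int"))
salted_customers = customers.crossJoin(
    spark.range(N).select(F.col("id").cast("int").alias("salt"))
)

joined = salted_orders.join(salted_customers, on=["customer_id", "salt"])
```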
ACID on object storage, time travel for 2 a.m. debugging, and schema evolution without breaking pipelines. When to choose Delta over raw Parquet—and when not to.
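A minimal sketch of the two features named here, time travel and schema evolution, against a hypothetical orders table; the path and version number are placeholders:

```python
# Delta Lake: read an old version, then append with schema evolution enabled.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
path = "/mnt/delta/orders"

# Time travel: reproduce exactly what the pipeline saw before last night's load.
before_load = spark.read.format("delta").option("versionAsOf", 42).load(path)

# Schema evolution: new columns in the incoming batch get added instead of failing.
new_batch = spark.read.parquet("/mnt/landing/orders_today")
(
    new_batch.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save(path)
)
```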
How partition keys drive both ordering and scale, exactly-once semantics without the headache, and why consumer lag is your best early-warning signal. Tune first, then scale.
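A minimal sketch of keyed production with confluent-kafka: records sharing a key land on the same partition, which is what gives per-key ordering, and the idempotent producer avoids duplicates on retries. Broker, topic, and key values are placeholders:

```python
# Keyed, idempotent producer: the key defines the ordering scope.
import json
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "broker:9092",
    "acks": "all",                 # wait for all in-sync replicas
    "enable.idempotence": True,    # no duplicates when retries kick in
})

producer.produce(
    "upi-payments",                                     # placeholder topic
    key="account-42",                                   # ordering scope
    value=json.dumps({"amount": 150.0, "status": "ok"}),
)
producer.flush()
```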
Turn data quality into executable checks that run in your pipeline. Custom expectations that matter, integration with DAGs, and how to avoid alert fatigue so people actually act on failures.
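A minimal sketch of what an executable check can look like without committing to any one framework: each check is a named SQL predicate, violations are counted, and the task fails hard so bad data never reaches the next layer. Table and rule names are examples:

```python
# A data quality gate that a DAG task can call before the downstream load runs.
from pyspark.sql import DataFrame


class DataQualityError(Exception):
    pass


def run_checks(df: DataFrame, checks: dict[str, str]) -> None:
    failures = {}
    for name, predicate in checks.items():
        bad_rows = df.filter(f"NOT ({predicate})").count()
        if bad_rows:
            failures[name] = bad_rows
    if failures:
        # Fail loudly with every broken expectation, not just the first one.
        raise DataQualityError(f"Quality checks failed: {failures}")


# Example: gate a Silver table before the Gold build runs.
# run_checks(
#     spark.table("silver_orders"),
#     {
#         "order_id_not_null": "order_id IS NOT NULL",
#         "amount_positive": "amount > 0",
#     },
# )
```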
Lift-and-shift vs. redesign, where cost surprises really come from, and a phased cutover plan that includes validation and rollback. Use the move as a chance to fix technical debt.
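A minimal sketch of the validation half of that cutover plan, assuming a simple per-table, per-load-date row-count reconciliation between the legacy source and the new warehouse; the table list and cursor objects are placeholders for whatever connections the migration uses:

```python
# Cutover validation: compare daily row counts before flipping consumers over.
TABLES = ["orders", "customers", "payments"]   # placeholder table list


def reconcile(legacy_cursor, snowflake_cursor, load_date: str) -> dict[str, tuple[int, int]]:
    mismatches = {}
    for table in TABLES:
        query = f"SELECT COUNT(*) FROM {table} WHERE load_date = '{load_date}'"
        legacy_cursor.execute(query)
        source_count = legacy_cursor.fetchone()[0]
        snowflake_cursor.execute(query)
        target_count = snowflake_cursor.fetchone()[0]
        if source_count != target_count:
            mismatches[table] = (source_count, target_count)
    # Empty dict means this slice is safe to cut over; otherwise hold and investigate.
    return mismatches
```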
Repeatable, versioned infra for clusters and buckets; secrets in the vault, not in code; and how to catch drift before it becomes a fire. Same code for dev, staging, and prod.