Abhay Dawar

Hi, I'm Abhay Dawar

Role: Data Engineer

Focus: Cloud Architecture

Tools: Spark, Airflow, SQL

Passion: Big Data & Pipelines

abhay@portfolio:~$ cat /etc/motd
abhay@portfolio:~$
abhay@portfolio:~$ ls -la /usr/local/bin/navigation
Navigation loaded successfully
Ready for commands...
abhay@portfolio:~$ ./show_stack.sh --verbose
Expertise

> [Languages]

Python
SQL
Scala
Java
Bash

> [Big Data & Streaming]

Apache Spark
Apache Kafka
Hadoop Ecosystem
Apache Flink
Databricks

> [Databases & Warehouses]

PostgreSQL
MySQL
Snowflake
AWS Redshift
Google BigQuery
NoSQL (DynamoDB, Cassandra)

> [Cloud Platforms]

AWS
Azure
GCP

> [Tools & Orchestration]

Apache Airflow
Docker
Kubernetes
Terraform
Git / GitHub Actions
CI/CD

> [Data Visualization]

Tableau
Power BI
Looker
Matplotlib/Seaborn
abhay@portfolio:~$ ls -l /var/log/projects/
Projects

// Snowflake Customer DML & Medallion Architecture

Customer data pipeline implementing a Bronze, Silver, and Gold medallion architecture in Snowflake, with DML operations for incremental loads and historical tracking.

Snowflake Medallion DML Data Modeling SQL
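
The Bronze, Silver, and Gold layering can be illustrated with a small in-memory sketch. This is not the project's Snowflake code: the field names (customer_id, email, country, updated_at) and both function names are hypothetical, and in Snowflake the Silver step would typically be expressed as a MERGE statement rather than a Python loop.

```python
from collections import defaultdict


def to_silver(bronze_rows):
    """Silver: drop unkeyed rows, normalize email, keep the latest row per customer."""
    latest = {}
    for row in bronze_rows:
        customer_id = row.get("customer_id")
        if customer_id is None:
            continue  # reject records that cannot be keyed
        cleaned = {**row, "email": row["email"].strip().lower()}
        current = latest.get(customer_id)
        # ISO-8601 date strings compare correctly as plain strings
        if current is None or cleaned["updated_at"] > current["updated_at"]:
            latest[customer_id] = cleaned
    return list(latest.values())


def to_gold(silver_rows):
    """Gold: a reporting aggregate, here the number of customers per country."""
    counts = defaultdict(int)
    for row in silver_rows:
        counts[row["country"]] += 1
    return dict(counts)


# Hypothetical raw (Bronze) records, including a duplicate and an unkeyed row
bronze = [
    {"customer_id": 1, "email": " A@Example.com ", "country": "IN", "updated_at": "2024-01-01"},
    {"customer_id": 1, "email": "a@example.com", "country": "IN", "updated_at": "2024-02-01"},
    {"customer_id": 2, "email": "b@example.com", "country": "US", "updated_at": "2024-01-15"},
    {"customer_id": None, "email": "x@example.com", "country": "US", "updated_at": "2024-01-20"},
]
silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping Bronze untouched and deriving Silver and Gold from it means a bad transformation can be rerun without re-ingesting the source.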

// Snowflake News Data Analysis

End-to-end pipeline for ingesting, transforming, and analyzing news data in Snowflake with reporting and analytics capabilities.

Snowflake ETL SQL Analytics

// Car Rental Batch Ingestion (Snowflake)

Batch ETL pipeline for car rental domain data into Snowflake—ingestion, staging, and warehouse layers for analytics and reporting.

Snowflake Batch ETL Data Warehouse SQL

// Healthcare DLT Medallion Pipeline

Delta Live Tables (DLT) pipeline implementing a medallion architecture for healthcare data, from Bronze to Gold, with quality checks and lineage.

Delta Live Tables Databricks Medallion Healthcare PySpark

// Travel Booking SCD2 Data Warehouse

Data warehouse design for travel booking with Type 2 Slowly Changing Dimensions (SCD2) for historical tracking and point-in-time reporting.

SCD2 Data Warehouse Data Modeling SQL
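
A minimal sketch of how Type 2 history and point-in-time lookups fit together, assuming a dimension held as a list of dicts. The column names (valid_from, valid_to, is_current), the customer_id/tier example, and both function names are illustrative assumptions, not taken from the project.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, as_of):
    """Close the current row and append a new version when a tracked column changes."""
    current = {row[key]: row for row in dimension if row["is_current"]}
    for record in incoming:
        existing = current.get(record[key])
        if existing is not None and all(existing[c] == record[c] for c in tracked):
            continue  # nothing changed, keep the current version open
        if existing is not None:
            existing["valid_to"] = as_of  # expire the old version
            existing["is_current"] = False
        new_row = {
            key: record[key],
            **{c: record[c] for c in tracked},
            "valid_from": as_of,
            "valid_to": None,  # None marks the open-ended current version
            "is_current": True,
        }
        dimension.append(new_row)
        current[record[key]] = new_row
    return dimension


def lookup_as_of(dimension, key, key_value, when):
    """Return the version that was valid on the given date, or None."""
    for row in dimension:
        starts_before = row["valid_from"] <= when
        still_open = row["valid_to"] is None or when < row["valid_to"]
        if row[key] == key_value and starts_before and still_open:
            return row
    return None


dim = []
apply_scd2(dim, [{"customer_id": 1, "tier": "silver"}], "customer_id", ["tier"], date(2024, 1, 1))
apply_scd2(dim, [{"customer_id": 1, "tier": "gold"}], "customer_id", ["tier"], date(2024, 6, 1))
```

The half-open interval (valid_from inclusive, valid_to exclusive) is a design choice in this sketch; it keeps exactly one version valid on the day of a change.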

// E-Commerce Event-Driven Databricks Pipeline

Event-driven data pipeline on Databricks for e-commerce events—streaming ingestion, processing, and analytics with scalable architecture.

Databricks Event-Driven Streaming E-Commerce PySpark

// UPI Transactions CDC & Streaming Analytics

Change Data Capture (CDC) and streaming analytics pipeline for UPI transactions—real-time ingestion, processing, and analytics.

CDC Streaming Real-time Analytics
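
The core of a CDC consumer is replaying ordered change events against a target table. The sketch below assumes a hypothetical event shape (seq, op, key, row) and an in-memory dict as the target; a real pipeline would read events from a log or stream and write to a database.

```python
def apply_cdc(state, events):
    """Replay insert/update/delete events in sequence order onto a keyed table."""
    for event in sorted(events, key=lambda e: e["seq"]):
        op = event["op"]
        if op in ("insert", "update"):
            state[event["key"]] = event["row"]  # upsert the latest image of the row
        elif op == "delete":
            state.pop(event["key"], None)  # tolerate deletes for unknown keys
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return state


# Hypothetical transaction events, deliberately out of order
events = [
    {"seq": 3, "op": "delete", "key": "txn-2", "row": None},
    {"seq": 1, "op": "insert", "key": "txn-1", "row": {"amount": 250, "status": "PENDING"}},
    {"seq": 2, "op": "insert", "key": "txn-2", "row": {"amount": 90, "status": "PENDING"}},
    {"seq": 4, "op": "update", "key": "txn-1", "row": {"amount": 250, "status": "SUCCESS"}},
]
table = apply_cdc({}, events)
```

Sorting by a sequence number before applying is what makes the replay deterministic when events arrive out of order.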

// Electron System Resource Monitor

Cross-platform desktop app for real-time system resource monitoring (CPU, RAM, Storage) built with Electron, React, and TypeScript—featuring interactive Recharts, system tray, and builds for macOS, Windows, and Linux.

Electron React TypeScript Vite Recharts

// Self-Driving Car NN (FSD Simulation)

Full Self-Driving (FSD) simulation using neural networks implemented from scratch in JavaScript, covering sensors, road/car physics, and a training loop. Live demo: self-driving-car-nn.vercel.app.

Neural Networks Machine Learning JavaScript Simulation
abhay@portfolio:~$ find ./blogs -name "*.md" -type f
Blogs

How fixing partition skew can turn a 4-hour job into a 45-minute one, and how to find the skew in the first place. Plus when to cache (and when not to), and how to read the Spark UI like a pro.

Apache Spark Performance Optimization Big Data
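
One common remedy for skew is key salting. The sketch below is a plain-Python simulation, not Spark code: it uses CRC32 as a stand-in for a hash partitioner, and the function names and the 90/10 key distribution are assumptions made for illustration. In Spark, salting an aggregation also requires a second pass that merges the per-salt partial results.

```python
import zlib
from collections import Counter


def partition_of(key, num_partitions):
    """Deterministic stand-in for a hash partitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions, salt_buckets=1):
    """Count rows per partition, optionally appending a rotating salt to each key."""
    sizes = Counter()
    for i, key in enumerate(keys):
        salted = f"{key}#{i % salt_buckets}" if salt_buckets > 1 else key
        sizes[partition_of(salted, num_partitions)] += 1
    return sizes


def skew_ratio(sizes, num_partitions):
    """Largest partition divided by the mean partition size (1.0 is perfectly even)."""
    mean = sum(sizes.values()) / num_partitions
    return max(sizes.values()) / mean


# Hypothetical workload: one hot key carries 90% of the rows
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]
unsalted = partition_sizes(keys, num_partitions=8)
salted = partition_sizes(keys, num_partitions=8, salt_buckets=16)
print("skew without salt:", round(skew_ratio(unsalted, 8), 2))
print("skew with salt:   ", round(skew_ratio(salted, 8), 2))
```

The same ratio (max task size over the mean) is a useful number to look for in the Spark UI stage view when deciding whether a slow stage is skewed at all.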

ACID on object storage, time travel for 2 a.m. debugging, and schema evolution without breaking pipelines. When to choose Delta over raw Parquet—and when not to.

Delta Lake Data Lake AWS ACID

How partition keys drive both ordering and scale, exactly-once semantics without the headache, and why consumer lag is your best early-warning signal. Tune first, then scale.

Apache Kafka Streaming Real-time Scalability
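
Two of these ideas reduce to simple arithmetic. The sketch below assumes CRC32 as a stand-in hash (Kafka's default partitioner uses a murmur2 hash, so actual partition numbers will differ) and hypothetical function names and offsets: the same key always maps to the same partition, which preserves per-key ordering, and lag is the log end offset minus the committed offset.

```python
import zlib


def assign_partition(key, num_partitions):
    """Map a record key to a partition; equal keys always land together."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: messages written but not yet committed by the consumer group."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in end_offsets.items()
    }


# Hypothetical offsets for a three-partition topic
end_offsets = {0: 1500, 1: 1480, 2: 9200}
committed = {0: 1495, 1: 1480, 2: 4100}
lag = consumer_lag(end_offsets, committed)
worst_partition = max(lag, key=lag.get)
```

A single partition with lag far above the others usually points to a hot key rather than an under-provisioned consumer group, which is why tuning the key comes before adding consumers.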

Turn data quality into executable checks that run in your pipeline. Custom expectations that matter, integration with DAGs, and how to avoid alert fatigue so people actually act on failures.

Data Quality Great Expectations Monitoring Airflow
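
The idea of executable checks can be sketched without any framework. The code below is plain Python and deliberately does not use the Great Expectations API; the function names, the result shape, and the sample rows are all assumptions made for illustration.

```python
def expect_not_null(column):
    """Build a check that flags rows where the column is missing or None."""
    def check(rows):
        bad = [i for i, row in enumerate(rows) if row.get(column) is None]
        return {"check": f"not_null({column})", "failed_rows": bad}
    return check


def expect_between(column, low, high):
    """Build a check that flags non-null values outside the closed range [low, high]."""
    def check(rows):
        bad = [
            i for i, row in enumerate(rows)
            if row.get(column) is not None and not (low <= row[column] <= high)
        ]
        return {"check": f"between({column}, {low}, {high})", "failed_rows": bad}
    return check


def run_checks(rows, checks):
    """Run every check and return only the failures, so alerts stay actionable."""
    results = [check(rows) for check in checks]
    return [result for result in results if result["failed_rows"]]


# Hypothetical batch with one null key and one out-of-range amount
rows = [
    {"order_id": 1, "amount": 120.0},
    {"order_id": None, "amount": 35.5},
    {"order_id": 3, "amount": -10.0},
]
failures = run_checks(rows, [expect_not_null("order_id"), expect_between("amount", 0, 10000)])
```

In a DAG, a task wrapping run_checks could raise when failures is non-empty so that downstream tasks do not run on bad data.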

Lift-and-shift vs. redesign, where cost surprises really come from, and a phased cutover plan that includes validation and rollback. Use the move as a chance to fix technical debt.

Migration Snowflake Cloud ETL

Repeatable, versioned infra for clusters and buckets; secrets in the vault, not in code; and how to catch drift before it becomes a fire. Same code for dev, staging, and prod.

Terraform Infrastructure DevOps Automation