Abhay Dawar | Data Engineer

Hi, I'm Abhay Dawar

Role: Data Engineer

Focus: Cloud Architecture

Tools: Spark, Airflow, SQL

Passion: Big Data & Pipelines

GitHub github.io/
LinkedIn iamabhaydawar
X @iamabhaydawar
Location Vancouver, Canada

abhay@portfolio:$ ~ ./show_stack.sh --verbose

Expertise

> [Languages]

Python

SQL

Scala

Java

Bash

> [Big Data & Streaming]

Apache Spark

Apache Kafka

Hadoop Ecosystem

Apache Flink

Databricks

> [Databases & Warehouses]

PostgreSQL

MySQL

Snowflake

AWS Redshift

Google BigQuery

NoSQL (DynamoDB, Cassandra)

> [Cloud Platforms]

AWS

Azure

GCP

> [Tools & Orchestration]

Apache Airflow

Docker

Kubernetes

Terraform

Git / GitHub Actions

CI/CD

> [Data Visualization]

Tableau

Power BI

Looker

Matplotlib/Seaborn

abhay@portfolio:$ ~ ls -l /var/log/projects/

Projects

> [Agentic Engineering Projects]

// DevRadar — Career Intelligence Platform

AI-powered career intelligence for developers—maps your stack into a live knowledge graph, matches you against startups, surfaces hackathons, runs skill gap analysis, and grounds chat in a personal wiki. HydraDB persists everything across sessions. Built for WikiThon 2026. Live at devradar-seven.vercel.app.

React HydraDB Groq Node.js vis-network AI

// MnemOS — Agentic OS with Memory

Visual workflow builder for desktop AI agents in a containerized virtual desktop. Agents remember across sessions, recover from failures, and adapt strategy via Remember, Recall, Recover, and Plan nodes—powered by HydraDB, Groq, and Playwright browser automation. Built for the Agents Under Pressure hackathon.

React FastAPI HydraDB Playwright Docker AI Agents

> [Data Engineering Projects]

// Snowflake Customer DML & Medallion Architecture

Customer data pipeline implementing Bronze, Silver, and Gold medallion architecture in Snowflake with DML operations for incremental loads and historical tracking.

Snowflake Medallion DML Data Modeling SQL

// Snowflake News Data Analysis

End-to-end pipeline for ingesting, transforming, and analyzing news data in Snowflake with reporting and analytics capabilities.

Snowflake ETL SQL Analytics

// Car Rental Batch Ingestion (Snowflake)

Batch ETL pipeline for car rental domain data into Snowflake—ingestion, staging, and warehouse layers for analytics and reporting.

Snowflake Batch ETL Data Warehouse SQL

// Healthcare DLT Medallion Pipeline

Delta Live Tables (DLT) pipeline implementing medallion architecture for healthcare data—Bronze to Gold with quality checks and lineage.

Delta Live Tables Databricks Medallion Healthcare PySpark

// Travel Booking SCD2 Data Warehouse

Data warehouse design for travel booking with Type 2 Slowly Changing Dimensions (SCD2) for historical tracking and point-in-time reporting.

SCD2 Data Warehouse Data Modeling SQL
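A minimal sketch of the SCD2 idea in plain Python, not the project's actual code. The `DimRow` layout, the `tier` attribute, and the helper names are hypothetical; a real warehouse would do this with SQL `MERGE`-style logic.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DimRow:
    customer_id: int           # business key (hypothetical)
    tier: str                  # tracked attribute (hypothetical)
    valid_from: date
    valid_to: Optional[date]   # None means "current version"


def apply_change(rows: List[DimRow], customer_id: int, new_tier: str, change_date: date) -> None:
    """Close the current version and append a new one if the attribute changed."""
    current = next(
        (r for r in rows if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None:
        if current.tier == new_tier:
            return  # nothing changed, so no new version
        current.valid_to = change_date
    rows.append(DimRow(customer_id, new_tier, change_date, None))


def as_of(rows: List[DimRow], customer_id: int, when: date) -> Optional[DimRow]:
    """Point-in-time lookup: valid_from is inclusive, valid_to is exclusive."""
    for r in rows:
        if (
            r.customer_id == customer_id
            and r.valid_from <= when
            and (r.valid_to is None or when < r.valid_to)
        ):
            return r
    return None


dim: List[DimRow] = []
apply_change(dim, 1, "silver", date(2024, 1, 1))
apply_change(dim, 1, "gold", date(2024, 6, 1))
print(as_of(dim, 1, date(2024, 3, 15)))
```

Old versions are never overwritten, only closed, which is what makes point-in-time reporting possible.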

// E-Commerce Event-Driven Databricks Pipeline

Event-driven data pipeline on Databricks for e-commerce events—streaming ingestion, processing, and analytics with scalable architecture.

Databricks Event-Driven Streaming E-Commerce PySpark

// UPI Transactions CDC & Streaming Analytics

Change Data Capture (CDC) and streaming analytics pipeline for UPI transactions—real-time ingestion, processing, and analytics.

CDC Streaming Real-time Analytics
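As a rough illustration of what a CDC consumer does, the sketch below replays change events onto a keyed snapshot. The event shape (`op`, `key`, `after`) and the sample transactions are assumptions for the example, not the project's real schema.

```python
from typing import Dict, List


def apply_cdc(state: Dict[str, dict], events: List[dict]) -> Dict[str, dict]:
    """Replay insert/update/delete events, in order, onto a keyed snapshot."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Merge changed columns over whatever is already stored for the key.
            state[key] = {**state.get(key, {}), **event["after"]}
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op}")
    return state


# Hypothetical change stream for two transactions.
events = [
    {"op": "insert", "key": "txn-1", "after": {"amount": 250, "status": "PENDING"}},
    {"op": "update", "key": "txn-1", "after": {"status": "SUCCESS"}},
    {"op": "insert", "key": "txn-2", "after": {"amount": 90, "status": "PENDING"}},
    {"op": "delete", "key": "txn-2"},
]
snapshot = apply_cdc({}, events)
print(snapshot)
```

Order matters here: replaying the same events in a different order can produce a different snapshot.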

> [Backend Engineering Projects]

// Daxita — Financial Transaction Analyzer

FastAPI microservice that accepts financial transactions, computes cash-flow summaries, evaluates risk flags (negative net flow, large outflows, NSF risk), and returns a readiness classification—strong, structured, or requires clarification. Built for the Daxita Backend Engineering Challenge, containerized with Docker.

FastAPI Python Pydantic Docker REST API
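The cash-flow summary and risk flags can be sketched in a few lines of plain Python. This is an illustration only: the `LARGE_OUTFLOW` threshold, the flag names, and the mapping from flags to a readiness label are all assumptions, since the service's real rules are not shown here.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold for what counts as a "large" outflow.
LARGE_OUTFLOW = 5_000.0


@dataclass
class Txn:
    amount: float  # positive = inflow, negative = outflow


def summarize(txns: List[Txn], opening_balance: float = 0.0) -> dict:
    inflow = sum(t.amount for t in txns if t.amount > 0)
    outflow = -sum(t.amount for t in txns if t.amount < 0)
    net = inflow - outflow

    flags = []
    if net < 0:
        flags.append("negative_net_flow")
    if any(-t.amount >= LARGE_OUTFLOW for t in txns if t.amount < 0):
        flags.append("large_outflow")

    # NSF risk: the running balance dips below zero at any point.
    balance = opening_balance
    for t in txns:
        balance += t.amount
        if balance < 0:
            flags.append("nsf_risk")
            break

    # Assumed mapping from number of flags to a readiness label.
    if not flags:
        readiness = "strong"
    elif len(flags) == 1:
        readiness = "structured"
    else:
        readiness = "requires clarification"

    return {"inflow": inflow, "outflow": outflow, "net": net,
            "flags": flags, "readiness": readiness}


print(summarize([Txn(1000.0), Txn(-200.0)]))
```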

> [Experimental Projects]

// Electron System Resource Monitor

Cross-platform desktop app for real-time system resource monitoring (CPU, RAM, Storage) built with Electron, React, and TypeScript, with interactive Recharts charts, a system tray icon, and builds for macOS, Windows, and Linux.

Electron React TypeScript Vite Recharts

// Self-Driving Car NN (FSD Simulation)

Full Self-Driving (FSD) simulation using neural networks implemented from scratch in JavaScript—sensors, road/car physics, and training loop with a live demo at self-driving-car-nn.vercel.app.

Neural Networks Machine Learning JavaScript Simulation

abhay@portfolio:$ ~ find ./blogs -name "*.md" -type f

Blogs

// Optimizing Apache Spark Jobs: A Complete Guide

Why fixing partition skew can turn a 4-hour job into a 45-minute one, and how to find the skew in the first place. Plus when to cache (and when not to), and how to read the Spark UI like a pro.

Apache Spark Performance Optimization Big Data
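One common fix for skew is key salting. The sketch below shows the effect in plain Python rather than Spark so it runs anywhere: `zlib.crc32` stands in for a hash partitioner, and the partition count, salt bucket count, and row counts are made-up numbers for illustration.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8
SALT_BUCKETS = 8  # hypothetical; tune to the degree of skew


def partition_for(key: str) -> int:
    # Deterministic stand-in for a hash partitioner.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_sizes(keys):
    return Counter(partition_for(k) for k in keys)


# 9,000 rows for one hot key, plus 100 rows each for ten other keys.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(10) for _ in range(100)]
plain = partition_sizes(keys)

# Salting: append a rotating suffix so one hot key becomes several keys
# that hash to different partitions.
salted_keys = [f"{k}#{i % SALT_BUCKETS}" for i, k in enumerate(keys)]
salted = partition_sizes(salted_keys)

print("largest partition before salting:", max(plain.values()))
print("largest partition after salting: ", max(salted.values()))
```

In Spark the same idea usually means aggregating on the salted key first and then once more on the original key, so the extra shuffle is the price paid for even partitions.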

// Building Modern Data Lakes with Delta Lake

ACID on object storage, time travel for 2 a.m. debugging, and schema evolution without breaking pipelines. When to choose Delta over raw Parquet—and when not to.

Delta Lake Data Lake AWS ACID

// Real-Time Streaming with Apache Kafka

How partition keys drive both ordering and scale, exactly-once semantics without the headache, and why consumer lag is your best early-warning signal. Tune first, then scale.

Apache Kafka Streaming Real-time Scalability
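Two of these ideas fit in a short sketch: how a key picks a partition, and how consumer lag is computed. This is plain Python, not a Kafka client. The hash is a stand-in (Kafka's default partitioner uses murmur2, not CRC32), and the topic size and offsets are invented for the example.

```python
import zlib
from typing import Dict

NUM_PARTITIONS = 6  # hypothetical topic size


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash; the point is only that the same key always maps
    # to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def consumer_lag(log_end_offsets: Dict[int, int],
                 committed_offsets: Dict[int, int]) -> Dict[int, int]:
    # Lag per partition = newest offset in the log minus the offset
    # the consumer group has committed.
    return {p: end - committed_offsets.get(p, 0)
            for p, end in log_end_offsets.items()}


# Events sharing a key land on one partition, which preserves per-key order.
events = [("user-42", "login"), ("user-7", "view"), ("user-42", "purchase")]
assignments = [(key, partition_for(key)) for key, _ in events]
print(assignments)

# Made-up offsets: partition 1 is falling behind.
lag = consumer_lag({0: 1500, 1: 980}, {0: 1500, 1: 640})
print(lag)
```

Ordering is only guaranteed within a partition, so the choice of key decides both which events stay ordered and how evenly load spreads.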

// Data Quality at Scale with Great Expectations

Turn data quality into executable checks that run in your pipeline. Custom expectations that matter, integration with DAGs, and how to avoid alert fatigue so people actually act on failures.

Data Quality Great Expectations Monitoring Airflow
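To make "executable checks" concrete, here is a small sketch in plain Python. It deliberately does not use the Great Expectations API; the helper names (`not_null`, `between`, `run_checks`) and the sample rows are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Check = Callable[[List[Row]], bool]


def not_null(column: str) -> Check:
    return lambda rows: all(r.get(column) is not None for r in rows)


def between(column: str, low: float, high: float) -> Check:
    # A missing value counts as a failure here; that is a design choice.
    return lambda rows: all(
        r.get(column) is not None and low <= r[column] <= high for r in rows
    )


def run_checks(rows: List[Row], checks: Dict[str, Check]) -> List[str]:
    """Return the names of failed checks; an empty list means the batch passes."""
    return [name for name, check in checks.items() if not check(rows)]


rows = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": -3.0},  # violates the range check
]
checks = {
    "order_id_not_null": not_null("order_id"),
    "amount_in_range": between("amount", 0, 10_000),
}
failed = run_checks(rows, checks)
print(failed)
```

A pipeline task could fail or quarantine the batch when `failed` is non-empty, which keeps the alert tied to a specific named check.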

// Cloud Data Warehouse Migration Strategies

Lift-and-shift vs. redesign, where cost surprises really come from, and a phased cutover plan that includes validation and rollback. Use the move as a chance to fix technical debt.

Migration Snowflake Cloud ETL

// Infrastructure as Code for Data Platforms

Repeatable, versioned infra for clusters and buckets; secrets in the vault, not in code; and how to catch drift before it becomes a fire. Same code for dev, staging, and prod.

Terraform Infrastructure DevOps Automation

// process complete - crafted by Abhay