Senior Data Engineer

Harish
Devathraj

Building Azure data platforms

6+ years engineering scalable ETL pipelines, cloud data platforms, and analytics solutions on Azure and Databricks. Currently at Tredence, delivering enterprise-scale data integration across supply chain systems end-to-end.

6+
Years Exp
15B+
Rows Processed
99.9%
Data Accuracy
10+
Data Sources

Who I Am

Data Engineering professional with 6+ years of hands-on experience building ETL pipelines, cloud data platforms, and analytics solutions on Azure and Databricks. Proven ability to lead engineering teams, establish best practices, and deliver end-to-end data solutions that drive measurable business impact.

JAN 2024 — PRESENT
Senior Data Engineer
Tredence · Bangalore, India
↳ Supply Chain · Apr 2026 – Present
↳ DDH NextGen · Sep 2025 – Apr 2026
JUN 2022 — JAN 2024
Data Engineer
Capgemini · National Grid · Houston, TX
NOV 2020 — JUN 2022
Data Engineer
Mindtree Ltd · Seattle, WA
AUG 2018 — MAY 2020
MS, Computer Science
University of Houston, TX
Harish Devathraj
devathrajharish@gmail.com
📍 Bangalore, India
🎓 MS CS · Univ. of Houston
Key Achievements
40%
Faster Pipelines
99.9%
Accuracy
75%
Cost Savings
95%+
Anomalies Caught

Data Toolkit

Cloud & Platforms
Azure Databricks AWS Snowflake Airflow
Big Data & Processing
Apache Spark Hadoop MapReduce HDFS
Languages & Databases
Python PySpark SQL PostgreSQL MySQL Teradata SAP
BI, Monitoring & DevOps
Power BI Grafana Tableau ICEDQ Docker Azure DevOps CI/CD Git
AI & Gen AI
Gen AI RAG LLM GitHub Copilot Claude

Certified

Databricks DE Professional
Databricks DE Associate

Work Experience

Tredence Jan 2024 – Present · Bangalore, India
Supply Chain · Current
Supply Chain Data Integration Platform

Engineering high-throughput ingestion pipelines that process 15B+ rows across supply chain systems, integrating REST APIs, SAP ECC, Teradata, and Oracle into a unified lakehouse. Contributing to data modeling, including source-to-target mapping (STTM) design and data exploration, to accelerate reliable analytics delivery.

SAP ECC Oracle Teradata REST APIs PySpark Azure Databricks STTM Design
Azure · Databricks
DDH NextGen — Digital Data Hub

Led a team of 6 engineers to design and deliver the Digital Data Hub for enterprise subscribers. Owned sprint planning, code reviews, and technical decision-making to ship all milestones on schedule. Consolidated subscriber data across SAP, Teradata, and CSV sources on Azure Databricks.

PySpark Azure Databricks ADF SAP Teradata Synapse
Data Quality
Automated DQ & Observability Framework

Built an automated data quality validation framework using ICEDQ, catching 95%+ of anomalies before they reach production. Designed Grafana dashboards tracking 20+ pipeline KPIs for real-time monitoring and faster incident response.

ICEDQ Grafana Python Azure DevOps
Gen AI · RAG POC
AI Pipeline Failure Detection & Auto-Resolution

Built an AI-powered platform that transforms reactive pipeline firefighting into proactive self-healing. Detects anomalies in real time, performs automated root cause analysis on scattered logs, and applies fixes autonomously — eliminating alert fatigue and reducing MTTR for data engineers.

RAG LLM Python Anomaly Detection Conversational AI
Capgemini Jun 2022 – Jan 2024 · Houston, TX
REST APIs · National Grid
National Grid Integration Layer

Built end-to-end Azure pipelines on Databricks to consolidate National Grid's energy systems data. Deployed 5+ REST APIs via Azure Function Apps integrating Power Plan, Datahub, and Copperleaf — delivering a 75% reduction in integration costs.

Function Apps Databricks Python Power BI SQL
Mindtree Nov 2020 – Jun 2022 · Seattle, WA
Big Data · Hadoop
Big Data Analytics Platform

Executed MapReduce jobs over a 200,000-file dataset using PySpark on a Hadoop cluster. Optimized storage by serializing results to Avro and Parquet formats with Snappy compression — significantly reducing disk usage while improving query performance.

PySpark Hadoop MapReduce HDFS Parquet Avro

Initialize Connection

Open to new opportunities, collaborations, and interesting data challenges. Whether you need an expert to architect your next data platform or lead your data engineering team — let's talk.
