Senior Data Engineer

Harish Devathraj

Building Azure data platforms

6+ years engineering scalable ETL pipelines, cloud data platforms, and analytics solutions on Azure and Databricks. Currently at Tredence, delivering enterprise-scale data integration across supply chain systems end-to-end.


6+

Years Exp

15B+

Rows Processed

99.9%

Data Accuracy

10+

Data Sources

adf-pipeline.log

$ databricks run --job ddh_nextgen

✓ SAP ingestion complete

rows: 15,000,000,000

✓ PySpark transforms applied

null_rate: 0.02%

accuracy: 99.9%

✓ Synapse Analytics loaded

✓ ICEDQ validation passed

$ monitoring 20+ KPIs via Grafana...

SAP / Teradata → Databricks → Synapse

ADF Pipelines → ICEDQ DQ → Power BI

about

Who I Am

Data Engineering professional with 6+ years of hands-on experience building ETL pipelines, cloud data platforms, and analytics solutions on Azure and Databricks. Proven ability to lead engineering teams, establish best practices, and deliver end-to-end data solutions that drive measurable business impact.

JAN 2024 — PRESENT

Senior Data Engineer

Tredence · Bangalore, India

↳ Supply Chain  Apr 2026 – Present

↳ DDH NextGen  Sep 2025 – Apr 2026

JUN 2022 — JAN 2024

Data Engineer

Capgemini · National Grid · Houston, TX

NOV 2020 — JUN 2022

Data Engineer

Mindtree Ltd · Seattle, WA

AUG 2018 — MAY 2020

MS, Computer Science

University of Houston, TX

✉ devathrajharish@gmail.com

📍 Bangalore, India

🎓 MS CS · Univ. of Houston

Key Achievements

40%

Faster Pipelines

99.9%

Accuracy

75%

Cost Savings

95%+

Anomalies Caught

experience

Work Experience

Tredence · Jan 2024 – Present · Bangalore, India

Supply Chain · Current

Supply Chain Data Integration Platform

Engineering high-throughput ingestion pipelines processing 15B+ rows across supply chain systems — integrating REST APIs, SAP ECC, Teradata, and Oracle into a unified lakehouse. Contributing to data modeling tasks including STTM design and data exploration to accelerate reliable analytics delivery.

SAP ECC · Oracle · Teradata · REST APIs · PySpark · Azure Databricks · STTM Design

Azure · Databricks

DDH NextGen — Digital Data Hub

Led a team of 6 engineers to design and deliver the Digital Data Hub for enterprise subscribers. Owned sprint planning, code reviews, and technical decision-making to ship all milestones on schedule. Consolidated subscriber data across SAP, Teradata, and CSV sources on Azure Databricks.

PySpark · Azure Databricks · ADF · SAP · Teradata · Synapse

Data Quality

Automated DQ & Observability Framework

Built an automated data quality validation framework using ICEDQ, catching 95%+ of anomalies before they reach production. Designed Grafana dashboards tracking 20+ pipeline KPIs for real-time monitoring and faster incident response.

ICEDQ · Grafana · Python · Azure DevOps
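
As an illustration of the kind of rule such a framework automates, here is a minimal sketch of a null-rate check in plain Python. It is not ICEDQ's API and not the project's actual code: the function name `check_null_rate`, the `CheckResult` type, the sample batch, and the 5% threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    column: str
    null_rate: float  # fraction of rows where the column is missing
    passed: bool


def check_null_rate(rows, column, max_null_rate):
    """Flag a column whose share of missing values exceeds max_null_rate."""
    if not rows:
        # An empty batch cannot be validated, so treat it as a failure.
        return CheckResult(column, 1.0, False)
    nulls = sum(1 for row in rows if row.get(column) is None)
    rate = nulls / len(rows)
    return CheckResult(column, rate, rate <= max_null_rate)


# Hypothetical batch: one missing plant_id out of four rows.
batch = [
    {"order_id": 1, "plant_id": "P01"},
    {"order_id": 2, "plant_id": None},
    {"order_id": 3, "plant_id": "P02"},
    {"order_id": 4, "plant_id": "P01"},
]

result = check_null_rate(batch, "plant_id", max_null_rate=0.05)
print(result)
```

A check like this would typically run before data is published, so that a failed result blocks the load instead of surfacing later in a dashboard.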

Gen AI · RAG POC

AI Pipeline Failure Detection & Auto-Resolution

Built a proof-of-concept AI platform that turns reactive pipeline firefighting into proactive self-healing. It detects anomalies in real time, performs automated root cause analysis across scattered logs, and applies fixes autonomously, reducing alert fatigue and MTTR for data engineers.

RAG · LLM · Python · Anomaly Detection · Conversational AI
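
The anomaly detection step can be pictured with a simple statistical baseline. The sketch below flags a pipeline run whose duration sits far from its recent history using a z-score. This is an assumed stand-in for illustration only: the POC's actual detection method is not described here, and `is_anomalous`, the threshold of 3.0, and the sample durations are hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, z_threshold=3.0):
    """Return True when `latest` is more than z_threshold sample
    standard deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past values are identical, so any change is a deviation.
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold


# Hypothetical run durations in minutes for one pipeline.
durations_min = [42.0, 40.5, 43.1, 41.7, 39.9, 42.4]

print(is_anomalous(durations_min, 41.0))  # typical run
print(is_anomalous(durations_min, 95.0))  # much slower run
```

A flagged run would then be handed to the root cause analysis step; a z-score is only one possible trigger and is sensitive to outliers already present in the history.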


Capgemini · Jun 2022 – Jan 2024 · Houston, TX

REST APIs · National Grid

National Grid Integration Layer

Built end-to-end Azure pipelines on Databricks to consolidate National Grid's energy systems data. Deployed 5+ REST APIs via Azure Function Apps integrating Power Plan, Datahub, and Copperleaf — delivering a 75% reduction in integration costs.

Function Apps · Databricks · Python · Power BI · SQL

Mindtree · Nov 2020 – Jun 2022 · Seattle, WA

Big Data · Hadoop

Big Data Analytics Platform

Ran MapReduce-style jobs over a 200,000-file dataset using PySpark on a Hadoop cluster. Optimized storage by serializing results to Avro and Parquet formats with Snappy compression, reducing disk usage while improving query performance.

PySpark · Hadoop · MapReduce · HDFS · Parquet · Avro
