Hi, I'm Dehao Zhang
Data Scientist & AI Builder
I optimize business efficiency and enhance decision-making through the power of data science and AI. Currently building intelligent solutions at Microsoft.
Featured Projects
Building end-to-end data pipelines and implementing advanced AI architectures to solve real-world challenges.
EcoPulse
An end-to-end batch data pipeline that ingests, processes, and analyzes key economic indicators. Its cloud-native architecture carries raw data through to actionable business intelligence.
ToSDR-RAG
A Retrieval-Augmented Generation (RAG) system built on the "Terms of Service; Didn't Read" dataset. It uses vector embeddings and semantic search to answer natural-language queries over complex legal documents.
Latest Insights
Documenting my journey through the modern data stack, from cloud infrastructure to machine learning operations.
Building EcoPulse: A Scalable Batch Pipeline for Economic Data
A deep dive into the design of EcoPulse, focusing on end-to-end automation with cloud-native tools.
Mastering Kestra: Workflow Orchestration for Modern Data Pipelines
A comprehensive guide to workflow orchestration, from Postgres ETL to managing GCS and BigQuery pipelines.
Spark Internals: Why Spark Outperforms Hadoop
An analysis of cluster architecture and why Spark outperforms Hadoop for modern big data workloads.
Core Competencies
AI & ML
- LLMs
- RAG
- NLP
- Predictive Modeling
Data Engineering
- Spark
- BigQuery
- dbt
- Airflow
Cloud & DevOps
- Azure
- GCP
- Terraform
- Docker
Analytics
- Python
- SQL
- Power BI
- Statistics