Pham Tien Phat
const developer = "Phạm Tiến Phát"

I build data systems that scale.

Data Science student focused on Data Engineering, building ETL pipelines, ML models, and scalable data solutions. Based in Ho Chi Minh City, Vietnam.

Python · Pandas · NumPy · SQL · ETL · LightGBM · XGBoost · CatBoost · Optuna · Scikit-learn · Jupyter · Git · Solidity · SQLite · Feature Engineering · Matplotlib · Docker · Next.js · TypeScript · React · Redpanda · Kafka · TimescaleDB · dbt · Grafana · WebSocket

Passionate about turning data into decisions

I'm a Data Science student at VNUHCM — University of Science, with a deep focus on Data Engineering and Machine Learning. I love building systems that move, transform, and unlock the value of data at scale.

From designing modular ETL pipelines and predictive ML models to experimenting on Kaggle and exploring blockchain data storage, I approach every project with engineering rigour and a bias for clean, scalable solutions. I'm actively seeking a Data Engineering Internship where I can contribute to real data infrastructure while growing fast.

Ho Chi Minh City, Vietnam · Data Science Student · Seeking DE Internship · Kaggle Competitor
5+ Projects Built
3+ Kaggle Competitions
ETL Pipeline Design
ML Model Builder

My Focus

Building reliable, scalable data infrastructure — ETL pipelines, data models, and ML systems that teams can trust and build upon.

Technical Expertise

Languages & Core

Python · Advanced
SQL · Proficient
TypeScript · Familiar
Solidity · Familiar

Data & ETL

Pandas · Advanced
ETL Pipelines · Proficient
Data Modelling · Proficient
SQLite / SQL DBs · Proficient

ML & Analytics

Scikit-learn · Proficient
LightGBM / XGBoost · Proficient
Feature Engineering · Proficient
Time-Series Analysis · Familiar

Tools & Others

Jupyter Notebook · Advanced
Git / GitHub · Proficient
Optuna · Familiar
Blockchain / Web3 · Familiar
Data Pipeline

Crypto Stream Pipeline

Real-time cryptocurrency data pipeline — Binance WebSocket → Redpanda → TimescaleDB → dbt → Grafana, with Telegram alerts.

  • Real-time ingestion of BTC, ETH, SOL, BNB trades via Binance WebSocket
  • Kafka-compatible streaming with Redpanda, batch writes (200 records / 3s)
  • TimescaleDB hypertables with auto continuous aggregates for 1m & 1h OHLCV
  • 3-layer dbt models: staging, intermediate (candles), marts (volatility, volume)
  • Telegram alerts for >3% price change and >3σ volume spikes
  • One-command deployment with Docker Compose
Python · Redpanda · TimescaleDB · dbt · Grafana · Docker · WebSocket · Telegram Bot · GitHub Actions
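
The two alert rules listed for this pipeline could be sketched roughly as follows. This is an illustration only, not the project's code: the function names, the comparison against a single previous price, and the use of a trailing window's mean and sample standard deviation as the volume baseline are all assumptions.

```python
from __future__ import annotations

from statistics import mean, stdev

PRICE_CHANGE_THRESHOLD = 0.03  # 3% move, taken from the alert rule above
VOLUME_SIGMA_THRESHOLD = 3.0   # 3 standard deviations, taken from the alert rule above


def price_alert(previous_price: float, current_price: float) -> bool:
    """Return True when the relative price change exceeds 3% in either direction."""
    if previous_price <= 0:
        return False  # no meaningful baseline to compare against
    change = abs(current_price - previous_price) / previous_price
    return change > PRICE_CHANGE_THRESHOLD


def volume_alert(history: list[float], current_volume: float) -> bool:
    """Return True when volume sits more than 3 sigma above the trailing mean."""
    if len(history) < 2:
        return False  # the sample standard deviation needs at least two points
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return False  # flat history: the z-score is undefined
    return (current_volume - mu) / sigma > VOLUME_SIGMA_THRESHOLD
```

In a streaming setup, `history` would presumably be a bounded window of recent candle volumes per symbol, so the baseline adapts as market activity changes.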
Data Pipeline

Event Analytics Pipeline

Python ETL workflow for ingesting, validating, and structuring event data into analytics-ready tables.

  • Designed modular ingest, transform, validate, and load stages
  • Handled malformed records with structured logging and safe fallbacks
  • Prepared clean output schemas for downstream analytics and reporting
Python · Pandas · ETL · Data Pipeline · JSON · SQLite
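
A minimal sketch of how the ingest, validate, transform, and load stages might fit together, using only the standard library. The field names (`event_id`, `event_type`, `timestamp`, `source`), the `events` table, and the `"unknown"` fallback are hypothetical and chosen for illustration; the real project's schema is not shown here.

```python
from __future__ import annotations

import json
import logging
import sqlite3

logger = logging.getLogger("event_etl")

REQUIRED_FIELDS = ("event_id", "event_type", "timestamp")  # hypothetical schema


def ingest(lines: list[str]) -> list[dict]:
    """Parse JSON lines; log and skip anything that is not a JSON object."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            logger.warning("line=%d error=invalid_json detail=%s", line_no, exc)
            continue
        if not isinstance(record, dict):
            logger.warning("line=%d error=not_an_object", line_no)
            continue
        records.append(record)
    return records


def validate(records: list[dict]) -> list[dict]:
    """Keep only records that carry every required field."""
    valid = []
    for record in records:
        missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
        if missing:
            logger.warning("event_id=%s error=missing_fields fields=%s",
                           record.get("event_id"), missing)
            continue
        valid.append(record)
    return valid


def transform(records: list[dict]) -> list[tuple]:
    """Normalise types and fall back to 'unknown' when the source is absent."""
    return [
        (str(r["event_id"]), str(r["event_type"]).lower(),
         str(r["timestamp"]), str(r.get("source") or "unknown"))
        for r in records
    ]


def load(rows: list[tuple], conn: sqlite3.Connection) -> int:
    """Write rows into an analytics-ready table and return the table's row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "event_id TEXT PRIMARY KEY, event_type TEXT, ts TEXT, source TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO events VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Keeping each stage a plain function over lists makes the stages easy to test in isolation and to swap out, for example replacing `load` with a writer for a different database.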
Machine Learning

Gold Price Forecast

Time-series forecasting project that predicts gold prices using engineered features and ensemble models.

  • Engineered lag, rolling, and trend-based time-series features
  • Benchmarked multiple models before selecting the strongest ensemble
  • Compared predictions against actual prices with visual evaluation
Python · Scikit-learn · Pandas · NumPy · Matplotlib · Machine Learning
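
To make the lag, rolling, and trend features concrete, here is a small standard-library sketch. It is an assumption about what such features look like, not the project's code: the function name, the `lag` and `window` parameters, and the definition of trend as the net change across the window are illustrative. With pandas, `Series.shift` and `Series.rolling` would typically do the same job.

```python
from __future__ import annotations

from statistics import mean


def build_features(prices: list[float], lag: int = 1, window: int = 3) -> list[dict]:
    """Build one feature row per day using only past prices, to avoid leakage."""
    rows = []
    start = max(lag, window)  # first index with enough history for every feature
    for t in range(start, len(prices)):
        past = prices[t - window:t]  # the `window` prices strictly before day t
        rows.append({
            "lag": prices[t - lag],       # price `lag` days earlier
            "rolling_mean": mean(past),   # average of the past window
            "trend": past[-1] - past[0],  # net change across the window
            "target": prices[t],          # value the model should predict
        })
    return rows
```

Every feature for day t is computed from prices before day t, which is the property that matters most when evaluating a forecasting model honestly.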
System Design

SME Management System

Python business system with inventory, finance, and reporting workflows for small and medium enterprises.

  • Designed relational data structures for inventory, finance, and staff
  • Generated operational reports from aggregated transactional records
  • Separated business rules from persistence logic for maintainability
Python · SQLite · Data Modelling · CRUD · Reporting
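
One common way to separate business rules from persistence logic is a repository class that owns the SQL and a service class that owns the rules. The sketch below assumes a hypothetical `inventory` table and hypothetical class and method names; it shows the pattern, not the system's actual design.

```python
from __future__ import annotations

import sqlite3


class InventoryRepository:
    """Persistence only: knows SQL, knows nothing about business rules."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS inventory ("
            "sku TEXT PRIMARY KEY, quantity INTEGER NOT NULL)"
        )

    def get_quantity(self, sku: str) -> int:
        row = self.conn.execute(
            "SELECT quantity FROM inventory WHERE sku = ?", (sku,)
        ).fetchone()
        return row[0] if row else 0  # unknown SKUs count as zero stock

    def set_quantity(self, sku: str, quantity: int) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO inventory (sku, quantity) VALUES (?, ?)",
            (sku, quantity),
        )
        self.conn.commit()


class InventoryService:
    """Business rules only: reaches storage through the repository."""

    def __init__(self, repo: InventoryRepository) -> None:
        self.repo = repo

    def receive(self, sku: str, amount: int) -> int:
        if amount <= 0:
            raise ValueError("amount must be positive")
        new_quantity = self.repo.get_quantity(sku) + amount
        self.repo.set_quantity(sku, new_quantity)
        return new_quantity

    def sell(self, sku: str, amount: int) -> int:
        if amount <= 0:
            raise ValueError("amount must be positive")
        current = self.repo.get_quantity(sku)
        if amount > current:
            raise ValueError("insufficient stock")  # rule: stock never goes negative
        self.repo.set_quantity(sku, current - amount)
        return current - amount
```

Because the service never touches SQL, the rules can be unit-tested against an in-memory database or a stub repository, and the storage layer can change without rewriting them.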
Blockchain

Kickstarter DApp Contract

Decentralized crowdfunding contract that demonstrates backend logic, state management, and blockchain fundamentals.

  • Implemented deterministic contribution and refund state transitions
  • Protected withdrawal logic behind campaign goal requirements
  • Optimized storage layout with attention to gas costs
Solidity · Ethereum · Smart Contracts · Web3 · Blockchain
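
The contract itself is written in Solidity. As a language-neutral illustration, the contribution, withdrawal, and refund transitions can be modelled as a small state machine in Python. Everything here is an assumption made for the sketch: the state names, the explicit `close` step standing in for a deadline, and the exact refund rule are not taken from the contract.

```python
from __future__ import annotations

from enum import Enum


class State(Enum):
    FUNDING = "funding"
    SUCCESSFUL = "successful"
    FAILED = "failed"
    PAID_OUT = "paid_out"


class Campaign:
    def __init__(self, owner: str, goal: int) -> None:
        self.owner = owner
        self.goal = goal
        self.state = State.FUNDING
        self.contributions: dict[str, int] = {}

    @property
    def raised(self) -> int:
        return sum(self.contributions.values())

    def contribute(self, sender: str, amount: int) -> None:
        if self.state is not State.FUNDING:
            raise RuntimeError("campaign is closed")
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.contributions[sender] = self.contributions.get(sender, 0) + amount

    def close(self) -> State:
        """End the funding period; the outcome depends only on the goal."""
        if self.state is not State.FUNDING:
            raise RuntimeError("campaign already closed")
        self.state = State.SUCCESSFUL if self.raised >= self.goal else State.FAILED
        return self.state

    def withdraw(self, sender: str) -> int:
        """Owner payout, allowed once and only if the goal was met."""
        if sender != self.owner:
            raise PermissionError("only the owner can withdraw")
        if self.state is not State.SUCCESSFUL:
            raise RuntimeError("goal not reached or already paid out")
        amount = self.raised
        self.state = State.PAID_OUT  # update state before releasing funds
        return amount

    def refund(self, sender: str) -> int:
        """Contributor refund, allowed only after a failed campaign."""
        if self.state is not State.FAILED:
            raise RuntimeError("refunds are only available after failure")
        return self.contributions.pop(sender, 0)  # clear the balance as it is paid
```

Updating state before paying out mirrors the checks-effects-interactions ordering commonly recommended for Solidity contracts, where it helps guard against re-entrancy.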
Kaggle

Kaggle Competition Portfolio

Collection of Kaggle notebooks exploring feature engineering, model selection, and competition-style evaluation.

  • Built ensemble workflows with LightGBM, XGBoost, and CatBoost
  • Used Optuna to tune model hyperparameters against validation metrics
  • Practiced rapid iteration on unfamiliar datasets and problem types
Python · LightGBM · XGBoost · CatBoost · Optuna · Scikit-learn

My Journey

High School Diploma — Mathematics

Nguyen Quang Dieu High School for the Gifted

2021 – 2024

Specialized in Mathematics at one of the top gifted high schools in Dong Thap province. Built strong analytical thinking and problem-solving foundations.

Mathematics · Analytical Thinking · Problem Solving

Head of Human Resources

NQD Connection Volunteer Club

2022 – 2024

Led HR operations and managed recruitment, training, and engagement for 200+ members. Designed tracking systems to monitor member performance and participation metrics.

Recruitment & Training · 200+ Members · Performance Tracking · Leadership

Bachelor of Data Science

VNUHCM — University of Science

2024 – Present

Studying Data Science with a focus on data analysis, machine learning, and software engineering. Building real-world projects including ETL pipelines, ML models, and blockchain smart contracts.

Data Analysis · Machine Learning · Python · SQL

Software Engineer — Intern

Peganyx · Remote

Nov 2025 – Present

Collaborating in a remote team to develop web applications and blockchain features. Building data processing scripts for performance monitoring and reporting. Working in agile workflows with Git-based project delivery.

Web Development · Blockchain · Data Processing · Agile / Git

What I've Accomplished

Active Kaggle Competitor

Competing in real-world ML competitions including irrigation prediction and classification challenges. Currently building a stacked ensemble model targeting top-tier scores.

3+ Competitions
10+ Notebooks
Kaggle Profile

Consistent Open Source Builder

5+ production-quality repositories spanning data pipelines, ML models, system design, and blockchain contracts. Continuous commits demonstrating real engineering discipline.

5+ Repositories
Data Engineering Focus
GitHub Profile

Self-Directed Learning

From ETL pipelines to smart contracts — I build projects that go beyond tutorials. Each one solves a real problem and demonstrates production-oriented thinking.

4+ Tech Domains
10K+ Lines of Code

Let's build something together

I'm actively seeking a Data Engineering Internship and open to any data-driven role where I can contribute and grow. Whether it's about a job opportunity, a collaboration, or just to connect — I'd love to hear from you.

Available for Internship
Ho Chi Minh City, Vietnam · Remote OK