I am a Senior Machine Learning Engineer at Keebo.AI, specializing in reinforcement learning for cloud data warehouse optimization. My expertise includes Reinforcement Learning, Machine Learning Theory, Natural Language Processing (NLP), and Large Language Models (LLMs).
I earned my thesis-based Master's degree from the Department of Computing Science at the University of Alberta, where I was fortunate to be supervised by Csaba Szepesvári in Reinforcement Learning and Bandits. It was extremely inspiring to be part of AMII and RLAI, communities where brilliant ideas from exceptional scholars and students abound.
I earned my B.Sc. in Mathematics from The Chinese University of Hong Kong. I was fortunate to be supervised by Anthony So in machine learning research. That experience solidified my decision to make machine learning my lifelong pursuit.
Attended the Data Privacy: Foundations and Applications program under the supervision of Or Sheffet.
Major: Mathematics (double stream in pure and applied mathematics)
Minor: Computer Science
Leveraging Non-uniformity in First-order Non-convex Optimization.
Jincheng Mei*, Yue Gao*, Bo Dai, Csaba Szepesvári, and Dale Schuurmans.
International Conference on Machine Learning (ICML), 2021. PMLR.
Robust and Efficient RL Methods Solving Capacitated and Time-Based Vehicle Routing Problem
Yue Gao*, Katrina Hooper*, Christophe Pennetier*
Patent filed, 2024.
Designed and deployed a production-scale reinforcement learning optimization system for 2,000+ cloud data warehouses used by top-tier Snowflake enterprise customers, establishing state-of-the-art performance in the cloud warehouse optimization industry and improving average cost savings from 8%→16% while maintaining strict latency SLAs.
Developed a novel reward model and policy learning strategy that significantly improved convergence stability and policy generalization in large-scale offline reinforcement learning, enabling robust deployment across real-world warehouses.
Engineered large-scale offline RL pipelines on GCP (BigQuery, GKE, Trainer Jobs, Airflow) with distributed training and automated evaluation, combining robust theory with reliable production execution.
Developed an adaptive RL aggressiveness control DAG using Airflow, dynamically adjusting model aggressiveness per warehouse based on live production metrics. Improved savings-performance tradeoff through automated, feedback-driven hyperparameter adaptation.
Designed one of the industry's first query routing models (80%+ accuracy) to power Keebo's intelligent query routing, which dynamically directs queries to optimal warehouses. Built the full research pipeline from schema-aware tokenization to UMAP-based embedding diagnostics.
Led ML team sprint planning and internal technical education, delivering presentations on RL fundamentals, MLOps best practices, and architecture deep dives to the broader engineering org.
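The feedback-driven aggressiveness control described above can be sketched as a simple per-warehouse update rule. This is a minimal, hypothetical illustration (function name, thresholds, and step size are all invented, not Keebo's actual implementation); in production such a rule would run as an Airflow task consuming live metrics.

```python
def adjust_aggressiveness(current: float,
                          savings_pct: float,
                          latency_p95_ms: float,
                          latency_sla_ms: float = 500.0,
                          step: float = 0.05) -> float:
    """Return an updated aggressiveness setting for one warehouse.

    Hypothetical rule: back off whenever latency breaches the SLA,
    push harder when latency is healthy but savings lag a target.
    """
    if latency_p95_ms > latency_sla_ms:
        current -= step   # protect query performance first
    elif savings_pct < 10.0:
        current += step   # headroom to optimize more aggressively
    return min(max(current, 0.0), 1.0)  # clamp to a valid range
```

The key design point is the asymmetry: latency violations always win over savings, so the controller degrades cost savings before it degrades performance.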
Spearheaded the transformation of complex supply-chain route optimization challenges into robust RL/DL/ML models, leveraging CV and NLP techniques to improve predictive accuracy and operational efficiency by over 30% on multiple metrics; developed scalable pipelines and established industry-specific benchmarks.
Proposed and developed paper-ready reinforcement learning algorithms incorporating novel variations of Q-learning and deep learning approaches.
Engineered and optimized robust end-to-end pipelines for multi-modal products, integrating diverse data types and systems to enhance product functionality and user engagement.
Led cross-functional teams in the strategic development and delivery of professional presentations and demos.
Proposed and validated 4 innovative algorithms for robust risk-sensitive reinforcement learning, tailored for trading markets, achieving a 40% increase in risk-adjusted returns compared to existing models.
Proposed an evaluation method for risk-sensitive trading agents using empirical game theory.
Conducted comprehensive simulations using a streamlined trading market model to evaluate and demonstrate a 40% superior performance of our algorithms over conventional methods.
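The risk-adjusted comparison above can be illustrated with a toy Sharpe-style metric (a generic sketch with made-up numbers, not the actual evaluation or benchmark used):

```python
import statistics

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its volatility: a standard
    risk-adjusted performance measure for trading agents."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)

# Hypothetical per-episode returns for two agents:
baseline = [0.01, -0.02, 0.03, 0.00, 0.02]
risk_sensitive = [0.01, 0.00, 0.02, 0.01, 0.02]
```

Under such a metric, a risk-sensitive agent can score higher even with similar average returns, simply because its returns vary less.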
I worked with Professor Or Sheffet on differential privacy. Lying at the intersection of machine learning and computational theory, differential privacy is a mathematically rigorous notion of privacy preservation in data analysis.
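For concreteness, the standard definition makes this rigor explicit: a randomized mechanism $M$ is $\varepsilon$-differentially private if for any two neighboring datasets $D$ and $D'$ (differing in a single record) and any set of outputs $S$,

$$\Pr[M(D) \in S] \le e^{\varepsilon}\,\Pr[M(D') \in S].$$

Intuitively, no single individual's record can change the distribution of outcomes by more than a factor of $e^{\varepsilon}$.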
I was also fortunate to work with Professor Csaba Szepesvári on Reinforcement Learning, which I believe is the correct path toward AGI.
I was fortunate to be supervised by Professor Anthony So on clustering problems. Through this research, I found machine learning theory illuminating and resolved to make ML theory my pursuit.
Reinforcement Learning, Large Language Models, Machine Learning Systems, Deep Learning, Scalable AI Infrastructure, Agentic AI
I enjoy singing and playing piano, and I'm also a guitar & drum kit beginner.
I'm a big fan of world geography. I enjoy experiencing different cultures and exploring local attractions and museums.
Here's my visited map.
I'm an absolute aviation enthusiast. I enjoy flying, meticulously logging details of each flight, watching aviation documentaries, and especially capturing aerial photography. I'm particularly interested in commercial aircraft: I enjoy learning about different models and their specifications, and being able to identify them.
I've flown with 29 different airlines. Here are some of my self-made flight logs.
Reading is central to my life. I read broadly across philosophy, psychology, science, physics, sociology, and economics. Nowadays, most of my reading time goes to ML/RL research papers, which I find deeply engaging. My blog features reading summaries and reflections on academic papers.
I have a huge passion for photographing all kinds of skies and natural scenery.
Here is my sky and nature album.