Yunshun Zhong
Machine Learning Engineer & Data Scientist
From research to deployment — building ML systems that work in the real world.
With hands-on experience from internships and published research, I'm passionate about solving real-world problems using data, code, and curiosity.


Research & Publications
Domain-specific language models pre-trained on construction management systems corpora
Zhong, Y., & Goodfellow, S. D. (2024)
Automation in Construction, 160, 105316.
Builds the first CMS domain corpus, pre-trains BERT/RoBERTa variants on it, and fine-tunes them on text classification and NER tasks. The domain-adapted models achieve F1-score gains of 5.9% and 8.5% over general-purpose PLMs, demonstrating the value of domain adaptation for construction-focused NLP.
An Agency-specific Project Authoring Advisor: An LLM-based RAG System
Zhong, Y., & El-Diraby, T. (2025)
Advanced Engineering Informatics (under review)
This study builds the first AEC-agency project-authoring advisor by coupling a vectorized database of technical documents with a retrieval-augmented generation (RAG) framework. Prompts are optimized through an adversarial, multi-dimensional, LLM-assisted evaluation method, yielding a tool that helps engineers with project authoring and design.
Leveraging Large Language Models with Retrieval-Augmented Generation for Trend Analysis in Construction Management Research
Zhong, Y., & El-Diraby, T. (2025)
Automation in Construction (under review)
Integrating retrieval-augmented generation with a large-language-model retriever–generator pipeline, this study curates and analyzes 1,100 construction-management publications from 1980 to 2024, along with metadata including yearly citation counts. It clusters 6,460 fine-grained topics from the publications, outperforms LDA by wide margins, and reveals shifts in scholarship and research trends from the extracted topics.
Experience


Data Scientist Intern – Fixed Income Team
Guardian Capital
Toronto, 01/2024-05/2024
Developed ML and deep learning pipelines for predicting Non-Farm Payroll (NFP) and US Treasury yields, focusing on predictive accuracy and efficiency.
Built an end-to-end ML pipeline integrating kernelized linear regression, gradient boosting, and random forest models, delivering accurate predictions under tight deadlines.
Optimized model performance through hyperparameter tuning, significantly improving predictive accuracy.
Data Scientist Intern – Global Risk Analytics Team
Royal Bank of Canada
Toronto, 01/2024-05/2024
Developed geospatial data pipelines, ML models, and a wildfire simulation framework to predict future fire risk and assess economic impact under future climate scenarios.


Developed an end-to-end geospatial data pipeline and trained ML models (kernelized logistic regression, XGBoost) to predict future fire risk and assess climate impact under different scenarios.
Built a wildfire season simulation framework using CLIMADA, combining machine learning, probabilistic models, and cellular automata to enhance economic impact assessments for wildfire management.




Education
DOCTOR OF PHILOSOPHY
University of Toronto
MASTER OF SCIENCE IN ENGINEERING
University of California, Berkeley (UCB)
Teaching Assistant & Guest Lecturer
Taught and supported over 10 courses at the University of Toronto and UC Berkeley, spanning NLP, data science, and AI. Designed assignments, delivered lectures, and mentored students.
Awards
Jane Lewis Fellowship
UC Berkeley · $66,000 USD
Top engineering graduate fellowship (2019–2020)
Connaught International Scholarship
University of Toronto · $10,000/year · 4 years
Prestigious PhD award for international scholars (2021–2025)
Technical Skills
Core Skills:
Python · LLMs & RAG · Hugging Face · PyTorch · Google Cloud · End-to-end ML pipelines
Contact
yunshun.zhong@mail.utoronto.ca
© 2025. All rights reserved.