Yunshun Zhong

Machine Learning Engineer & Data Scientist

From research to deployment — building ML systems that work in the real world.
With hands-on experience from internships and published research, I'm passionate about solving real-world problems using data, code, and curiosity.

Research & Publications

Domain-specific language models pre-trained on construction management systems corpora

Zhong, Y., & Goodfellow, S. D. (2024)

Automation in Construction, 160, 105316.

Builds the first CMS domain corpus and pre-trains BERT/RoBERTa variants, then fine-tunes them on text classification and NER tasks. Achieves 5.9% and 8.5% F1-score gains over general PLMs, proving the value of domain adaptation for construction-focused NLP.

An Agency-specific Project Authoring Advisor: An LLM-based RAG System

Zhong, Y., & El-Diraby, T. (2025)

Advanced Engineering Informatics (under review)

This study builds the first AEC-agency project-authoring advisor by coupling a vectorized technical-document database with a retrieval-augmented generation framework and prompts optimized through an adversarial, multi-dimensional, LLM-assisted evaluation method. The resulting tool helps engineers author and design projects.

Leveraging Large Language Models with Retrieval-Augmented Generation for Trend Analysis in Construction Management Research

Zhong, Y., & El-Diraby, T. (2025)

Automation in Construction (under review)

Using a retrieval-augmented generation pipeline with a large-language-model retriever and generator, this study curates and analyzes 1,100 construction-management publications and their metadata, including yearly citation counts, from 1980 to 2024. It clusters 6,460 fine-grained topics extracted from the publications, outperforms LDA by wide margins, and reveals scholarly shifts and research trends across the extracted topics.

Experience

Data Scientist Intern – Fixed Income Team

Guardian Capital

Toronto, 01/2024-05/2024

Developed ML and deep learning pipelines to forecast Non-Farm Payroll (NFP) and US Treasury yields, focusing on predictive accuracy and efficiency.

  • Built an end-to-end ML pipeline, integrating Kernelized Linear Regression, Gradient Boosting, and Random Forest to meet tight deadlines with high accuracy.

  • Optimized model performance through hyperparameter tuning, improving predictive accuracy significantly.

Data Scientist Intern – Global Risk Analytics Team

Royal Bank of Canada

Toronto, 01/2024-05/2024

Developed geospatial data pipelines, ML models, and a wildfire simulation framework to predict future fire risk and assess economic impact under future climate scenarios.

  • Developed an end-to-end geospatial data pipeline and trained ML models (kernelized logistic regression, XGBoost) to predict future fire risk and assess climate impact under different scenarios.

  • Built a wildfire season simulation framework using CLIMADA, combining machine learning, probabilistic models, and cellular automata to enhance economic impact assessments for wildfire management.

Education

DOCTOR OF PHILOSOPHY

University of Toronto

MASTER OF SCIENCE IN ENGINEERING

University of California, Berkeley (UCB)

Teaching Assistant & Guest Lecturer

Taught and supported over 10 courses at the University of Toronto and UC Berkeley, ranging from NLP to data science and AI. Designed assignments, gave lectures, and mentored students.

Awards

Jane Lewis Fellowship
UC Berkeley · $66,000 USD
Top engineering graduate fellowship (2019–2020)

Connaught International Scholarship
University of Toronto · $10,000/year · 4 years
Prestigious PhD award for international scholars (2021–2025)

Technical Skills

Core Skills:

Python · LLMs & RAG · Hugging Face · PyTorch · Google Cloud · End-to-end ML pipelines