I am a PhD student at the College of William and Mary advised by Antonio Mastropaolo. I am also fortunate to be mentored by Anh Totti Nguyen, Truong-Son Hy, and Thiago Serra.

I am interested in multimodal AI and trustworthy AI: (1) evaluating and understanding LLMs/MLLMs and (2) making AI systems more robust and interpretable in high-stakes domains such as healthcare.

I was a Machine Learning Research Intern at CodaMetrix (Summer 2024 & 2025), where I developed LLM agents to (1) extract medical entities from EHRs and (2) evaluate and correct entities extracted by human experts and LLMs.


Recent Highlights 🔥
  • VLMsAreBiased used by ByteDance to evaluate Seed-1.8's VQA capabilities! See their announcement and model card.
  • VLMsAreBiased used by Google DeepMind to evaluate Gemini-3-Pro's visual reasoning capabilities! See their announcement: Gemini 3 Pro: the frontier of vision AI.
  • Two oral presentations at ACL 2025 (Industry Track) and Interspeech 2024.

Selected Publications

♠ denotes equal contribution

Sentiment Reasoning
ACL 2025 (Industry Track) Oral
Sentiment Reasoning for Healthcare
Khai-Nguyen Nguyen, Khai Le-Duc, Bach Phan Tat, Duy Le, Long Vo-Dang, Truong-Son Hy
Summary

We show that training LLMs on chain-of-thought reasoning improves their performance in sentiment analysis while enabling human-like explanations.

Real-time Speech Summarization
Interspeech 2024 Oral
Real-time Speech Summarization for Medical Conversations
Khai Le-Duc, Khai-Nguyen Nguyen, Long Vo-Dang, Truong-Son Hy
Summary

We show that splitting the annotation budget evenly between synthetic and human-curated data collection yields the best dataset for medical speech summarization LLMs.

Network Pruning
CPAIOR 2023 ICLR Workshop
Getting Away with More Network Pruning
Jeffrey Cai, Khai-Nguyen Nguyen, Nishant Shrestha, Aidan Good, Ruisen Tu, Xin Yu, Shandian Zhe, Thiago Serra
Summary

We propose a theorem that gives an upper bound on the expressiveness of a neural network based on its geometric properties, and we apply it to model pruning.


Selected Preprints

FeatureSHAP
Under Review
Toward Explaining Large Language Models in Software Engineering Tasks
Antonio Vitale, Khai-Nguyen Nguyen, Denys Poshyvanyk, Rocco Oliveto, Simone Scalabrino, Antonio Mastropaolo
Summary

We present FeatureSHAP, an interpretability framework for software engineering tasks that attributes Shapley scores to input features based on their contributions to model output.

S-Chain: Structured Visual Chain-of-Thought for Medicine
Under Review
S-Chain: Structured Visual Chain-of-Thought for Medicine
Khai Le-Duc, Phuong T.H. Trinh, Duy Minh Ho Nguyen, Tien-Phat Nguyen, Nghiem Tuong Diep, An Ngo, Tung Vu, Trinh Vuong, Anh-Tien Nguyen, Nguyen Dinh Mau, Van Trung Hoang, Khai-Nguyen Nguyen, Hy Nguyen, Chris Ngo, Anji Liu, Nhat Ho, Anne-Christin Hauschild, Khanh Xuan Nguyen, Thanh Nguyen-Tang, Pengtao Xie, Daniel Sonntag, James Zou, Mathias Niepert, Anh Totti Nguyen
Summary

We present S-Chain, a medical dataset for structured visual reasoning. Training medical models on S-Chain improves their accuracy and explainability.