I am currently a PhD student at the College of William and Mary, advised by Dr. Antonio Mastropaolo. I am also fortunate to be supervised by Anh Totti Nguyen, Hy Truong Son, and Thiago Serra. My research focuses on multimodal AI and trustworthy AI. I am especially interested in (1) quantifying and understanding the limitations and biases of LLMs/VLMs and (2) making LLM systems more interpretable in high-stakes domains such as medicine and healthcare.
My work has been accepted at premier venues including ACL, NAACL, and Interspeech. My most recent project, VLMs are Biased, was featured on Hacker News and garnered attention from Meta Superintelligence Labs and Google DeepMind.
I also interned on the Machine Learning Research team at CodaMetrix in Summer 2024 and Summer 2025, where I developed LLM agents that (1) extract medical entities from EHR notes and (2) evaluate and correct entities extracted by human experts and other LLMs.
Selected Publications
♠ denotes equal contribution
Vision-Language Models are Biased
An Vo♠, Khai-Nguyen Nguyen♠, Mohammad Reza Taesiri, Vy Tuong Dang, Anh Totti Nguyen, Daeyoung Kim
AI for Math Workshop @ ICML 2025, Submitted to NeurIPS 2025
We demonstrate that state-of-the-art VLMs are strongly biased toward well-known patterns and propose VLMBias, a VQA benchmark for evaluating visual biases in VLMs.
Sentiment Reasoning for Healthcare
Khai-Nguyen Nguyen♠, Khai Le-Duc♠, Bach Phan Tat, Duy Le, Long Vo-Dang, Truong-Son Hy
ACL 2025, Industry Track (Oral)
We demonstrate that chain-of-thought distillation improves LLM performance on sentiment analysis and enables LLMs to produce human-like explanations.
Medical Spoken Named Entity Recognition
Khai Le-Duc, David Thulke, Hung-Phong Tran, Long Vo-Dang, Khai-Nguyen Nguyen, Truong-Son Hy, Ralf Schlüter
NAACL 2025, Industry Track (Oral)
We introduce a multilingual dataset for medical spoken named entity recognition.
Resource-Efficient & Effective Code Summarization
Saima Afrin, Joseph Call, Khai-Nguyen Nguyen, Oscar Chaparro, Antonio Mastropaolo
FORGE 2025
We show that Code LLMs fine-tuned with QLoRA/LoRA achieve performance comparable to their fully fine-tuned counterparts on code summarization.
Real-time Speech Summarization for Medical Conversations
Khai Le-Duc♠, Khai-Nguyen Nguyen♠, Long Vo-Dang, Truong-Son Hy
Interspeech 2024 (Oral)
We improve cascaded LLM-based speech summarization systems for medical conversations using high-quality synthetic data.
Getting away with more network pruning: From sparsity to geometry and linear regions
Jeffrey Cai♠, Khai-Nguyen Nguyen♠, Nishant Shrestha, Aidan Good, Ruisen Tu, Xin Yu, Shandian Zhe, Thiago Serra
Workshop on Sparsity in Neural Networks @ ICLR 2023, CPAIOR 2023
We prove a theorem on the geometric properties of neural networks and apply it to model pruning.