Michael Wornow

Github

Scholar

I am a PhD candidate in Computer Science at Stanford University.

I am advised by Nigam Shah and Chris Ré. My research focuses on developing and operationalizing AI models in healthcare.

I graduated summa cum laude from Harvard in 2020 with a double major in Computer Science and Statistics, and am fortunate to be supported by an NSF Graduate Research Fellowship and Stanford HAI Graduate Fellowship.

Research

Cost-Efficient Serving of LLM Agents via Test-Time Plan Caching

Qizheng Zhang, Michael Wornow, Kunle Olukotun

ICML: ES-FoMo III Workshop (2025)

MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks

Suhana Bedi*, Hejie Cui*, Miguel Fuentes*, Alyssa Unell*, Michael Wornow et al. (+ many authors)

Under review (2025)

Website | Blog | Tweet | Github

Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHRs

Michael Wornow*, Suhana Bedi*, Miguel Fuentes, Ethan Steinberg, Jason Fries, Christopher Ré, Sanmi Koyejo, Nigam H. Shah

ICLR (2025)

Talk | Github | Huggingface

Top of the CLASS: Benchmarking LLM Agents on Real-World Enterprise Tasks

Michael Wornow, Vaishnav Garodia, Vasilis Vassalos, Utkarsh Contractor

ICLR: Workshop on Trustworthy LLMs (2025)

Github | Huggingface

WONDERBREAD: A Benchmark for Evaluating Multimodal Foundation Models on Business Process Management Tasks

Michael Wornow, Avanika Narayan, Ben Viggiano, Ishan S. Khare, Tathagat Verma, Tibor Thompson, Miguel Angel Fuentes Hernandez, Sudharsan Sundar, Chloe Trujillo, Krrish Chawla, Rongfei Lu, Justin Shen, Divya Nagaraj, Joshua Martinez, Vardhan Agrawal, Althea Hudson, Nigam H. Shah, Christopher Ré

NeurIPS: Benchmarks (2024)

Talk | Website | Tweet | Github | Dataset

Automating the Enterprise with Foundation Models

Michael Wornow*, Avanika Narayan*, Krista Opsahl-Ong, Quinn McIntyre, Nigam H. Shah, Christopher Ré

VLDB (2024)

Talk | Blog | Tweet | Github

Zero-Shot Clinical Trial Patient Matching with LLMs

Michael Wornow*, Alejandro Lozano*, Dev Dash, Jenelle Jindal, Kenneth W. Mahaffey, Nigam H. Shah

NEJM AI (2024)

Talk | Website | Github

EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models

Michael Wornow*, Rahul Thapa*, Ethan Steinberg, Jason Fries, Nigam H. Shah

NeurIPS: Benchmarks (2023) — Spotlight

Talk | Website | Github | Dataset | Huggingface

HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution

Eric Nguyen*, Michael Poli*, Marjan Faizi*, Armin W. Thomas, Callum Birch Sykes, Michael Wornow, Aman Patel, Clayton Rabideau, Stefano Massaroli, Yoshua Bengio, Stefano Ermon, Stephen A. Baccus, Christopher Ré

NeurIPS (2023) — Spotlight

Talk | Tweet | Blog | Github | Huggingface

The Shaky Foundations of Clinical Foundation Models: A Survey of Large Language Models and Foundation Models for EMRs

Michael Wornow, Yizhe Xu, Rahul Thapa, Birju Patel, Ethan Steinberg, Scott Fleming, Michael A. Pfeffer, Jason Fries, Nigam H. Shah

NPJ Digital Medicine (2023)

Talk | Tweet | Huggingface

APLUS: A Python Library for Usefulness Simulations of Machine Learning Models in Healthcare

Michael Wornow, Elsie Ross, Alison Callahan*, Nigam H. Shah*

Journal of Biomedical Informatics (2023)

Github

Generating experimentally unrelated target molecule-binding highly functionalized nucleic-acid polymers using machine learning

Jonathan C. Chen, Jonathan P. Chen, Max W. Shen, Michael Wornow, Minwoo Bae, Wei-Hsi Yeh, Alvin Hsu, David R. Liu

Nature Communications (2022)

Tweet

Cut out the annotator, keep the cutout: better segmentation with weak supervision

Sarah Hooper, Michael Wornow, Ying Hang Seah, Peter Kellman, Hui Xue, Frederic Sala, Curtis Langlotz, Christopher Ré

ICLR (2021)

Talk

Medical Event Data Standard (MEDS): Facilitating Machine Learning for Health

Bert Arnrich, Edward Choi, Jason Alan Fries, Matthew B.A. McDermott, Jungwoo Oh, Tom Pollard, Nigam H. Shah, Ethan Steinberg, Michael Wornow, Robin van de Water

ICLR: TS4H Workshop (2024)

Github

MEDS Decentralized, Extensible Validation (MEDS-DEV) Benchmark: Establishing Reproducibility and Comparability in ML for Health

Kolo et al. (+ many authors)

ML4H: Demo Track (2024)

Github

A Systematic Review of Testing and Evaluation of Health Care Applications of Large Language Models (LLMs)

Suhana Bedi, Yutong Liu, Lucy Orr-Ewing, Dev Dash, Sanmi Koyejo, Alison Callahan, Jason A. Fries, Michael Wornow, Akshay Swaminathan, Lisa Soleymani Lehmann, Hyo Jung Hong, Mehr Kashyap, Akash Chaurasia, Nirav Shah, Karandeep Singh, Troy Tazbaz, Arnold Milstein, Michael A. Pfeffer, Nigam H. Shah

JAMA (2024)

Standing on FURM ground - A framework for evaluating Fair, Useful, and Reliable AI Models in healthcare systems

Alison Callahan, Duncan McElfresh, Juan M. Banda, Gabrielle Bunney, Danton Char, Jonathan Chen, Conor Corbin, Debadutta Dash, Norman Downing, Sneha Jain, Nikesh Kotecha, Jonathan Masterson, Michelle M. Mello, Keith Morse, Srikar Nallan, Abby Pandya, Anurang Revri, Aditya Sharma, Christopher Sharp, Rahul Thapa, Michael Wornow, Alaa Youssef, Michael A. Pfeffer, Nigam H. Shah

NEJM Catalyst (2024)

Construction of disease-specific cytokine profiles by associating disease genes with immune responses

Tianyun Liu, Shiyin Wang, Michael Wornow, Russ B. Altman

PLOS Computational Biology (2022)

Inter-region transfers for pandemic surges

Kenneth A Michelson, Chris A Rees, Jayshree Sarathy, Paige VonAchen, Michael Wornow, Michael C Monuteaux, Mark I Neuman

Clinical Infectious Diseases (2020)

In vivo base editing restores sensory transduction and transiently improves auditory function in a mouse model of recessive deafness

Wei-Hsi Yeh, Olga Shubina-Oleinik, Jonathan M. Levy, Bifeng Pan, Gregory A. Newby, Michael Wornow, Rachel Burt, Jonathan C. Chen, Jeffrey R. Holt, David R. Liu

Science Translational Medicine (2020)

Experience

	Microsoft Research | Machine Learning Research Intern | Summer 2023
	Insitro | Machine Learning Intern | Summer 2021
	Broad Institute of MIT & Harvard | Research Assistant, David Liu Lab | Fall 2018 - Spring 2020
	Bain & Company | Associate Consultant Intern | Summer 2019
	Goldman Sachs | Global Investment Research Summer Analyst | Summer 2018
	Facebook | Software Engineering Intern | Summer 2017
	Joint Genome Institute | Bioinformatics Intern | Summer 2017
	Joint BioEnergy Institute | Bioinformatics Intern | Summer 2015

Education

	Stanford University | 2020 - 2025 | PhD, Computer Science | NSF Graduate Research Fellowship, HAI Graduate Fellowship
	Harvard College | 2016 - 2020 | AB, Double Major in Computer Science & Statistics | Phi Beta Kappa, Summa Cum Laude, Undergraduate Statistics Prize, John Harvard Scholar, Detur Book Prize, Derek Bok Certificate for Excellence in Teaching (3x)

Teaching

	Electronic Health Records and Clinical AI (BIOS 417) | Instructor | Fall 2024
	Machine Learning (CS 229) | Course Assistant | Summer 2024
	Artificial Intelligence: Principles and Techniques (CS 221) | Course Assistant | Fall 2023
	Graduate Cybersecurity (CS 263) | Teaching Fellow | Fall 2019
	Machine Learning (CS 181) | Teaching Fellow | Spring 2019, Spring 2020
	Introduction to Computer Science (CS 50) | Teaching Fellow | Fall 2017

Talks

	Thesis Defense | March 2025 | Links: Video
	Long Context Models for EHR Data | March 2025 | Links: Video
	Clinical Trial Patient Matching with LLMs (Short) | January 2025 | Links: Video
	Multimodal Foundation Models for Business Process Management Tasks | November 2024 | Links: Video
	Foundation Models for Structured EHR Data | November 2024 | Links: Video
	Clinical Trial Patient Matching with LLMs | November 2024 | Links: Video
	Automating the Enterprise With Foundation Models | May 2024 | Links: Video
	Large Language Models (LLMs) for Healthcare | May 2024 | Links: Video
	Shaky Foundations of Clinical LLMs | March 2024 | Links: Video
	Foundation Models for EHRs | November 2023 | Links: Video
	EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models | November 2023 | Links: Video

Projects

Automating Enterprise Healthcare Workflows and Clinical Decision Making with Foundation Models | Computer Science

June 2025

PhD thesis. In this dissertation, we argue that foundation models can solve many of the challenges faced in delivering healthcare, from improving clinical decision making to reducing administrative overhead. Towards this aim, we release novel benchmarks, datasets, and models anchored in real-world clinical data and end-to-end workflows.

Links: Thesis

Applying Deep Learning to Discover Highly Functionalized Nucleic Acid Polymers that Bind to Small Molecules | Computational Biology

Spring 2020

Undergraduate senior thesis. A conditional variational autoencoder (CVAE), after training on one run of SELEX, learned to embed aptamers in a latent space from which novel aptamers with strong binding affinities for a target molecule could be generated through conditional sampling.

Links: Thesis

myScheduleShare | Website

Fall 2013 - Summer 2015

A school-centric, collaborative scheduling system for students and faculty. Allows users to share schedules with friends, track classes, plan meetings, stay updated on homework, and more. Offers more flexible scheduling features than Google Calendar and Microsoft Outlook.

Links: Website | PDF Overview

Improved Harvard QGuide | Website

Winter 2018

Improved version of Harvard's course evaluation system, the "QGuide." A searchable, sortable interface containing all comments, evaluations, and courses offered at Harvard University from Fall 2006 through Spring 2019.

Links: Website

College Essay Management Dashboard | Website

Summer 2019

Built a free, mobile-friendly dashboard in React for college counselors to better manage their students' essays. Includes support for Google Docs/Microsoft Word/PDFs, automated payment processing, and a sophisticated permissioning system.

Links: Website