NLM DIR Seminar Schedule
UPCOMING SEMINARS
Jan. 22, 2026 Mario Flores
AI Pipeline for Characterization of the Tumor Microenvironment
Jan. 27, 2026 Zhaohui Liang
Heterogeneous Graph Re-ranking for CLIP-based Medical Cross-modal Retrieval
Jan. 29, 2026 Mehdi Bagheri Hamaneh
FastSpel: A simple peptide spectrum predictor that achieves deep learning-level performance at a fraction of the computational cost
Feb. 3, 2026 Matthew Diller
TBD
Feb. 5, 2026 Lana Yeganova
From Algorithms to Insights: Bridging AI and Topic Discovery for Large-Scale Biomedical Literature Analysis
RECENT SEMINARS
Jan. 20, 2026 Anastasia Gulyaeva
Diversity and evolution of the ribovirus class Stelpaviricetes
Jan. 8, 2026 Won Gyu Kim
LitSense 2.0: AI-powered biomedical information retrieval with sentence and passage level knowledge discovery
Dec. 16, 2025 Sarvesh Soni
ArchEHR-QA: A Dataset and Shared Task for Grounded Question Answering from Electronic Health Records
Dec. 2, 2025 Qingqing Zhu
CT-Bench & CARE-CT: Building Reliable Multimodal AI for Lesion Analysis in Computed Tomography
Nov. 25, 2025 Jing Wang
MIMIC-EXT-TE: Millions Clinical Temporal Event Time-Series Dataset
Scheduled Seminars on Jan. 13, 2026
In person: Building 38A, Room B2N14 (NCBI Library), or online via the meeting link
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
Medical calculators are evidence-based tools that can provide quantitative decision support for disease diagnosis, risk stratification, treatment planning, and prognosis prediction. While large language models (LLMs) have shown state-of-the-art performance in medical question answering and clinical trial matching, their capabilities for medical calculation remain unclear. To address this, we first present benchmarks for evaluating LLMs on medical calculator recommendation (MedQA-Calc, AMIA 2025) and usage (MedCalc-Bench, NeurIPS 2024). We found that off-the-shelf LLMs lack sufficient knowledge of medical calculators and are prone to various calculation errors. To bridge this gap, we introduce AgentMD (Nat Commun 2025), an AI agent capable of curating and applying medical calculators. Drawing on the medical literature, AgentMD curates over 2,000 medical calculators, achieving over 85% accuracy on expert quality checks and pass rates above 90% on unit testing. Augmented with these self-curated calculators and a code executor, AgentMD substantially outperforms off-the-shelf GPT-4 on risk prediction. Results on 698 real-world emergency department notes confirm that AgentMD accurately computes medical risks at the individual level. AgentMD can also provide population-level insights for institutional risk management. In summary, this trilogy of studies presents first-of-its-kind benchmarks and methods for language modeling in medical calculation, laying the research foundation for this field.
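The abstract describes medical calculators curated as executable code and checked with unit tests. As an illustration only, the sketch below shows what one such calculator might look like, using the widely known CHA2DS2-VASc stroke-risk score for atrial fibrillation. The `Patient` structure, the function names, and the test format are assumptions made for this example; they are not AgentMD's actual calculator representation.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical structured inputs, e.g. as extracted from a clinical note."""
    age: int
    female: bool
    chf: bool               # congestive heart failure history
    hypertension: bool
    diabetes: bool
    stroke_tia: bool        # prior stroke, TIA, or thromboembolism
    vascular_disease: bool  # prior MI, peripheral artery disease, or aortic plaque


def cha2ds2_vasc(p: Patient) -> int:
    """CHA2DS2-VASc stroke-risk score for atrial fibrillation (range 0-9)."""
    score = 0
    score += 1 if p.chf else 0
    score += 1 if p.hypertension else 0
    if p.age >= 75:          # age >= 75 counts 2 points
        score += 2
    elif p.age >= 65:        # age 65-74 counts 1 point
        score += 1
    score += 1 if p.diabetes else 0
    score += 2 if p.stroke_tia else 0
    score += 1 if p.vascular_disease else 0
    score += 1 if p.female else 0
    return score


def test_cha2ds2_vasc() -> None:
    # Hypothetical unit tests of the kind a curation pipeline could run per calculator.
    healthy_young_man = Patient(40, False, False, False, False, False, False)
    assert cha2ds2_vasc(healthy_young_man) == 0
    # 78-year-old woman with hypertension and prior stroke: 2 + 1 + 1 + 2 = 6
    elderly_woman = Patient(78, True, False, True, False, True, False)
    assert cha2ds2_vasc(elderly_woman) == 6
    # Every risk factor present gives the maximum score of 9
    all_factors = Patient(80, True, True, True, True, True, True)
    assert cha2ds2_vasc(all_factors) == 9


if __name__ == "__main__":
    test_cha2ds2_vasc()
    print("all calculator tests passed")
```

Writing each calculator as a small pure function makes it straightforward to execute with a code interpreter and to check against hand-computed cases, which is the general pattern the abstract's "code executor" and "unit testing" results point to.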