NLM DIR Seminar Schedule
UPCOMING SEMINARS
-
Jan. 22, 2026 Mario Flores
AI Pipeline for Characterization of the Tumor Microenvironment -
Jan. 27, 2026 Zhaohui Liang
Heterogeneous Graph Re-ranking for CLIP-based Medical Cross-modal Retrieval -
Jan. 29, 2026 Mehdi Bagheri Hamaneh
FastSpel: A simple peptide spectrum predictor that achieves deep learning-level performance at a fraction of the computational cost -
Feb. 3, 2026 Matthew Diller
TBD -
Feb. 5, 2026 Lana Yeganova
From Algorithms to Insights: Bridging AI and Topic Discovery for Large-Scale Biomedical Literature Analysis.
RECENT SEMINARS
-
Jan. 22, 2026 Mario Flores
AI Pipeline for Characterization of the Tumor Microenvironment -
Jan. 20, 2026 Anastasia Gulyaeva
Diversity and evolution of the ribovirus class Stelpaviricetes -
Jan. 8, 2026 Won Gyu Kim
LitSense 2.0: AI-powered biomedical information retrieval with sentence and passage level knowledge discovery -
Dec. 16, 2025 Sarvesh Soni
ArchEHR-QA: A Dataset and Shared Task for Grounded Question Answering from Electronic Health Records -
Dec. 2, 2025 Qingqing Zhu
CT-Bench & CARE-CT: Building Reliable Multimodal AI for Lesion Analysis in Computed Tomography
Scheduled Seminars on March 25, 2025
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
Large language models (LLMs) have been integrated into numerous biomedical application frameworks. Despite their significant potential, they possess vulnerabilities that can lead to serious consequences. In this seminar, we will discuss two types of vulnerabilities in LLMs and explore potential solutions to address them: adversarial manipulations and data memorization.
Adversarial manipulations can cause LLMs to generate harmful medical suggestions or promote specific stakeholder interests. We will demonstrate two methods by which a malicious actor can achieve this: prompt injection and data poisoning. Thirteen models were tested, and all exhibited significant behavioral changes after manipulation in three tasks. Although newer models performed slightly better, they were still greatly affected.
Data memorization is another concern, particularly when LLMs are fine-tuned with medical corpora or patient records for specific tasks. This can lead to the unintended memorization of training data, resulting in the exposure of sensitive patient information and breaches of confidentiality. Controlled text generation can be employed to mitigate such memorization, effectively reducing the risk of exposing patient information during inference and enhancing privacy protection.