NLM DIR Seminar Schedule
UPCOMING SEMINARS
May 12, 2026 John Bridgers
A bi-partition function algorithm to evaluate inferred subclonal structures in single-cell sequencing data
May 14, 2026 Brandon Colelough
TBD
May 19, 2026 Leann Lindsey
Are Genomic Language Models Learning? Insights from Tokenization Analysis and Prophage Detection in Bacterial Genomes
May 26, 2026 Harutyun Saakyan
TBD
May 27, 2026 Brian Abraham
Cis-Regulatory Organization and Transcription Factor Control of Cell Identity and Disease
RECENT SEMINARS
May 5, 2026 Benjamin Hou
Machine Learning for Craniofacial Malocclusion Prediction
April 28, 2026 Niccolo Marini
From Unimodal Datasets to Multimodal Foundation Models: Synthetic Clinical Notes for Dermatology AI
April 21, 2026 Yoshitaka Inoue
Drug Response Prediction: Generalization using Graph Neural Networks & Reasoning over Predictions using LLMs
April 16, 2026 Matthew Diller
Analyzing Similarity in Common Data Elements in the NIH CDE Repository via Semantic Clustering
April 7, 2026 Henry Secaira Morocho
Toward a systematic method of database enrichment for reference-based metagenomics
Scheduled Seminars on April 28, 2026
In-person: Building 38A/B2N14 NCBI Library or Meeting Link
Contact NLMDIRSeminarScheduling@mail.nih.gov with questions about this seminar.
Abstract:
Foundation models and Large Language Models (LLMs) have recently driven significant advances in biomedical AI, achieving strong performance on clinical tasks such as image classification, report generation, and decision support. In dermatology, multimodal (MM) systems that jointly process images and clinical text are particularly promising, as they enable richer representations and support flexible applications like zero-shot classification and cross-modal retrieval.
However, building such systems requires large paired image-text datasets, which are scarce in dermatology. Most publicly available datasets are unimodal, pairing images with structured labels or sparse metadata rather than descriptive clinical notes. The few existing large-scale image-text collections are mostly scraped from the internet and contain noisy, unreliable content. LLMs offer a potential solution, but their tendency to hallucinate clinically inaccurate information makes naive text synthesis unreliable for MM training.
This line of research aims to narrow this gap by introducing a framework that converts unimodal dermatology datasets into multimodal image-text pairs without requiring manual annotation or pairing. The framework combines existing unimodal datasets, structured metadata, and LLMs to synthesize clinical notes that are paired with real dermatology images, accelerating the development of multimodal dermatology models in a scalable way.
The framework involves two stages: 1) synthetic note generation, focusing on methods that reduce hallucinations and yield a controlled set of image-text pairs; 2) the use of these pairs to develop foundation models, focusing on two applications (diffusion models for image synthesis, and a teacher-student architecture that exploits unpaired samples).
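As a rough illustration of how structured metadata can constrain note synthesis, the sketch below builds an LLM prompt only from a record's metadata fields and then rejects any generated note that omits the record's diagnosis label or names a different diagnosis. This is not the speaker's method: the field names (`diagnosis`, `body_site`, `age`, `sex`), the `DIAGNOSIS_VOCAB` list, and the filtering rule are hypothetical assumptions, and the LLM call itself is deliberately left out.

```python
from dataclasses import dataclass

# Hypothetical vocabulary of diagnosis terms that the filter scans for.
DIAGNOSIS_VOCAB = ["melanoma", "basal cell carcinoma", "nevus", "psoriasis", "eczema"]


@dataclass
class Record:
    """A unimodal dataset entry: an image plus structured metadata (assumed fields)."""
    image_path: str
    diagnosis: str
    body_site: str
    age: int
    sex: str


def build_prompt(rec: Record) -> str:
    """Compose an LLM prompt that exposes only the record's known metadata."""
    facts = (
        f"diagnosis: {rec.diagnosis}; body site: {rec.body_site}; "
        f"age: {rec.age}; sex: {rec.sex}"
    )
    return (
        "Write a short clinical note for a dermatology image. "
        "Use only the following facts and do not add findings: " + facts
    )


def passes_label_check(note: str, rec: Record) -> bool:
    """Keep a generated note only if it names the true label and no other vocabulary diagnosis.

    Uses plain case-insensitive substring matching, so it is a coarse filter.
    """
    text = note.lower()
    if rec.diagnosis.lower() not in text:
        return False
    others = [d for d in DIAGNOSIS_VOCAB if d != rec.diagnosis.lower()]
    return not any(d in text for d in others)
```

A filter like this only catches one kind of hallucination (a contradicting or missing diagnosis); checks on other metadata fields would need their own rules.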
Models trained within the framework consistently outperform state-of-the-art medical foundation models on cross-modal retrieval and zero-shot classification, when evaluated across fifteen dermatology datasets, including nine external benchmarks and over 37,000 samples. These results demonstrate that carefully controlled synthetic text is an effective bridge across modalities, offering a practical path toward robust dermatology foundation models even in the absence of large real paired datasets.
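Zero-shot classification and cross-modal retrieval, the two evaluation tasks named above, are commonly scored by comparing image and text embeddings with cosine similarity in a shared space, as in CLIP-style models. The sketch below shows that generic scoring, assuming the embeddings have already been produced by some trained image and text encoders (not shown) and that `text_embs[i]` is the correct match for `image_embs[i]`. The function names are illustrative, and the exact evaluation protocol used in the talk may differ.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit L2 norm so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    na, nb = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(na, nb))


def zero_shot_predict(image_emb: Sequence[float],
                      class_text_embs: Sequence[Sequence[float]]) -> int:
    """Return the index of the class-prompt embedding most similar to the image."""
    scores = [cosine(image_emb, t) for t in class_text_embs]
    return max(range(len(scores)), key=scores.__getitem__)


def recall_at_k(image_embs: Sequence[Sequence[float]],
                text_embs: Sequence[Sequence[float]], k: int) -> float:
    """Image-to-text Recall@K: fraction of images whose true text ranks in the top k."""
    hits = 0
    for i, img in enumerate(image_embs):
        scores = [cosine(img, t) for t in text_embs]
        ranked = sorted(range(len(text_embs)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(image_embs)
```

For example, with toy two-dimensional vectors, `zero_shot_predict([0.8, 0.3], [[1, 0], [0, 1]])` returns `0`, because the image vector is closest in angle to the first class prompt. Real encoders produce vectors with hundreds of dimensions, and benchmarks usually report Recall@K for several values of K in both retrieval directions.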