NLM IRP Seminar Schedule

Scheduled Seminars on May 14, 2024

Speaker: Stanley Liang
Time: 11 a.m.
Presentation Title: Knowledge-driven Latent Diffusion For COVID-19 Pneumonia Radiology Pattern Synthesis
Location:
Contact NLM_IRP_Seminar_Scheduling@mail.nih.gov with questions about this seminar.

Abstract:

Objective: The latent diffusion model (LDM) is a state-of-the-art method for synthesizing medical images with designated knowledge. We propose a novel knowledge-driven strategy that establishes a cross-modal binding between medical knowledge and the target visual patterns of COVID-19 pneumonia in an LDM using a class-specific prior preservation technique.
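As an illustration of the prompting scheme implied above, the following minimal Python sketch generates images from a fine-tuned Stable Diffusion checkpoint with a combined "pattern identifier, class identifier" prompt using the Hugging Face diffusers API. The local checkpoint path and sampling parameters are hypothetical, not taken from the work described here.

```python
import torch
from diffusers import StableDiffusionPipeline

# Assumed local path to an LDM fine-tuned as described in the abstract (hypothetical).
pipe = StableDiffusionPipeline.from_pretrained(
    "./ldm-covid-cxr-finetuned", torch_dtype=torch.float16
).to("cuda")

# Pattern identifier + class identifier, matching the training-time pairing.
prompt = "bilateral lung edema mRALE 24, chest x-ray"
images = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images
images[0].save("synthetic_cxr.png")
```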

Method: We used the Stable Diffusion 2-1-base LDM, pretrained on large image datasets, as the base model for optimization. The LDM was trained separately on a chest X-ray (CXR) dataset of 2,599 frontal CXR images and a chest computed tomography (CT) dataset of 104 CT scans from confirmed COVID-19 cases and 56 normal CT scans. When training on the CXR dataset, each image was paired with the pattern identifier “bilateral lung edema mRALE 24” and the class identifier “chest x-ray”; when training on the CT dataset, each image was paired with the pattern identifier “COVID-19 pneumonia” and the class identifier “chest CT”. The model was optimized with an objective that combines the class-specific prior preservation loss and the reconstruction loss, binding the medical concepts to the corresponding visual patterns via the CLIP text encoder and the VAE in the LDM architecture. For quality comparison, we also synthesized images with a Wasserstein GAN with gradient penalty (WGAN-GP) and a pure denoising diffusion implicit model (DDIM).
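A hedged sketch of the combined objective described above is given below: the standard latent-diffusion reconstruction (noise-prediction) loss on the medical images plus a class-specific prior preservation loss on images paired with the bare class identifier. Function names, the batch layout, and the prior_weight value are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def diffusion_loss(unet, noisy_latents, timesteps, text_embeddings, target_noise):
    """Mean-squared error between predicted and true noise (reconstruction loss)."""
    pred = unet(noisy_latents, timesteps, encoder_hidden_states=text_embeddings).sample
    return F.mse_loss(pred.float(), target_noise.float())

def combined_loss(unet, instance_batch, prior_batch, prior_weight=1.0):
    """Reconstruction loss on medical images paired with the pattern + class prompt,
    plus a prior preservation loss on class images paired with the class prompt only."""
    rec_loss = diffusion_loss(unet, *instance_batch)   # e.g. "bilateral lung edema mRALE 24, chest x-ray"
    prior_loss = diffusion_loss(unet, *prior_batch)    # e.g. "chest x-ray"
    return rec_loss + prior_weight * prior_loss
```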

Results: After training, the synthetic CXR images generated by the LDM with the combined text prompt “bilateral lung edema mRALE 24, chest x-ray” achieved a Fréchet inception distance (FID) of 9.2158 and a kernel inception distance (KID) of 0.0818 against the real positive CXR images, indicating superior quality over the other methods. When the synthetic positive images were classified together with the real negative images by a trained vision transformer (ViT), the classification accuracy was 0.9975 with a precision of 1.0 and a recall of 0.9950. The synthetic CT images generated by the LDM with the combined text prompt “COVID-19 pneumonia, chest CT” achieved an FID of 7.99 and a KID of 0.041 against the real positive CT slices, again indicating superior quality over the other methods. When the synthetic CT images were treated as COVID-19 positive and classified by a model trained on real CT images, the classification accuracy was 0.965, the F1 score 0.963, the recall 0.930, and the sensitivity 0.930.
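The FID and KID scores reported above can be computed, for example, with torchmetrics; the short sketch below shows one common implementation and is not the authors' evaluation code. It assumes real_images and synthetic_images are uint8 tensors of shape (N, 3, H, W) prepared elsewhere, and the subset_size value is an illustrative choice.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance

def image_quality_metrics(real_images: torch.Tensor, synthetic_images: torch.Tensor):
    """Return (FID, mean KID) between real and synthetic image sets."""
    fid = FrechetInceptionDistance(feature=2048)
    kid = KernelInceptionDistance(subset_size=50)

    fid.update(real_images, real=True)
    fid.update(synthetic_images, real=False)
    kid.update(real_images, real=True)
    kid.update(synthetic_images, real=False)

    kid_mean, kid_std = kid.compute()
    return fid.compute().item(), kid_mean.item()
```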

Conclusion: We conclude that the LDM can synthesize high-quality CXR and CT images with the designated COVID-19 pneumonia patterns using the proposed knowledge-driven method, providing a new approach to cross-modality knowledge representation with large vision models.