NLM IRP Seminar Schedule

Scheduled Seminars on Feb. 28, 2023

Speaker: Po-Ting Lai
Time: 11 a.m.
Presentation Title: Data-centric Artificial Intelligence: Improving Biomedical Relation Extraction by Leveraging Heterogeneous Datasets
Location: Virtual - see link below

Contact NLM_IRP_Seminar_Scheduling@mail.nih.gov with questions about this seminar.

Abstract:

Biomedical relation extraction (RE) is the task of automatically identifying and characterizing relations between biomedical concepts in free text. RE is a central task in biomedical natural language processing (NLP) research and plays a critical role in many downstream applications, such as literature-based discovery and knowledge graph construction. State-of-the-art methods have primarily trained machine learning models on individual RE datasets, such as those for protein-protein interactions and chemical-induced disease relations. Manual dataset annotation, however, is expensive and time-consuming because it requires domain knowledge. Existing RE datasets are usually domain-specific or small, which limits the development of generalized, high-performing RE models. In this work, we present a novel framework for systematically addressing the data heterogeneity of individual datasets and combining them into a large dataset. Based on the framework and dataset, we report on BioREx, a data-centric approach for extracting relations. Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on the individual dataset, improving the F1-score from 74.4% to 79.6%. We further demonstrate that the combined dataset can improve performance for five different RE tasks. In addition, we compare BioREx with transfer learning and multi-task learning approaches, and the results show that it outperforms them on BioRED and on most tasks. Further, we used BioREx’s pre-trained model and demonstrated its portability to two RE tasks: drug-drug N-ary combination and document-level gene-disease RE. The results show improvements in both tasks.