NLM IRP Seminar Schedule
UPCOMING SEMINARS
- April 30, 2024 Wenya Rowe
  The conformal central charge of the spin-1/2 XX model derived from long-chain asymptotics
- May 2, 2024 OPEN
  TBD
- May 7, 2024 OPEN
  TBD
- May 9, 2024 Pascal Mutz
  TBD
- May 14, 2024 Stanley Liang
  TBD
RECENT SEMINARS
- April 25, 2024 Ermin Hodzic
  Condition-Aware Cell Type Deconvolution of Bulk Tissues
- April 23, 2024 OPEN
  TBD
- April 16, 2024 Jaya Srivastava
  Regulatory plasticity of the human genome
- April 11, 2024 Sergey Shmakov
  Comprehensive survey of the TnpB RNA-guided nucleases
- April 2, 2024 Yifan Yang
  Fairness and Bias in Biomedical AI
Scheduled Seminars on Feb. 28, 2023
Contact NLM_IRP_Seminar_Scheduling@mail.nih.gov with questions about this seminar.
Abstract:
Biomedical relation extraction (RE) is the task of automatically identifying and characterizing relations between biomedical concepts from free text. RE is a central task in biomedical natural language processing (NLP) research and plays a critical role in many downstream applications, such as literature-based discovery and knowledge graph construction. State-of-the-art methods have primarily been used to train machine learning models on individual RE datasets, such as protein-protein interaction and chemical-induced disease relation. Manual dataset annotation, however, is highly expensive and time-consuming, as it requires domain knowledge. Existing RE datasets are usually domain-specific or small, which limits the development of generalized and high-performing RE models. In this work, we present a novel framework for systematically addressing the data heterogeneity of individual datasets and combining them into a large dataset. Based on the framework and dataset, we report on BioREx, a data-centric approach for extracting relations. Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on the individual dataset, improving the F1-score from 74.4% to 79.6%. We further demonstrate that the combined dataset can improve performance for five different RE tasks. In addition, we compare BioREx with transfer learning and multi-task learning approaches, and the results show that it outperforms them on BioRED and on most tasks. Further, we used BioREx’s pre-trained model and demonstrated its portability in two RE tasks: drug-drug N-ary combination and document-level gene-disease RE. The results show improvements in both tasks.