NLM IRP Seminar Schedule



Scheduled Seminars on Dec. 19, 2023

Matthew Diller, PhD candidate
11 a.m.
Presentation Title
Ontology-based Semantic Data Integration for Exposome Research
Virtual, see link below

Contact with questions about this seminar.


The interplay between a person’s external environment and their internal physiological processes is believed to have a major impact on health and treatment outcomes. Unfortunately, the range of known and potential external environmental health factors is very broad and growing quickly, making it difficult to study the relationships between these factors and health outcomes. One challenge, caused by the sheer volume and heterogeneity of the environmental and clinical data needed to analyze these factors, is integrating those data to better support epidemiology research. For example, two data sets may use different names or codes for the same thing (e.g., ‘M’ or ‘Male’ as values for a gender field), or they may contain variables that have the same name but refer to completely different things (e.g., the variable ‘peak flow rate’ may have different meanings in the context of medicine than it would in the context of civil engineering). Fortunately, Semantic Web technologies, such as ontologies, are well suited to support the integration of such data, as they can provide precise, unambiguous, and structured semantics for the entities represented in the data that are both human- and machine-readable. However, OWL 2 has limited support for representing temporal information, a potential problem for a use case such as the external exposome, where the timing, order, and frequency of exposures matter. In addition, our preliminary analysis of existing biomedical ontologies suggests that they lack sufficient classes for representing environmental and other non-clinical data. The goal of this project, therefore, is to develop an ontology that represents external exposure factors and measures of them, the health outcomes they are related to, and the temporal relations that exist between these entities.
I then intend to use this ontology with a graph database to develop a structured representation in which the relationships between patients, their health outcomes, and the relevant environmental factors are made maximally explicit. Epidemiologists can then query the resulting database to better understand the relationships between the external environment and patient health outcomes.
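
As a rough illustration of the two ideas in this abstract, the sketch below harmonizes differently coded field values to one shared concept and then asks a temporal question of a small graph of dated relationships. It is not the speaker's ontology or implementation: the `ex:` identifiers, the function names, and the patient records are all invented toy values, and a plain Python list stands in for a real graph database.

```python
from datetime import date

# Hypothetical mapping from source-specific codes to one shared concept ID.
# The "ex:" identifiers are made up for this sketch, not real ontology terms.
SEX_CONCEPTS = {
    "m": "ex:Male",
    "male": "ex:Male",
    "f": "ex:Female",
    "female": "ex:Female",
}


def harmonize_sex(value):
    """Return the shared concept ID for a raw field value, or None if unmapped."""
    return SEX_CONCEPTS.get(value.strip().lower())


# Toy graph edges: (subject, predicate, object, date of the event).
# All records are invented for illustration.
edges = [
    ("patient:1", "ex:exposedTo", "ex:WildfireSmoke", date(2020, 8, 1)),
    ("patient:1", "ex:hasOutcome", "ex:AsthmaExacerbation", date(2020, 8, 15)),
    ("patient:2", "ex:hasOutcome", "ex:AsthmaExacerbation", date(2020, 7, 1)),
    ("patient:2", "ex:exposedTo", "ex:WildfireSmoke", date(2020, 8, 1)),
]


def exposures_before_outcomes(edges):
    """Return (patient, exposure, outcome) triples where the exposure is dated
    strictly before the outcome for the same patient."""
    exposures = [e for e in edges if e[1] == "ex:exposedTo"]
    outcomes = [e for e in edges if e[1] == "ex:hasOutcome"]
    return [
        (x[0], x[2], o[2])
        for x in exposures
        for o in outcomes
        if x[0] == o[0] and x[3] < o[3]
    ]


if __name__ == "__main__":
    print(harmonize_sex("M"), harmonize_sex("Male"))
    print(exposures_before_outcomes(edges))
```

In this toy data only the first patient's exposure precedes the outcome, so only that patient is returned; the second patient's outcome predates the exposure, which is the kind of ordering distinction the abstract says matters for exposome research.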