BMIR Colloquium: Craig E. Stanley Jr., PhD; Paul L. Snyder, PhD; William F. Dowling, PhD; Mevan S. Samarasinghe: “Towards an integrated healthcare knowledge graph: Transforming and connecting dynamic healthcare data”

May 3, 2018 @ 12:00 pm – 1:00 pm
MSOB, Conference Room X-275
1265 Welch Rd
Stanford, CA 94305
Marta Vitale
(650) 724-3979


Craig E. Stanley Jr., PhD;
Paul L. Snyder, PhD;
William F. Dowling, PhD;
Mevan S. Samarasinghe, VP Search & Discovery

As healthcare information grows in scale and scope, practice is evolving from traditional disease-centric medicine toward precision medicine. To address the challenges this shift creates, we have built a knowledge graph that supports advanced clinical decision-support approaches by extracting and connecting knowledge from across the medical literature and unifying heterogeneous sources of healthcare information. Our healthcare knowledge graph, H-Graph, is built on a comprehensive medical ontology comprising over 500,000 medical concepts and the relationships between them. The core of H-Graph is implemented on an RDF-based graph database platform and uses Linked Data principles to connect systems containing authoritative healthcare knowledge and data sets: diseases, drugs, anatomy, best practices, order sets, care plans, guidelines, clinical pathways, medical imaging data, and relevant literature such as journals and books.
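
To make the RDF and Linked Data idea concrete, here is a minimal Python sketch, not H-Graph's implementation. It stores (subject, predicate, object) triples whose identifiers are HTTP URIs, so a disease concept can point directly at a drug concept, a guideline, or a journal article held in another system. The `example.org` namespace, the concept URIs, and the predicates `treatedBy`, `describedIn`, and `hasGuideline` are hypothetical; only `skos:prefLabel` is a real W3C vocabulary term. A production system would instead use an RDF store queried with SPARQL (for example via a library such as rdflib).

```python
from collections import defaultdict

# Hypothetical namespace for illustration only; H-Graph's actual URIs are not given.
HG = "http://example.org/hgraph/"
# Real W3C SKOS term used for a concept's preferred human-readable label.
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


class TripleStore:
    """Toy in-memory store of RDF-style (subject, predicate, object) triples."""

    def __init__(self):
        # Index outgoing edges by subject for quick lookups.
        self._out = defaultdict(set)

    def add(self, s, p, o):
        self._out[s].add((p, o))

    def objects(self, s, p):
        """All objects reachable from subject s via predicate p, sorted."""
        return sorted(o for pred, o in self._out[s] if pred == p)

    def linked(self, s):
        """All (predicate, object) pairs leaving s, including links to other sources."""
        return sorted(self._out[s])


store = TripleStore()
disease = HG + "concept/disease-x"  # hypothetical concept URI
store.add(disease, SKOS_PREF_LABEL, "Disease X")
store.add(disease, HG + "treatedBy", HG + "concept/drug-y")  # hypothetical predicate
store.add(disease, HG + "describedIn", "http://example.org/journal/article-123")
store.add(disease, HG + "hasGuideline", "http://example.org/guidelines/g-1")

print(store.objects(disease, HG + "treatedBy"))
```

Because every identifier is a URI, a link to data held elsewhere (here, the article and guideline URIs) is just an ordinary edge, which is the essence of the Linked Data approach.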


To represent increasingly complex healthcare data and provide a foundation for clinical decision support, the H-Graph data model supports finer granularity and more expressive semantic relationships. It provides multi-language support (English, French, and Spanish), geographic specificity, and qualitative mappings to industry-standard vocabularies used in medical literature and electronic health records, including SNOMED, LOINC, ICD-10, RxNorm, and MeSH. Beyond ontological relationships, H-Graph can contextualize semantic relations by patient demographics and medical history. It can also represent both scalar reference ranges and qualitative observations for laboratory tests, categorized by patient characteristics.
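
As an illustration of laboratory reference ranges categorized by patient characteristics, the following minimal Python sketch assumes a flat table keyed by test, sex, and age band. The lookup prefers a sex-specific range over a generic one, and the scalar result is then mapped to a qualitative label. The test name `test-a`, the functions `find_range` and `interpret`, and all numbers are hypothetical placeholders, not clinical values; the abstract does not describe how H-Graph actually stores ranges.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReferenceRange:
    """A reference range for one lab test, scoped to a patient category."""
    test: str
    sex: Optional[str]  # None means "applies to any sex"
    min_age: int        # inclusive, in years
    max_age: int        # exclusive, in years
    low: float
    high: float
    unit: str


# Placeholder values for illustration only; these are NOT clinical reference ranges.
RANGES = [
    ReferenceRange("test-a", "F", 18, 200, 10.0, 20.0, "unit"),
    ReferenceRange("test-a", "M", 18, 200, 12.0, 24.0, "unit"),
    ReferenceRange("test-a", None, 0, 18, 8.0, 16.0, "unit"),
]


def find_range(test: str, sex: str, age: int) -> Optional[ReferenceRange]:
    """Return the matching range for this patient, preferring sex-specific ranges."""
    candidates = [
        r for r in RANGES
        if r.test == test and r.min_age <= age < r.max_age and r.sex in (sex, None)
    ]
    # False (sex-specific) sorts before True (generic, sex is None).
    candidates.sort(key=lambda r: r.sex is None)
    return candidates[0] if candidates else None


def interpret(test: str, sex: str, age: int, value: float) -> str:
    """Map a scalar result to a qualitative label: low / normal / high."""
    r = find_range(test, sex, age)
    if r is None:
        return "no applicable range"
    if value < r.low:
        return "low"
    if value > r.high:
        return "high"
    return "normal"


print(interpret("test-a", "F", 40, 22.0))
```

With these placeholder ranges, `interpret("test-a", "F", 40, 22.0)` returns `"high"`, while the same value for a 40-year-old male patient returns `"normal"`, because a different range applies to that patient category.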


We have also used natural language processing and convolutional neural network methods to extend H-Graph with a comprehensive set of symptom-to-disease relations extracted from the ScienceDirect and ClinicalKey publication platforms (including 7.7M full-text articles and book chapters in the medical domain). Combining natural language extraction with machine learning and automated validation pipelines yields much higher coverage of symptoms and of uncommon or even rare diseases than manually constructed knowledge bases can achieve.
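
The abstract does not detail the extraction pipeline, so the Python sketch below substitutes a much simpler technique: dictionary-based co-occurrence of symptom and disease mentions within a sentence, followed by a minimum-support threshold as a crude stand-in for automated validation. It is not the NLP and convolutional neural network approach described above. The lexicons, the naive sentence splitter, the example sentences, and the `min_support` parameter are all assumptions for illustration.

```python
import re
from collections import Counter
from itertools import product
from typing import List, Tuple

# Hypothetical toy lexicons; a real pipeline would use ontology concepts and trained models.
SYMPTOMS = {"fever", "cough", "rash"}
DISEASES = {"measles", "influenza"}


def sentences(text: str) -> List[str]:
    """Naive sentence splitter on '.', '!' and '?'."""
    return [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]


def candidate_pairs(text: str) -> Counter:
    """Count (symptom, disease) pairs that co-occur within a single sentence."""
    counts = Counter()
    for sent in sentences(text.lower()):
        tokens = set(re.findall(r"[a-z]+", sent))
        for pair in product(sorted(tokens & SYMPTOMS), sorted(tokens & DISEASES)):
            counts[pair] += 1
    return counts


def validated_relations(docs: List[str], min_support: int = 2) -> List[Tuple[str, str]]:
    """Keep only pairs seen in at least min_support sentences across the corpus."""
    total = Counter()
    for doc in docs:
        total.update(candidate_pairs(doc))
    return sorted(pair for pair, n in total.items() if n >= min_support)


docs = [
    "Measles typically presents with fever and a rash. Cough is also common.",
    "Patients with influenza often report fever and cough.",
    "A maculopapular rash is characteristic of measles.",
]
print(validated_relations(docs))
```

On these three example documents, only `('rash', 'measles')` co-occurs in two sentences, so it is the only pair that passes `min_support=2`; lowering the threshold to 1 admits every co-occurring pair, trading precision for coverage.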