1265 Welch Road
Stanford
CA 94305
Imon Banerjee, PhD,
Instructor,
Radiology and Biomedical Data Science,
Stanford University
Abstract:
The population-based assessment of patient-centered outcomes (PCOs) has been limited by the difficulty of collecting these data efficiently and accurately. Natural language processing (NLP) pipelines can determine whether a clinical note within an electronic medical record contains evidence of these outcomes. However, formulating a fully supervised NLP task is restricted by moderately low agreement among human raters when collecting ground-truth labels. The main discrepancies occur when sentences contain contradictory information or unclear statements. In this talk, I will present a weakly supervised NLP approach that annotates electronic medical record clinical notes by leveraging domain-specific vocabulary and distributional semantics, without requiring manual chart review. The weakly supervised NLP pipeline showed promising sensitivity and specificity for identifying important PCOs in unstructured clinical text notes compared to rule-based algorithms. I will also highlight challenges associated with generalizing such a weakly supervised approach to an external dataset.