Thursday, February 11, 2016

ReCAP: Feasibility and Accuracy of Extracting Cancer Stage Information From Narrative Electronic Health Record Data [CARE DELIVERY]


QUESTION ASKED:

Cancer stage, one of the most important prognostic factors for cancer-specific survival, is often documented in narrative form in electronic health records (EHRs), and as such is difficult to abstract by registrars and other secondary data users, including clinicians participating in quality reporting activities. Can the cancer stage be accurately extracted by natural language processing (NLP) of the text from EHRs?

SUMMARY ANSWER:

In a combined dataset of N = 2,323 patients with lung cancer (training set: n = 1,103; validation set: n = 1,220), we analyzed 751,880 documents and discovered at least one stage statement for 98.6% of patients (median of 24 documents with stage statements per patient). Despite a high degree of discordance in patient records (83.6% of patients had conflicting stage statements in their EHR; Fig 2), algorithmically derived stage accuracy was very high in the validation set, = 0.906 (95% CI, 0.873 to 0.939), as compared with the gold standard of tumor registrar–derived stage.

METHODS:

We developed an NLP algorithm to extract stage statements from machine-readable EHR documents, including automated rules to choose the most likely stage when discordance was present in the EHR; the algorithm was developed on a training set of patients with lung cancer and independently validated on a test set of patients with lung cancer who were seen at our institution.
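The two steps described above (finding stage statements in text, then choosing one stage when statements conflict) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the regular expression, the function names `extract_stages` and `choose_stage`, and the simple majority-vote rule are all assumptions made for illustration, since the summary does not spell out the actual extraction patterns or selection rules.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not from the study): matches phrases
# such as "stage IIIA", "Stage 4", or "stage ib".
STAGE_RE = re.compile(r"\bstage\s+(IV|III|II|I|[1-4])([AB])?\b", re.IGNORECASE)

# Map Arabic numerals onto Roman numerals so "stage 3a" and "stage IIIA" agree.
ARABIC_TO_ROMAN = {"1": "I", "2": "II", "3": "III", "4": "IV"}


def extract_stages(text):
    """Return the normalized stage statements found in one document."""
    stages = []
    for number, substage in STAGE_RE.findall(text):
        roman = ARABIC_TO_ROMAN.get(number, number.upper())
        stages.append(roman + substage.upper())
    return stages


def choose_stage(documents):
    """Pick the most frequently mentioned stage across a patient's documents.

    Majority vote is an illustrative stand-in for the study's automated
    rules. Returns None when no stage statement is found.
    """
    counts = Counter()
    for doc in documents:
        counts.update(extract_stages(doc))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    notes = [
        "Pt with stage IIIA NSCLC.",
        "Assessment: stage 3a adenocarcinoma.",
        "Prior note listed stage IV disease.",
    ]
    print(choose_stage(notes))
```

A production system would also need to handle negation, historical mentions, and inexact phrases such as "early stage", none of which this sketch attempts.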

BIAS, CONFOUNDING FACTOR(S), DRAWBACKS:

An exact stage (eg, stage I, stage IV) could be calculated for only 72% of the patients; the remainder were assigned an inexact stage (eg, "early stage"). In an exploratory analysis, we were able to distinguish stage IIIA from stage IIIB, but with lower accuracy, in the 64% to 79% range. The experiments were carried out only on patients with lung cancer, so it is unknown whether the approach would perform similarly for other tumor types. We did not explicitly consider the provenance of the information (eg, was the stage documented by a medical student, an attending oncologist, etc). Finally, given that this was performed at a single tertiary care institution, there may be significant differences in documentation patterns at other institutions that could affect the reproducibility of the results.

REAL-LIFE IMPLICATIONS:

This new approach to the determination of summary stage in patients with lung cancer can be applied rapidly and broadly to a patient population with large amounts of EHR data. Despite the presence of significant discordance in documentation, the results were highly accurate. This proof-of-concept suggests that NLP may augment and enhance manual abstraction efforts and may even replace them for certain targeted applications.

FIG 2.

Network diagram of stage co-occurrences found in individual patient records. Circles represent network nodes, which are proportionate to the number of times that a particular stage category is mentioned across all patients. Lines between nodes represent network edges, with width proportionate to the number of times that a particular co-occurrence is observed across all patients.
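The node and edge weights described in this caption could be tallied roughly as in the following Python sketch. The input layout (a dictionary from patient ID to a list of stage mentions), the function names, and the choice to count each stage pair once per patient are assumptions for illustration; the summary does not state how the figure was actually computed.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(mentions_by_patient):
    """Tally node and edge weights for a stage co-occurrence network.

    node_counts: number of times each stage is mentioned across all patients.
    edge_counts: number of patients whose records contain both stages of a
    pair (assumption: each pair is counted once per patient).
    """
    node_counts = Counter()
    edge_counts = Counter()
    for stages in mentions_by_patient.values():
        node_counts.update(stages)
        # Sorting gives each unordered pair a single canonical key.
        for pair in combinations(sorted(set(stages)), 2):
            edge_counts[pair] += 1
    return node_counts, edge_counts


def discordant_fraction(mentions_by_patient):
    """Fraction of patients with more than one distinct stage mentioned."""
    if not mentions_by_patient:
        return 0.0
    discordant = sum(
        1 for stages in mentions_by_patient.values() if len(set(stages)) > 1
    )
    return discordant / len(mentions_by_patient)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = {"p1": ["I", "I", "II"], "p2": ["IV"], "p3": ["II", "I"]}
    nodes, edges = build_cooccurrence(toy)
    print(nodes, edges, discordant_fraction(toy))
```

The resulting counters could be passed to any graph-drawing tool, with node size and line width scaled by the counts.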



from Cancer via ola Kala on Inoreader http://ift.tt/1PpWnNg