At first, Craig Umscheid, MD, FACP, and colleagues thought they had found a way to improve sepsis care. The researchers had used electronic health record (EHR) data and machine learning—that is, having computers learn patterns from data rather than follow explicitly programmed rules—to predict which inpatients were going to develop severe sepsis or septic shock.
“This sepsis algorithm that we created . . . ended up being very good at predicting which patients would ultimately be labeled as having severe sepsis or septic shock by coding criteria,” said Dr. Umscheid, a hospitalist, associate professor of medicine and epidemiology, director of the Center for Evidence-based Practice, and vice chair of quality and safety in the department of medicine at the University of Pennsylvania in Philadelphia.
The natural next step was to put the tool into practice. So the Penn team set it up to alert a physician, nurse, and rapid response coordinator when a patient was identified as being at risk. Then the team watched for improvements in patient outcomes.
To their surprise, there were none. Comparing the six months before and after the tool was deployed, the researchers found no change in development of severe sepsis or septic shock, ICU stays, or mortality. “It's one thing to predict severe sepsis and it's another thing to be able to predict sepsis that otherwise would go unrecognized where there's an opportunity for improving care,” Dr. Umscheid observed about the negative findings.
As health care increasingly looks to computerized data and tools like machine learning to improve outcomes, the example of this sepsis alert illustrates the inherent challenges of the approach, he and other experts said.
“Hospitals are recognizing the utility of these data and are hiring more data scientists. . . . But we also need to proceed with caution. There are technical challenges that we still need to solve in order to make sure these models are safely deployed in a way that maximizes their utility,” said Jenna Wiens, PhD, assistant professor of computer science and engineering at the University of Michigan in Ann Arbor.
Promise and pitfalls
With hospitals gathering ever more data on their patients, it makes sense to turn to the latest technology to synthesize it. Computers can compare large numbers of patients' outcomes with their vital signs, test results, or any other information in an EHR and then develop systems for predicting outcomes in advance.
“There's a lot of information contained within an EHR. Recognizing patterns and changes within that data, you can lose the trend just from trying to incorporate the massive amount of data points. If we offload some of that into the system itself, then providers can focus on activating the right interventions at the right time and have computers do the things that they do well,” said Rebecca Jaffe, MD, ACP Member, a clinical assistant professor and hospitalist at Thomas Jefferson University in Philadelphia.
Dr. Jaffe is also working on a system to identify patients at risk for sepsis or clinical deterioration at her hospital. Inpatient conditions, particularly sepsis, have been a big focus of the teams using machine learning for health care improvement, noted Douglas S. McNair, MD, PhD, president of Cerner Math and senior vice president at Cerner Corporation in Kansas City, Mo.
“Most patients have multiple comorbid conditions or diagnoses. That sort of complexity is in general greater than traditional support methods and tools have been able to cope with. Machine learning methods are able to accommodate . . . variables being missing or not yet having values and are able nonetheless to produce accurate models or identify reproducible patterns,” he said. Cerner has developed a sepsis surveillance agent that reduced mortality at multiple community hospitals, Dr. McNair noted.
Developing accurate prediction models, with high sensitivity and specificity, has been a focus of machine learning efforts from the start, but researchers are beginning to realize that's not enough, said Dr. Wiens, who recently co-authored a review on machine learning in health care epidemiology, published online by Clinical Infectious Diseases in August 2017.
Another important issue is timing, she explained. “A really accurate model that only provides a few minutes' warning might not provide enough time to intervene and change the outcome of a patient.”
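Measuring that warning window is straightforward once alert and onset times are logged. The sketch below, on hypothetical timestamps (not data from any real system), computes each alert's lead time and how many gave clinicians at least an hour to act:

```python
from datetime import datetime
from statistics import median

# Hypothetical (alert time, onset time) pairs for patients where the
# model fired before a confirmed event.
events = [
    ("pt1", datetime(2024, 3, 1, 8, 0),  datetime(2024, 3, 1, 8, 20)),
    ("pt2", datetime(2024, 3, 2, 14, 0), datetime(2024, 3, 3, 2, 0)),
    ("pt3", datetime(2024, 3, 4, 6, 0),  datetime(2024, 3, 4, 6, 5)),
]

# Lead time = onset minus alert, in minutes. An accurate model that
# fires only minutes ahead may leave no room to intervene.
lead_minutes = [
    (onset - alert).total_seconds() / 60 for _, alert, onset in events
]

print(f"median warning: {median(lead_minutes):.0f} minutes")
actionable = sum(m >= 60 for m in lead_minutes)  # assume 1 hour needed to act
print(f"alerts with at least an hour to act: {actionable}/{len(events)}")
```

Reporting lead time alongside sensitivity and specificity makes the trade-off Dr. Wiens describes visible before a tool is deployed.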
Alerts can also err in the other direction, noted Dr. Jaffe, describing a predictor for cardiac arrest that turned out to be very accurate but of uncertain utility because it came so far in advance. “You could walk in to see a patient with a really high risk score, but if he or she is sitting up eating dinner and looking well, responding providers will just ask ‘What are we supposed to do for this person? There's no obvious problem right now for us to fix.’”
This issue could have contributed to the limited effectiveness of the Penn severe sepsis alert, according to Dr. Umscheid. “When we made the prediction, because it was on average a day and a half before the patient experienced the condition, providers didn't know what to do with the prediction. They weren't going to give them antibiotics because they weren't infected at the time. They weren't going to give them fluids because they weren't hypotensive,” he said.
In addition to when, there's the question of whom to alert. Dr. Wiens and her coauthor, Erica S. Shenoy, MD, PhD, are working on a machine learning tool to predict Clostridium difficile in inpatients.
“Who would be the right people to review and act on this high risk score for patients? It may or may not be the frontline providers—that depends on the planned response. For example, if one of the interventions is early isolation, you may want to have infection control staff monitor and act on a daily list of high-risk patients,” said Dr. Shenoy, who is associate chief of the infection control unit at Massachusetts General Hospital and an assistant professor at Harvard Medical School, both in Boston. “If the intervention focuses on a clinical intervention, such as early diagnosis and treatment or limiting use of unnecessary antibiotics as an antimicrobial stewardship strategy, then the clinicians caring for high-risk patients and the hospital's antimicrobial stewardship team would be the appropriate individuals to act.”
A key to the success of these alert systems will be providing previously unknown information to clinicians who have a clear action to take in response, the experts agreed.
Accomplishing the first part means teaching alerts not to state the obvious. For example, instead of identifying every patient who is going to develop severe sepsis or septic shock, an alert should identify only those patients whose developing severe sepsis or septic shock has gone unrecognized by the clinical team, suggested Dr. Umscheid.
“You could imagine using the data available in the EHR in real time and before sending an alert, looking to see if the patient is on a broad-spectrum antibiotic or recently had a fluid bolus or recently had diagnostics, like a lactate or a blood culture drawn,” he said.
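The suppression logic Dr. Umscheid describes reduces to a simple check. In the sketch below, the signal names and the order-checking interface are hypothetical (not Penn's actual schema): the alert fires only if the chart shows no sign that the team already suspects sepsis.

```python
# Orders that suggest the team has already recognized possible sepsis.
# These names are illustrative placeholders, not real EHR fields.
RECOGNITION_SIGNALS = {
    "broad_spectrum_antibiotic",
    "fluid_bolus",
    "lactate_drawn",
    "blood_culture_drawn",
}

def should_alert(recent_orders: set) -> bool:
    """Fire only when none of the recognition signals are present."""
    return not (recent_orders & RECOGNITION_SIGNALS)

# Cultures and antibiotics already ordered: the prediction is old news.
print(should_alert({"blood_culture_drawn", "broad_spectrum_antibiotic"}))  # False

# No sign of a sepsis workup: the prediction may be new information.
print(should_alert({"acetaminophen"}))  # True
```

A real implementation would also need to bound how recent "recent" is, but the principle is the same: an alert earns attention by telling clinicians something they do not already know.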
The second part requires developing protocols on how to respond to the alerts. “These scores do nothing if they're not built into powerful clinical workflows,” said Dr. Jaffe.
“If you don't know what it is you would do to change an outcome, and in fact you don't do anything differently, then amazingly, you won't change any outcomes,” agreed Dr. McNair. Such work is underway on a model developed by Cerner to predict acute kidney injury (AKI) 18 hours before changes in creatinine or urine output.
Hospitals are currently testing the model in prospective studies to determine whether more intensive monitoring can improve the outcomes of patients the alert identifies as at high risk for AKI. “It also might enable automatic order sets related to more intense diuresis or altering fluid management,” Dr. McNair said.
Role of hospitalists
These challenges highlight the need for clinicians, especially hospitalists, to be involved in these machine learning projects, the experts said. Initially, administrators might have believed that “you could ask a team of data scientists to go off and build a solution,” said Dr. Umscheid. “But that will likely not prove successful. You need frontline providers to help inform what outcomes you're predicting, how those outcomes are defined, and the timing of the intervention, among other parameters.”
Having a hospitalist in the room can save a project from going off in an entirely useless direction. “One can be very careful in terms of setting up a problem and looking at data that occurs just before a clinical diagnosis, inspecting the causal ordering of events, but one can still trip up,” said Dr. Wiens, citing her team's efforts to find predictors of a C. difficile diagnosis. “If we include empirical treatment, then we can achieve very good predictive performance. But the most important factors will be things like oral vancomycin, in which case the model is of limited utility, since it means the doctors already suspect C. diff.”
A hospitalist can also help researchers pick out variables that will be more useful in practice because they are modifiable. “If a model predicts that a patient is high risk because of their old age, then we may be unable to reduce the patient's risk,” said Dr. Wiens.
A physician doesn't need a lot of computer expertise to be helpful in such projects, Dr. Jaffe noted. She was chosen to lead the implementation of a predictive model at her institution, in partnership with data scientists and analysts, because she had been advocating for its development, not because of formal training in informatics.
Machine learning projects fit well with many hospitalists' interests and efforts, added Dr. Umscheid. “Just as hospitalists have become leaders in quality and safety and clinical informatics on the inpatient side, I would very much see many hospitalists having key roles in leading teams to use these types of data and these types of tools in wise ways,” he said.
Such opportunities to get involved are likely to multiply, as several of the experts predicted that prediction tools developed by machine learning will need to be facility-specific.
Dr. Jaffe offered the example of her and Dr. Umscheid's sepsis alerts. “One of the items on [Penn's] score is that the patient is on a pulmonary service,” she said. “Because service structure differs from place to place, that piece of data might be useless at other institutions.” Other obstacles to extrapolating one hospital's system to another would be different metrics and standards of documenting, such as whether mental status is measured on the Glasgow Coma Scale or the AVPU (alert, voice, pain, unresponsive) scale.
Hospitalists who don't want to get directly involved in machine learning projects can contribute by supporting the technology's use, according to Dr. McNair. “In the face of many in the lay media who try to sensationalize the possibility of misuses of information, clinicians have a special role in extolling the merits of a learning health system,” he said. “If you have a situation where institutions or individual people are unwilling to share data, then you will not have the raw material on which machine learning methods could work.”
Some of the wariness surrounding the use of machine learning to make predictions may stem from its “black box” nature, said Dr. Umscheid, explaining that all the variables go into the algorithm and an alert comes out the other end, often without a clear explanation of how it was developed.
“People don't know how to critically assess that approach or that analysis, and because of that, they feel less comfortable with using the prediction, which is understandable,” he said. “Many physicians aren't going to do something based on a prediction if they don't understand where the prediction is coming from.”
They may have to get used to it, as Dr. McNair predicted that in the near future, computers will take over even more of the process of improving health care through data crunching.
“Use of unsupervised learning to detect innovative or novel patterns . . . is one of the fundamental features that we would predict would happen increasingly commonly,” said Dr. McNair. “Institutions or private companies like us would then commission prospective studies to follow up in the usual ways on those detected signals.”
Dr. Jaffe was a little more guarded in her predictions about the future of machine learning in medicine. “Clinical judgment works in a lot of different ways, and this is just one capability that we now have that we didn't have before,” she said, before adding, “I'll probably eat my words in 50 years when Watson is running the hospital.”