CHEST 2023 On Demand Pass
Big Data in Sepsis: From Epidemiology to Diagnosis
Video Transcription
Okay, good morning everyone, thank you for the invitation to present, and this looks to be a great session because I love all the colleagues who are participating here. So I'm going to speak a little bit about part of what's exciting about data models, which is predictive models, getting a little bit into the AI aspect, and really talk about, with all of the hope and hype, what are some of the promises and pitfalls that we're still facing today, so that we understand some of the key issues in developing, implementing, and evaluating these models in critical care. Today, what we don't have is a shortage of predictive models. This is work that was done by a medical student, an MD-PhD student at the University of Alabama who I worked with, Michael Patton, and he went through the literature and evaluated predictive models across some of the most common domains of ICU prediction: mortality, sepsis, shock, AKI, et cetera. The lines are the rise in the uptake of EHRs in the hospital and the outpatient setting, and the bars represent the count and then the proportion of these models. So you can see that in 2022 there were over 2,500 PubMed publications about prediction of mortality. And I'm sure Tim or Haley will speak to this later, about the onslaught of this type of work that's happening, not only in individual health systems or medical schools, but in the journals as well. What he did find was that what we lack is a clear improvement over time. So this is machine learning-based in-hospital mortality prediction performance. On the x-axis is time, going all the way back to 1990, and on the y-axis is one dimension of performance measurement, the commonly used C-statistic or AUROC. And what you don't see is a clear trend, despite there being more and more of these models here towards the right. It's not clear that the models produced in the prior decades were that much worse than what's being produced today. So despite, again, the preponderance of new work in this space, there is a bit of an up-and-to-the-right trend, but it's not universal, because there are many models that show relatively low performance. The other thing we lack is proof of effectiveness in care. This figure is specific to sepsis. You can see that as we moved into this most recent decade, there's been a greater preponderance of these models, more using things like deep learning and more advanced techniques. But again, it's a scattershot in terms of efficacy. And that's because, for all of the hope of machine learning and AI, there are some very challenging and fundamental limitations in the way that we're thinking about these tools. So I'll give you some examples that I think are problematic. One is heterogeneity. This was a meta-analysis that looked at sepsis prediction models, and I'll focus a little bit on some of the more recent sepsis prediction models. And we all know this: there was substantial heterogeneity in the definition. Is it Sepsis-3? Is it the CDC Adult Sepsis Event? Is it the CMS SEP-1 definition? Is it SIRS, or some modified SIRS? We can't all agree on what we're calling sepsis, and I think that still remains a universal challenge. The second was the location in which the sepsis was being predicted: the ED, the ICU, and the ward. Many studies lump all of those together.
But those are completely different clinical recognition opportunities and treatment opportunities. And at least from my perspective, the prediction of sepsis in the ED should look nothing like the prediction of sepsis for a patient on the ward. They're two completely different animals. How can we lump them together and call performance similar across them? And then, of course, there was just lots of heterogeneity in very simplistic things like data preprocessing. For example, how do we take the granularity of temporal predictors, and how do I organize them? And just really simplistically, do I take the last value carried forward, or some summary statistic over the past 24 hours, or the worst value? There's a lot of subjectivity in that decision making, and we don't necessarily have a standard, but this creates a lot of challenges, especially when we're analyzing new models. The other challenge is that AI and ML alone can't solve utility or generalizability. So this was really cool and important work from Emory, in which they stood up the PhysioNet computing challenge in sepsis. What they did was make available a dataset in a Kaggle-like competition across three hospitals. These were the ones in which the model was trained, and then this one was the hidden dataset for external validation. And again, the question was, if we develop a model in these two rich datasets, what's our external generalizability? This shows you the missingness of all these different variables, and so this is not a surprise. Or not the missingness, but the prevalence. So the vital signs are very common, but you get a fair degree of missingness across the other variables. Now what they did do, which is unique compared to other studies, is they didn't rely on this kind of incomplete AUC, right? That's the most common performance metric we see, the C-statistic. But it really is insufficient, especially when it comes to the value of one of these tools when put into clinical workflow. So what they did was create a utility scale. I'm not going to dive deep into this, but I would encourage you, if you're in this space of considering predictive models, to read this. What they said is that for the patients who ultimately have sepsis, if you're a true positive based on this model and you diagnose this patient earlier, this is the time of sepsis onset, you get bonus points, right? If your model performs well in predicting it beforehand, you should be upscored. For patients who are false negatives, once you pass that optimal time window, and I think it was three hours before or six hours before, you start getting penalized, okay? Because that's exactly how clinical work should happen. If we get to a patient early, we should get a better outcome. And if we get to a patient late, whatever tool we're using should get penalized. And then this is for the patients who ultimately didn't have sepsis, and they just kept it flat. One could argue that for these patients, especially the false positives, you should actually penalize the score, right? Because it takes a lot of work to go evaluate a patient. It causes a lot of consternation, a flurry of activity, which is both problematic for providers and potentially problematic for patients who might be getting inappropriate antibiotics or other types of workup. But this was really, really unique and informative for this exact reason.
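To make the shape of that utility function concrete, here is a minimal sketch in Python. The window lengths, weights, and function names are illustrative assumptions based on the description above, not the official PhysioNet challenge scoring code.

```python
# Minimal sketch of a time-dependent utility score in the spirit of the
# challenge described above. Window lengths and weights are illustrative
# assumptions, not the official PhysioNet scoring implementation.

def prediction_utility(alerted, develops_sepsis, hours_before_onset,
                       early_window=6.0, late_grace=3.0):
    """Utility earned for one patient's prediction.

    alerted            : bool, the model fired an alert
    develops_sepsis    : bool, the patient ultimately develops sepsis
    hours_before_onset : hours between the alert (or evaluation time) and
                         sepsis onset; negative means after onset. Ignored
                         for patients who never develop sepsis.
    """
    if develops_sepsis:
        if alerted:
            # True positive: more credit the closer the alert is to onset,
            # as long as it lands inside the early-detection window.
            if 0.0 <= hours_before_onset <= early_window:
                return 1.0 - hours_before_onset / early_window
            return 0.0
        # False negative: penalized once the optimal window has passed.
        return -2.0 if hours_before_onset < -late_grace else 0.0
    # Patients who never develop sepsis are scored flat here, as in the
    # talk; one could argue false positives should carry a penalty for the
    # workup they trigger.
    return 0.0

# Example: an alert two hours before onset earns most of the available credit.
print(prediction_utility(True, True, 2.0))    # ~0.67
print(prediction_utility(False, True, -4.0))  # -2.0, missed beyond the grace window
```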
So here they compared all of the competitors' models, each one is a dot, on the AUC, that common C-statistic, and then on the utility score, okay? And what they showed was that if we look at a common threshold, let's say we'd like to see an AUC above 0.75, and a clinical utility score, which I've arbitrarily put at 0.25, we'd like to be up in this right upper quadrant: good AUC, good utility. But what you see is that there are actually relatively few models there. What's also informative is that many models performed well in terms of AUC but actually had poor clinical utility. And this is the challenge: even though we can make these models appear, in silico, like they're performing really well, when it comes to on-the-ground clinical value, we're actually finding that they are, in some cases, hampering our care. And then this was the other important finding. They ranked the performance of every competitor on test sets A and B, and you can see that, in general, the performance of the top groups was kind of similar. This shows the rankings. But when they compared it to the hidden dataset, the external generalizability, now everybody moved across the entire spectrum. In fact, some of the lower-performing models in the development datasets actually did better on the hidden test set. And this is another fundamental problem we're struggling with: how much do the data representations of each individual site really represent another site? And what is the role of a universal general model versus models that are customized and calibrated to the individual data and process characteristics of the setting in which they're deployed? So these are some of the highlights. The other question, which I've already alluded to, is that AI and ML alone can't solve issues related to value. Many of you may be familiar with this work from Michigan, from Karandeep Singh's group, where they evaluated the Epic sepsis model. They said it has poor discrimination and calibration in predicting the onset of sepsis. And this raised a lot of questions and concerns across the industry and, in no small part, also motivated some of the guidance about FDA regulation of software as a medical device, which has created a lot of challenges in this space. But what was interesting, even if the performance was poor at Michigan Medicine, at MetroHealth, I believe, they implemented this in a randomized quality improvement program framework. In their sample, the external validation was good. The alert fired before antibiotics were given 54% of the time, and the alert actually produced the intended effect: shorter time to antibiotics, 2.3 versus 3 hours, and more patient days alive and out of the hospital. This was a small study, but it gets to this kind of discordance, where we, in the literature and in our reports, focus kind of single-mindedly on these performance characteristics but ignore the question of whether that tool will be useful when implemented within a prospectively well-designed and standardized protocol. And that's where we're really struggling in this field. Data alone, AI and ML, are insufficient for the task. What they need are people like you, people like us, who are really knowledgeable about how to apply these things. And so, of course, no surprise that sepsis models put into practice show massive variability. This study suggested that a simplistic alerting system decreased mortality by 53%.
That is an absurdly large decrease in mortality for a silver-bullet-type medication or intervention. I don't mean the work is absurd. I just mean, when you think about it, a 50% reduction in mortality. Similarly, a machine learning prediction algorithm produced a 58% relative reduction in mortality in a randomized study. I mean, these are really huge effect estimates. And then otherwise, in other studies, no effect: no difference in any process measures or outcomes, and really no effect. So again, it's hard to bring together these very discordant results, and I think it again points to the fact that there's a disconnect between what's happening in silico and what's actually happening on the ground. Of course, Shamim Nemati's group developed a prediction algorithm that says, I don't know. And that's exactly how I feel. When it comes to the literature about these models, I can't tell, because we just don't have enough prospective studies. I'll just mention quickly that at Kaiser Permanente we've shifted to calling AI augmented intelligence, because that places people, clinicians, patients, and communities, rather than the algorithms, at the center. And I think it's really important for all of us, as leaders in this transition to AI, to be thinking that we need to be the human handlers of this technology. The technology is going to do amazing, fantastic, almost unbelievable things over the next five to 10 years. But where it will falter might be in very basic aspects of how it understands a clinical situation, and then in how we deploy it effectively, sustainably, et cetera. Some of the work we're doing: we're funding five health systems, actually, for external validation, to implement AI tools prospectively through rigorous trial designs. We've got 120 letters of intent for our three to five slots, and we're moving to selecting our finalists. I think another thing is training up our young folks, so these are not all clinicians or pulmonary critical care fellows, but our informatics fellows. And then we're working towards governance, and I bet that many of your institutions are doing the same: understanding the different domains we need expertise in, in order to use these tools effectively, whether that's performance, technology, operations, regulatory, model intelligibility, bias and fairness, outcomes and value, maintenance, and community input. So I'm extremely excited about the frontier, but we are in some ways faltering with some of the very key, bedside-related aspects of using these tools. We don't have a shortage of them today. What we have is a shortage of evidence that these things are going to dramatically improve the efficiency and quality of our care. So thank you very much. We're on the edge. Yeah. Thank you. Vinny, I hope you can be around later. It looks like we're pushing the envelope on time, but I'm going to come work for Vinny, because augmented intelligence is most assuredly what I'm in need of. Our next speaker you probably know as editor-in-chief of Critical Care Medicine and Critical Care Explorations. He's a professor of a multitude of things at Emory University, surgery, anesthesia, informatics, you name it. But the hat he's not exactly wearing today, but close, is that he also works for the Biomedical Advanced Research and Development Authority, where he has led a team of investigators with access to every single patient record in the Medicare database over what's approaching about a 10-year period now.
And Tim Buchman is going to tell us some of what he has gleaned about sepsis in that period of time. Thanks so much. And let me see how I get out of... Oh, yeah. This will be my first time too. Let's just escape there and then we should get you there. Okay. Perfect. Thanks very much, Steve. It's a privilege to be here. And let me echo something that Vinny said: it is frightfully important that when you send in manuscripts describing forecast models, you give us some information as to how the model is being implemented and, if possible, how it is affecting processes of care and patient outcomes. We get enough submissions to create a separate journal entitled the Journal of AKI Prediction from the MIMIC Database. Literally, we get three of these a week. And our future really depends not on building new models but on really understanding how they are most effectively implemented at the bedside. Steve assigned me the topic of big data and sepsis from epidemiology to diagnosis, with a focus on what we have learned about sepsis from the CMS studies. I've listed the four studies here that I'm going to be referring to intermittently through the talk. I do have to tell you that, as Steve said, I'm editor-in-chief of some things and senior medical advisor to BARDA. It's clear that all the opinions I express today are personal and may not represent the position of any of the listed entities. Now, in terms of objectives, I'd like to speak a little bit about Medicare, and also Medicaid and CHIP as a pair, and understand a little bit about the data collection and analysis infrastructure. I'm going to talk about the burdens, trajectories, and forecasts of sepsis, bringing us all the way up through the beginning of the pandemic. I'm going to address one of the questions: are the administrative data sufficiently reliable to support inference? As all of you in the field know, there's been controversy as to whether one can rely on administrative data and codes and so forth to pick out whether this is a sepsis patient or not. I'll talk about a couple of hidden gems in the published data, and I'm going to tease a little bit about what we are doing in the COVID era. Now, as a quick review, what is Medicare? It's health insurance. It's for people 65 or older, but it's also for some younger people with disability, for people with end-stage renal disease, and for people with ALS. The data I've shown here are reasonably current. There are about 66 million total Medicare enrollees. I want to call your attention to the fact that there has been a shift in the programs through which beneficiaries are funded. If you look at that lower right with the bars, you can see the decline in the salmon color, those are the fee-for-service patients, and the growth in the purple color, those are the Medicare Advantage patients. From the beginning of our study in 2012 to where I'm going to present today, through February of 2020, there's been a big shift, and we'll see some of that in the data. I'd also like to mention that Medicare as a program is distinct from Medicaid, as well as the Children's Health Insurance Program. As of July 2021, there were 83.6 million folks who were getting Medicaid or CHIP. There exist dual beneficiaries eligible for both Medicare and Medicaid, and such dual beneficiaries constitute a quarter of the Medicare sepsis data. In fact, that number has grown. We are now at 93.8 million as of May 2023, so a lot of beneficiaries.
Let me give you first a little contrast on sepsis inpatient mortality. I'm going to look at the fee-for-service patients only, the traditional patients, just to give you an idea of the difference between the dual beneficiaries and those who are traditional Medicare alone without the dual classification. You'll note at the top, in rust, you have the patients who are only Medicare. In blue, you have the patients who are dual beneficiaries, and you'll see that the mortality of the dual beneficiaries is lower. That really reflects the age distribution of that dual population. Down at the bottom, I've shown you some dotted lines. Those are patients who were admitted and did not have sepsis, so that's the distinction: they're either sepsis or not sepsis, and both groups are shown. Incidentally, this is the CMS definition, the one used for payers. It's not quite Sepsis-2, but it's close. Now, when it comes to getting hold of data, if you're an academician, you know about ResDAC and how hard it is to actually plumb the depths and get access to line-level data. That's not the view I get to use from inside BARDA. We work with a group called Acumen that runs a project called DataLink. We see 100% of the Medicare data. We actually see some of it in quasi-real time as charges come in. The data I'm going to speak about today are final data, nothing beyond; we do not present data until it is one year out from the claim period. So what you're seeing is 100% of the data as it resides within the CMS data set. Where have all the data gone? When we published these articles in March 2020, we posted an interactive visualization. I've given the URL at the bottom here, but it also appears in the March 2020 issue of Critical Care Medicine. These articles are open access, and you will see a link to the visualization. It's a very useful way for you, your colleagues, your fellows, and your students to actually explore a very large data set, albeit at a very high-level view. As Steve mentioned, it's a team effort. Steve is the second person in that picture because he and I work very closely together. I'll take the blame for anything that's unclear and he can have all the credit. So what have we learned about sepsis among Medicare beneficiaries? I'm going to go through these three points first. First, it's a common and increasing fraction of inpatient admissions. This is the figure that usually gives people pause and makes them say the coding is nonsense. What you can see rising from lower left to upper right is the fraction of inpatients with a sepsis tag. So if you had 100 patients in your inpatient unit, starting from left to right, you would see that originally it was about six out of 100 who had a sepsis tag. By the time we got to the beginning of the pandemic, it was 12 out of 100. You're saying, oh, my God, that's over-coding. We can't trust the data. I'll also mention that you see three lines from lower left to upper right. That gold line is the Medicare Advantage group, and they've always been a little bit healthier. What's happened is that the population has moved from fee-for-service to Medicare Advantage, generally sicker patients are now getting into Medicare Advantage, and you can see how those lines begin to come together towards the end of the study period. Those reciprocal lines near the top are just the reciprocal, the non-sepsis fraction, shown on the right axis.
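One way to see why the "fraction of inpatients" framing can mislead: a toy calculation with made-up rates. If sepsis admissions per enrolled beneficiary hold steady while non-sepsis admissions decline, the share of inpatients carrying a sepsis tag rises even though sepsis coding has not changed. The numbers below are purely illustrative, loosely in the spirit of the figures being described, and are not taken from the CMS analysis.

```python
# Toy illustration (made-up numbers): why "fraction of inpatients with a
# sepsis tag" can rise even when sepsis admissions per beneficiary are flat.
beneficiaries = 60_000_000

# Assume sepsis admissions hold steady at ~0.25% of beneficiaries per month,
# while non-sepsis (largely elective) admissions decline over time.
sepsis_rate = 0.0025
nonsepsis_rates = {"earlier period": 0.025, "later period": 0.020}

for label, nonsepsis_rate in nonsepsis_rates.items():
    sepsis_adm = beneficiaries * sepsis_rate
    total_adm = sepsis_adm + beneficiaries * nonsepsis_rate
    print(f"{label}: {sepsis_adm / total_adm:.1%} of inpatients carry a sepsis tag")
# earlier period: 9.1% of inpatients carry a sepsis tag
# later period: 11.1% of inpatients carry a sepsis tag
```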
So let's forthrightly comment on the distrust in administrative sepsis data. You have to choose the right denominator. What I'm showing you here, over the course from January of 2012 through the beginning of the pandemic, February 2020, are, I think, very important observations. In red is the fraction of total Medicare beneficiaries who were admitted with a non-sepsis diagnosis; they do not have a sepsis tag in their coding. And you can see the steady decline over time from about 2.5% to about 2%. In blue, I have shown the ICD-9 to ICD-10 cutover; you can see where the cutover occurs. And you'll see that since the cutover, the fraction of Medicare beneficiaries admitted each month with sepsis is stable at about 0.25%. We'll talk about this more at the end, but when the coding system changed from ICD-9 to ICD-10, there was something of an inflection point and things began to level out. Pretty much one out of every 400 Medicare beneficiaries gets admitted each month with sepsis. I'll also mention, and this was reported in 2020, that if we read from the bottom up, that's the septic shock fraction and the severe sepsis fraction; those are not really subject to over-coding because the criteria are so tight, and that fraction has remained relatively flat. So what I'm telling you is that, certainly since the cutover in October of 2015, I have much greater confidence in the sepsis codes than many other individuals do. Some things that we learned: sepsis is associated with terrifying mortality. At the top you see mortality during or within one week of hospital discharge. We used that classification to capture the hospice folks. There was marked improvement in the severe sepsis patients: originally about 30% did not survive their hospitalization, and that was down closer to 15% at the end. Nevertheless, within six months of hospitalization, you can see that of those who were admitted with septic shock, close to two-thirds have died. Let's look a little closer at those six months and ask what the trajectories of Medicare beneficiaries with and without sepsis are. On the left, I've shown the trajectory of the sepsis patients, and in the red box, I'm showing you those who are either deceased or in hospice, close to one out of three. Compare that to patients who were admitted for a non-sepsis admission, and you can see that the mortality is somewhere south of 15%. So it illustrates the very different trajectories of these two groups of patients. Next, sepsis care is pricier than even you have probably imagined. That's the monthly cost; the scale there is in billions of dollars. And this is just the cost of the IP admission itself. It does not include professional fees, SNFs, or any of the other accoutrements that go with it. I want to talk a little bit about the inflection point that we began to observe. What we've shown there is the fraction of patients admitted with sepsis present on admission at the top, and not present on admission at the bottom. And again, as you can see, there was a change in coding practices that somehow got things cleaned up. I can't tell you why it happened, because it's a national phenomenon, but there was clearly an inflection. And we were able to use that inflection to project a 12-month moving average of sepsis admissions. The inflection is the cutover from ICD-9 to ICD-10, but the ongoing rise is simply the increase in the count of total Medicare beneficiaries. How good were our estimates?
You can see the actual data versus what we were able to construct as a model, and it's a pretty good model. Now, obviously, we don't have access to the actual cost of the Medicare Advantage patients. This is the fee-for-service patients only, and all the values were adjusted to the CPI to make sure we got inflation out of the picture. Simply the number of patients coming in, remarkable. This led to a 2019 projection of the total cost of sepsis to Medicare and the nation. Now, this is a projection that includes both Medicare and non-Medicare patients; it does not include the VA or DOD. We came up with a number of $57.5 billion. Remember, in 2016, based on the 2013 data, Torio and Moore estimated that number to be $23.7 billion. Now, what's in that $57.5 billion number? It's a rough order of magnitude estimate of the inpatient hospital costs. It assumed that the cost per stay for all sepsis IP admissions was about the Medicare cost, and this new rough order of magnitude estimate was actually 8.4 percent higher than the estimate we had made in the previous year of about $53 billion. Again, the estimate continued to omit professional fees, as well as all the costs incurred in VA hospitals and military hospitals. And then came COVID. I can't say much. While the analysis is complete, the data are still being prepared for publication, but I want to share one piece of information with you about data reliability. I've taken a frame that you saw earlier and extended it out through COVID, to talk about what happened to the total inpatient admissions per enrolled beneficiary and the sepsis admissions. What you'll see there is that with the onset of COVID, there was a marked drop, as a step function, in the non-sepsis admissions. All the elective stuff stopped and never really recovered. If you look, however, at the sepsis information through the COVID period, you'll see that it has remained relatively stable, which gives me even greater confidence that the coding is accurate and we can use these administrative data to make reasonable inferences. So in summary, sepsis remains common, lethal, and costly. Since the cutover, the fraction of Medicare beneficiaries with an inpatient sepsis stay has been relatively stable compared with the ICD-9 era, albeit with the usual seasonal variation that I hope we're all familiar with. Indexing the administrative data to the count of beneficiaries, not to the fraction of the inpatient population, the stability makes sense. The mortality rate steadily improved from 2012 through the pre-COVID era; the largest impact was seen in patients with sepsis with organ dysfunction, and the cost continued to rise. Part of the rise was due to the growth in the number of beneficiaries, but I'll also tell you from the data that there was an unexpected uptick in per-admission claims around 2017. Compared to the 2016 estimate, what we had originally bandied about as $23.7 billion was already up to $57.5 billion in 2019, and that number still excludes professional fees, the VA, and the DHA. Thanks so much. Please evaluate this session in the afternoon. Cool. Our next speaker is going to, let's see, oh, I see we're slightly out of order up here. Our next speaker is Laura Evans, who's a professor at the University of Washington, Seattle, and director of critical care there. Also, as you are probably aware, co-chair of the most recent two Surviving Sepsis Campaign guidelines, for sure the most recent.
And so I think you're all aware that people like Chris Seymour, and actually people like Vinny Liu as well, have uncovered sepsis phenotypes, and Laura's going to talk to us, from the perspective of someone who combs the literature and helps us practically implement these things, about what that might mean in terms of practicing sepsis care. Thank you, Steve, for the introduction, and to Katie for co-moderating. It's really, really nice to see everybody, even on the last day of the conference here. I do want to note, in case those of you in the back haven't noticed, that Steve has refreshments up here in the front of the room. So this may be the only conference I've ever been to where the moderator has half a bottle of wine, I assume from last night's dinner. I might point out, I put it up here to point out, that this is a gift from my friend Bert Lesnick, who is a wine connoisseur, who handed it to me this morning, and I prefer to view it as a bottle half full rather than a bottle half empty. And I thought it was just for when I get a dry mouth up here, you can just hand it over this way. I want to start with just some disclosures. In the last couple of years, I have served as a member of a scientific advisory board for a company called Endpoint Health that does engage in work in sepsis precision diagnostics. Like Steve said, I was the co-chair of the Surviving Sepsis Guidelines, so I have very strong opinions about sepsis quality improvement. And I'm a co-investigator on an NIH trial of clinical implementation of sepsis bundles. I think it's kind of hard to start most talks about sepsis without recognizing the landscape of clinical trials over the past 40, 50 years in sepsis. And by and large, unfortunately, it's a landscape of a succession of negative trials, right? It almost gets to the point now where you see a large trial come out in JAMA or the New England Journal, and we kind of roll our eyes and go, it's another negative trial in critical care. And there are probably many explanations for that. But I think one common explanation that gets brought up is this concept that we are studying things that are defined by clinical syndromes and a constellation of clinical findings rather than by biology. So we're looking at sepsis as this constellation of inflammation, signs of infection, and organ dysfunction, but in fact we're studying a very heterogeneous population. And perhaps we would do better in terms of identifying potential specific management strategies, or even therapeutics, if we could accurately identify subgroups that may respond differently. So I want to go through a little bit of the alphabet soup here, because I think these terms get bandied about often as kind of the same thing, and they're probably functionally slightly different. This particular talk is going to focus mostly on work from Chris Seymour's group on sepsis phenotypes. And so when we say phenotype here, what we really mean is a clinically identifiable subset that you can identify by observable characteristics. It doesn't imply a difference in underlying mechanism. And hopefully these groups that you identify may have different outcomes or different responses to treatment. That's potentially different than an endotype, and really these things do get parlayed as very similar terms.
But if you're being precise in the language, often I'll find myself referring to them as subgroups or subtypes rather than trying to figure out whether I really intend that there's a mechanistic difference. An endotype, definitionally, would be a subset that has pathological or pathobiological or functional differences, so it really does imply a different mechanistic piece to the subgroup identification. And I want to come back to this point here at the bottom, because I think it's something that both Vinny and Tim mentioned earlier. In a sepsis session, we can't ever lose sight of the concept that we're dealing with a time-sensitive condition. And most of the work that's out there right now about endotypes and phenotypes, and linking them to patient outcome, is really in a defined population: once they're in the ICU, once they have organ failure, once they have multi-organ failure, once they're in shock. And what we really want to do moving forward, when we ask whether these definable subgroups, phenotypes, endotypes are ready for our intervention and whether they change our management, is to push that window of identification way, way earlier, into those golden hours of sepsis resuscitation and identification. So we'll come back to that. I think a lot of the potential use for phenotypes is in clinical trials within critical care, and since we're talking about sepsis, specifically in sepsis. You can break that down into two main purposes for how you might use these subgroups in clinical trials. One is this concept of predictive enrichment. Most of you have probably heard this term. When you design a trial with predictive enrichment, you're trying to preferentially enroll patients who have a greater likelihood of response to the therapy that you're testing in the trial. An example of that, if you go back to 2000, 2001, is the idea of using patients who don't respond to an ACTH stimulation test and enrolling them in a trial of corticosteroids for sepsis, because the theory at that point was that patients who didn't respond to ACTH would have a better response to corticosteroids. So you're basically enriching your trial for the population that you think is going to respond better. The other way you can do this for clinical trials is prognostic enrichment. We all talk about how expensive trials are, right? How many times do we go to journal club and say, well, that trial was underpowered? So prognostic enrichment is basically saying, I'm going to design a trial that has greater power at a smaller sample size, and you do that by enrolling patients who have a higher likelihood of the outcome of interest. So if I'm studying the effect of proning on mortality in ARDS, I probably want to preferentially enroll patients who are on the sicker end of the ARDS spectrum, and that may be manifest by using folks with a lower P/F ratio. So in terms of incorporating these subgroups into your trials right now, it's really about predictive enrichment and prognostic enrichment, and we'll come back to both of those.
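To put rough numbers on the prognostic enrichment idea, here is a sketch using the standard normal-approximation sample-size formula for comparing two proportions. The baseline mortality rates and the 20% relative risk reduction are hypothetical, chosen only to show how a higher event rate shrinks the required trial; this is not from any of the studies discussed.

```python
# Hypothetical numbers: how prognostic enrichment (a higher baseline event
# rate) shrinks the sample size needed to detect the same relative effect.
from math import sqrt, ceil

def n_per_arm(p_control, rel_risk_reduction, alpha_z=1.96, power_z=0.84):
    """Approximate per-arm sample size for a two-proportion comparison
    (normal approximation, two-sided alpha = 0.05, 80% power)."""
    p_treat = p_control * (1 - rel_risk_reduction)
    p_bar = (p_control + p_treat) / 2
    num = (alpha_z * sqrt(2 * p_bar * (1 - p_bar))
           + power_z * sqrt(p_control * (1 - p_control)
                            + p_treat * (1 - p_treat))) ** 2
    return ceil(num / (p_control - p_treat) ** 2)

# Same 20% relative mortality reduction, different baseline risk:
print(n_per_arm(0.15, 0.20))  # unselected cohort, ~15% mortality: ~2,000 per arm
print(n_per_arm(0.40, 0.20))  # enriched cohort (e.g. low P/F): ~560 per arm
```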
I want to recognize that the landscape around these identifications of phenotypes is changing quickly. We already heard from Tim about the volume of journal submissions, and Vinny mentioned the same for AI and natural language models. And this is a really nice scoping review that was published in Critical Care Explorations at the beginning of 2022. It combed through all the literature looking at phenotypes specifically for sepsis. I know you can't read this and I don't want to belabor it, but they pulled out about 17 studies trying to identify sepsis phenotypes. The traffic light there, red, yellow, green, refers to the risk of bias in these studies. And you can see that, at least by this scoping review, the evidence informing these phenotype studies remains highly variable: some were deemed to be at low risk of bias, some intermediate, and some at high risk of bias. And I think that's probably a fairly accurate picture of the state of the literature at this point with regard to sepsis phenotypes. Seven out of those 17 articles actually linked the phenotype to a prediction of treatment response or clinical outcome. And I think this goes back, and these talks, we didn't plan them ahead of time, they actually segue really, really well, because Vinny mentioned in his talk that models don't necessarily outperform one another, they just demonstrate another new way of looking at the same thing. And so only seven out of these 17 articles actually pulled out a link to treatment response. I'm going to move now to focus on probably not necessarily the biggest, but probably the highest-impact application in this group of seven. This is data that Chris Seymour published from the University of Pittsburgh group, and it's a really complicated study. For those of you who read it in JAMA when it came out, it takes a while; this is a complicated study. But it really has three main goals, and it uses six different clinical data sets to accomplish those three goals. The first goal was to derive sepsis phenotypes using commonly available clinical data, and I think we can all understand why that would be a great goal to have in a study, right? The next was to correlate those clinical phenotypes, assuming they could be derived, with host response biomarkers and clinical outcomes. And the third is a really cool exploration to assess the potential effects of the phenotype composition of a trial on that trial's outcome, so looking at this concept of heterogeneity of treatment effect: if I change the trial composition by phenotype, would that change the outcome of the trial? So those are the three main goals. There's a very nice schematic of how they do this and which clinical data set they use at each step; this is in the supplementary appendix of the study. They used 29 clinical variables, selected by the investigators based on their association with sepsis in previously published literature, and thought to be relatively readily available in the electronic health record. The specific data included in the phenotype model were the most abnormal values within six hours of hospital presentation.
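As a rough illustration of that kind of feature construction (not the study's actual code), here is a sketch that keeps, for each variable, the most abnormal value recorded in the first six hours after presentation. The variable names, reference ranges, and table layout are assumptions for illustration.

```python
# Sketch of the feature construction described above: for each clinical
# variable, keep the most abnormal value in the first 6 hours after
# presentation. Normal ranges and variable names are illustrative only.
import pandas as pd

NORMAL_RANGE = {"heart_rate": (60, 100), "sbp": (90, 140),
                "lactate": (0.5, 2.0), "creatinine": (0.6, 1.2)}

def most_abnormal(values, lo, hi):
    """Return the value with the largest excursion outside the [lo, hi]
    reference range (ties, including all-normal values, take the first)."""
    def deviation(v):
        return max(lo - v, v - hi, 0.0)
    return max(values, key=deviation)

def six_hour_features(obs: pd.DataFrame) -> dict:
    """obs: long-format table with columns
    [variable, hours_from_presentation, value] for one patient."""
    early = obs[obs["hours_from_presentation"] <= 6.0]
    feats = {}
    for var, (lo, hi) in NORMAL_RANGE.items():
        vals = early.loc[early["variable"] == var, "value"]
        feats[var] = most_abnormal(vals.tolist(), lo, hi) if len(vals) else None
    return feats
```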
So I wanna come back to that time window when we're talking about sepsis. This is really about that six-hour window. They used a methodology called consensus k-means clustering to determine the optimal number of phenotypes. That is well beyond my statistical methodology expertise, but it's a fairly robust statistical technique to try to say how many phenotypes there really are within the data. They then assessed the reproducibility of these findings using several other mechanisms and sensitivity analyses. These are really beautiful chord diagrams that they developed looking at the association of these phenotypes with different abnormalities. This is all four phenotypes that they identified together, alpha, beta, gamma, and delta, and you can see that the colors split with different relative frequencies of different types of organ failure or different clinical presentations in the different phenotypes. When you split them out, this is what that looks like. You can see that, for example, the delta phenotype, which is the light blue, has a lot more chords going to different types of organ failure; in this phenotype identification, delta was associated with the most organ failures. And when you look across the six different clinical populations that these phenotypes were studied in, you can see beautiful separation in short-term mortality within these populations. There really is a nice separation, in which phenotype is associated with differential short-term mortality. I'm not gonna show it to you today, but this should trigger the question in the audience: is this just one more way that we've succeeded in demonstrating that sicker people are sicker, and that sicker people have a higher risk of short-term death? When you look at all the different biomarkers, these phenotypes actually probably do correspond not just to a marker of severity of illness, but to some potential underlying patient differences. And then I wanna go through this concept of using simulation, Monte Carlo statistical simulations, to assess what happens to these clinical trials if you change the proportion of different phenotypes within them. They did this in three trials: the ACCESS trial, which was eritoran; PROWESS, which you all will remember was activated protein C; and the ProCESS trial, which was one of the early goal-directed therapy replication trials. This slide here is a bit complicated, but what you're looking at, all the way in the left-hand column, is the baseline data, the actual phenotype composition of the trial. And then what you have in the middle panel, let me see if this works here. The middle panel is when you change the composition of the trial; these are the phenotype distributions. You change the contributions and say, I'm gonna increase the proportion of the alpha phenotype in the trial, so the proportions of the other phenotypes correspondingly decrease. And then you look, across all these different simulations, 10,000 different simulations of the trial, at the likelihood of the trial showing harm, the red bars, versus benefit.
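A bare-bones sketch of that resampling idea follows: redraw the trial population with a shifted phenotype mix, simulate outcomes, and tally how often the result looks beneficial or harmful. Every number here (per-phenotype mortality, treatment effects, thresholds, mixes) is invented for illustration and is not taken from the published analysis.

```python
# Bare-bones sketch of the Monte Carlo idea: resample the trial with a
# shifted phenotype mix and tally how often the simulated result looks
# beneficial or harmful. All effect sizes and thresholds are made up.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-phenotype control-arm mortality and treatment effect
# (absolute risk difference; negative = benefit, positive = harm).
mortality = {"alpha": 0.05, "beta": 0.15, "gamma": 0.25, "delta": 0.40}
effect    = {"alpha": -0.02, "beta": -0.01, "gamma": 0.00, "delta": +0.05}

def simulate_trial(mix, n_per_arm=1000, n_sims=10_000):
    """mix: dict of phenotype -> proportion enrolled. Returns the fraction of
    simulations showing nominal benefit and nominal harm (control minus treatment)."""
    phenos = list(mix)
    probs = [mix[ph] for ph in phenos]
    mort = np.array([mortality[ph] for ph in phenos])
    eff = np.array([effect[ph] for ph in phenos])
    benefit = harm = 0
    for _ in range(n_sims):
        idx = rng.choice(len(phenos), size=n_per_arm, p=probs)
        p_ctrl = mort[idx]
        p_trt = np.clip(p_ctrl + eff[idx], 0.0, 1.0)
        deaths_ctrl = rng.binomial(1, p_ctrl).sum()
        deaths_trt = rng.binomial(1, p_trt).sum()
        diff = (deaths_ctrl - deaths_trt) / n_per_arm
        benefit += diff > 0.02   # arbitrary "looks beneficial" margin
        harm += diff < -0.02     # arbitrary "looks harmful" margin
    return benefit / n_sims, harm / n_sims

# Baseline mix versus an alpha-enriched mix:
print(simulate_trial({"alpha": 0.33, "beta": 0.27, "gamma": 0.27, "delta": 0.13}))
print(simulate_trial({"alpha": 0.60, "beta": 0.17, "gamma": 0.15, "delta": 0.08}))
```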
So in this particular trial, as you increase the proportion of alpha phenotype patients enrolled, all the way up to potentially close to 100%, the likelihood of the trial showing a harmful result diminishes, and the likelihood of it showing a beneficial result increases. So with changes to the phenotype composition of the trial population, the likelihood of a positive versus a negative trial result changes. That suggests that perhaps we really can use these types of things for predictive enrichment in our trials, to better study which populations these therapies may work for. Similarly, same concept here, but this time you're changing the proportion of the delta population, again that phenotype with lots of organ failure and lots of shock, and showing the likelihood of the same things: as you increase the delta phenotype, your likelihood of a harmful trial goes way up. And you can see the same thing here. That was with eritoran; this is done with PROWESS, the activated protein C trial. As you increase, and sorry I didn't transpose the phenotypes up here, but as you increase the alpha phenotype, the likelihood of a negative trial increases, and there is really no change as you increase the delta phenotype in the PROWESS trial. And this is with ProCESS, and I think we all think about this, particularly when we think about the controversy around fluid administration: who are the patients who are likely to benefit from fluid resuscitation, and how much? You see that as you increase the alpha phenotype in the data from the ProCESS trial, the likelihood of a positive trial goes way up compared to the natural state, and as you increase the proportion of those delta phenotype patients, the likelihood of showing harm in that trial exceeds 50%. So let me go back to the question briefly. The title of the talk was sepsis phenotypes: where did they come from, and are they practical? I will give you my opinion at this moment in time. I think there are more and more emerging data showing that we can identify important clinical phenotypes, and that they are associated with differences in outcomes and potentially different treatment responses. I don't think any of this is clinically actionable at this time for a clinician at the bedside. I am hopeful for the application in these two domains of clinical trial design, particularly in saying that perhaps we can use this to design smarter, more efficient trials, especially if we combine it with other newer trial design modalities, such as adaptive trials, and try to spend less of our time, money, and effort on things that are ultimately not gonna pan out, while pinpointing populations where they may pan out. And my personal opinion is, of course, that the earlier we can do this within a patient's course, the more likely it is to be beneficial. I think the work from the Pittsburgh group is really seminal in the fact that it does move that needle way up, but I personally feel six hours is probably still longer than our goal in terms of identifying actionable phenotypes that are gonna change the way we approach patients at the bedside. So I will stop there, and hopefully we didn't go too far over time. Thank you. Thank you. All right, our final talk this morning will come from Dr. Haley Gershengorn.
Haley is a professor at the University of Miami, and the editor-in-chief of CHEST's new spin-off journal, CHEST Critical Care, and she's here to provide us with an editor's perspective on what we've heard, and also a shameless plug to send your work to her journal. But don't take it away from Tim's journal, that would be bad, right? So how do I, oh, I'm sorry, no, what do I do? Push skip, yeah, there you go. Where am I? Okay, sorry. Thank you guys very much for the introduction and the invitation to come. Again, this is a bit of a daunting task, to speak from an editor's perspective with one of our senior critical care editors sitting in the front row, so Tim, feel free to speak up if I don't describe the editor perspective well in your mind. So I have some disclosures; they don't have anything related to what I'm gonna talk about. My goal right now is to give you an idea, at least from my end, of how I think big data, machine learning, and artificial or augmented intelligence are potentially useful and potentially problematic in three domains, research, academic writing, and manuscript reviewing, as it pertains to submitting things to a journal, some of which I think all three speakers, and particularly Tim in his introduction, have mentioned already, and I'm probably just gonna reiterate some of that. So just to get us all on the same page, because I think big data is a term that we throw around that doesn't always have a consistent definition: I'm not gonna suggest that this is the right definition, but this is where it started. This is a data scientist back in 2001 who was working in the commercial and financial space, noticed that there was more and more data coming through e-commerce, and basically said, we're gonna have to deal with these high-volume, high-velocity, high-variety data sets, and this is gonna be new and transformative. As some of you may know, this has been expanded over the last decade or so to include not just the three Vs that Doug Laney originally identified, but also value and veracity: of all the fast-moving data we're getting, does it add value, and is it actually true? And I think this is an interesting schematic that I saw about how to use big data in healthcare, and it highlights two of the main issues that I'm gonna try to talk about toward the end. So first, on the left-hand side, this is where data may come from. What's really interesting is that the one I always think about, and I think is the easiest to see, is the third one down, the electronic health record, and certainly for seasoned investigators, but also for the average clinician working in a health system that now has an EHR, this is the data that's most available to us. That data can get accumulated in a data warehouse, and then the meat of it is in the analytics, and both Vinny and Tim have spoken to this a little bit, and then hopefully from that we get improved outcomes. And so Matt Churpek and his team put together this review, and Matt, I think, along with Vinny, are probably two of our most seasoned machine learning and predictive modeling folks in our space, and they described the types of big data that we might see in the ICU.
And as you can see, it can be across many different domains, patient information, diagnoses, et cetera, and some of it, they note, is quite easily available to us in our typical EHR. Some of it may not always be available, but you can kind of piecemeal parts of it together, so grab some of it from a separate data set. And so this really is, even just in descriptive quantity, a lot of potential data, never mind the individual elements within each of these. And I think this is, at least for me, where things can sometimes go awry. So I'm gonna talk about two separate areas, both of which I think have been alluded to, and at least tell you where my perspective is that some of the issues arise. The first is in how we analyze that data, and the second is how, or whether, we move those analytics into prediction. So again, going back to that review by Matt Churpek's group, they note, I think, the key insight, and this is something that's been addressed already in this session: bad data science abounds. This is not to say that people are intending to do something poor, and it's also not to say that trying out these things is not important in moving us forward, but the question is what to do with them when the information comes out. The ease of access to large amounts of this data, and the computing power that now any of us has, I can do this on my home computer, right? I don't need a room with five big servers; I have plenty of computing power that I can access sitting at my laptop. This really opens things up to allow for fishing expeditions, potentially, that result in low-quality research. And what I try to indicate on the right-hand side is that not all investigators are created equal, the same as not all clinicians are created equal, or all educators. The fact that many of us may now have equal access to the data does not necessarily mean that we all have equal proficiency in how to manage it. And I think, at least from my perspective, we saw this a lot. I was not in this role at CHEST Critical Care at the time, but I had the privilege of being an associate editor at one of our other journals, Annals of ATS, and I think we all saw this during COVID. I would imagine, Tim, you did as well, that there was this even bigger proliferation of, I have access to COVID data, let me just do something with it, right? And so, the same way that I probably wouldn't walk into an operating room and say, just because I have a scalpel, I should cut into your chest, I think that just because I have access to data, I probably shouldn't be the one doing something with it. So I think that that's key. And the places where we go awry in big data, and particularly when we start using machine learning models, are not necessarily dissimilar from any other traditional observational study. By definition, historically at least, many of these predictive studies start as retrospective analyses, right? And here are the typical things that people think about, confounding, selection bias, misclassification bias, that plague all of these observational studies. But I think within critical care there's another one that really gets at me, and this is the concept that people may or may not be as familiar with, called immortal time bias. And I actually think this is where a lot of these predictive models and comparative effectiveness studies fail.
So the immortal time bias concept, and this is, I think, a really nice review that Emily Vale led, really relies on the fact that in order to get an intervention, or potentially to have information to allow prediction, you have to be alive, right? You have to be there to get the intervention or the measurement. If you as a patient have died, you can't get the intervention, and maybe that creatinine that was gonna get worse never gets measured. And this is not so much of a problem if we're talking about a population where people don't frequently die. But unfortunately, as Tim alluded to, our ICU population is not that population, right? Many of our patients do die. And so this immortal time, the opportunity to have an intervention or to have a diagnostic test done that allows for better prediction, really is correlated with your ability to still be alive. It biases you, in an intervention setting, so that you are, by definition, more likely to be a survivor if you got the intervention, unless you use intelligent statistical techniques to avoid that bias. And at least for me, this is where many, many observational studies, and particularly those using big data, fail miserably. And so this was an interesting review, done not particularly within critical care, but looking at the seven most high-impact general medicine journals, and I think three from cardiology and three from endocrinology and metabolism; I'm not sure why the authors picked those groups. And it asked, looking back at comparative effectiveness research, again not particularly big data, how well do we do with these things? As you can see on the left-hand side, defining our cohort, adjusting for confounding, things like that, we're pretty good at, because we've all learned that since med school, right? But on the things that I think we really screw up, we did a pretty poor job. So at the top is this idea that we never would question in randomized controlled trials, right? We use an intention-to-treat approach: I'm gonna randomize you on the intention to get something, and maybe we'll do a per-protocol assessment after the fact, but we know that that's not as good, right? Because that screws up our randomization. Something similar happens if you do this with observational data: who's likely to get the intervention that you're looking at, versus who actually got it? And you really need to be clear about how you're going to adjust for those things. And then the second one is time points that are important in the trial, all of which can lead to immortal time bias. And as you can see, we do a pretty crummy job of these things. So to me, this is really where these fail. And then the second point, at least from my perspective, and it was really fun to hear Tim make this exact statement because we did not plan this: I think that these predictive models are valuable if. And again, I did not have a medical student work with me, nor, I guess, did I find Vinny's study, so this is really very much the poor man's version of that graph that Vinny showed in a much more intelligent way. But we know there's been a burst in these predictive models, and I think very few of them do what Vinny alluded to, which is to be useful to us in clinical practice. And I want to call your attention to this if you want to see how one is done well. And yes, I did know Vinny was giving this talk, and nonetheless, this is the example I give almost every time.
I had the privilege of handling this at Annals of ATS when Vinny submitted it. As you can see, he aimed to create a model to evaluate incidence rates for all-cause readmission, et cetera. But the key, at least the key to our decision to accept this, was: we then estimated how this theoretical model might impact clinical workload. And I am not yet in the position, which I hope one day to be, of getting the volume of predictive model submissions that Tim described. But I will tell you that, to me, the ones that matter are the ones that say, I did a model, and this is how I either assessed its impact or plan for its potential evaluation. Okay, so that's my feeling about research itself. The next two points, on academic writing and manuscript reviewing, are pretty quick. I'm just gonna bring these up and tell you what we've decided at the CHEST family of journals, and please, Tim, speak up if you feel otherwise. I know this is on a lot of people's minds, particularly how AI, or augmented models for language editing, et cetera, are available to us. So this is from the Committee on Publication Ethics, speaking particularly to the use of AI tools in academic publishing, and I've tried to highlight some of the keys here. This, again, is very similar to what JAMA came out with, and I will be honest in saying it is very similar to what CHEST as a group of journals adopted because of this. So I think it's very clear that AI tools should not be listed as an author on a paper. As I'll show you in a moment, authorship has certain requirements that AI tools cannot meet. And then, where's the bottom one? Sorry. You need to be transparent, if you are submitting a paper, about the use of your AI tools, the same way you might be transparent about any other tool that you use, right? What statistical program did you use? What cell line did you use? Just speak to that. So again, in summary, these tools can likely be used, at least ethically speaking, at the moment, to assist with publishing. They cannot be named as authors, and they should be disclosed. And then in terms of manuscript reviewing, I feel like this is the one that we've struggled a little bit more with on the editorial teams at CHEST, but I think we've come up with something that, at least for the moment, we're comfortable with. Again, if you go back to the COPE guidelines, they describe two things that are very important for a reviewer: the first is expertise and the second is confidentiality. And the idea that an AI tool would have any particular expertise in your subject matter is, I think we would all agree, unlikely to be true. But then I think the bigger risk, and I hadn't really thought of this, Matt Miles, who's the editor for CHEST Pulmonary, raised it: we know that these large language models by definition are mining the internet, right? And we know that when you as an author submit a paper to a journal, there is a confidentiality pact there, that I, as the editor, am not gonna share your work with other people. But by putting that into ChatGPT, for example, you are by definition sharing it, and so that breaks the confidentiality. However, I think there are potential roles for language editing using these sorts of AI tools in reviewing that do not violate the expertise or the confidentiality point. And this actually was brought up to me, not related to reviewing per se, but by a colleague for whom English is not their first language. And so this is an interesting survey of over 900 authors.
They were authors of environmental sciences journal papers, but I don't think that's relevant, and they came from eight countries, from high-income down to low-income countries. Basically, authors were asked, what is your likelihood of participating in a conference, of giving a presentation, et cetera, recognizing that the vast majority of those activities in science happen in English? And as you can see, there were many things where non-native English speakers felt that they were less likely to participate. But I tried to circle in yellow there that three-quarters of non-native speakers have to go look for language editing services. And actually, when these authors came up with ways that we could help these non-native speakers, one of the things that they suggested for journals was the use of AI. So for me, if what you're talking about is either, as an author, freely disclosing that you used the tool, or, as a reviewer, not putting the author's words into ChatGPT but putting your own words from the review into ChatGPT to help clean up the English, I think that's maybe a way to actually level the playing field. So again, in summary, I don't think AI can be used to actually do the evaluation. I mean, it probably can be used, but whether that's ethical and appropriate is another matter. But it likely is okay to use it to improve the writing that you have submitted, and that doesn't break confidentiality, because it is your words that you're putting there, not someone else's. So I guess in conclusion, what I would say is that big data, machine learning, and AI are probably amazing tools for research, exactly as Vinny and others have alluded to. I think we have many reasons to work on improving upon them, but at least from an editor's perspective right now, just like everything else, this is something that certain people are more skilled at doing than others. It doesn't mean that we can't become more skilled, but if you're not, don't dabble in it. And moving beyond prediction to how you would actually use these models is important. Then, as I mentioned before, AI can probably be used in academic writing, but you need to be upfront and honest about it, and it can't be an author. And as a reviewer, it probably can be used to help you as a writer, but not to help you as a reviewer. And that's me. Thank you so much. Thank you.
Video Summary
In this session, four speakers discussed the use of big data, machine learning, and artificial intelligence (AI) in sepsis research and clinical practice. The first speaker, Vinny Liu, discussed the promises and pitfalls of predictive models in critical care. He presented data showing the increase in predictive models in the literature, but noted that there has not been a clear improvement in their performance over time. He also highlighted challenges such as heterogeneity in sepsis definitions and the lack of standardized data preprocessing. The second speaker, Tim Buchman, discussed the use of big data to understand sepsis epidemiology. He presented data showing the increasing burden and costs of sepsis in Medicare beneficiaries and argued that, since the ICD-10 cutover, the administrative data are sufficiently reliable to support inference. The third speaker, Laura Evans, discussed the use of sepsis phenotypes to improve clinical care and research. She emphasized the importance of accurately identifying subgroups of sepsis patients who may respond differently to treatment. Lastly, Haley Gershengorn provided an editor's perspective on the use of big data, machine learning, and AI in sepsis research and academic writing. She highlighted potential pitfalls of using these tools, such as poor-quality research and immortal time bias. She also discussed ethical considerations and the potential use of AI in manuscript reviewing. In summary, the speakers discussed the potential of big data, machine learning, and AI to improve sepsis care and research, but also highlighted the challenges and ethical considerations associated with their use.
Meta Tag
Category
Critical Care
Session ID
1147
Speaker
Timothy Buchman
Speaker
Hayley Gershengorn
Speaker
Vincent Liu
Speaker
Kathryn Pendleton
Speaker
Steven Simpson
Track
Critical Care
Keywords
big data
machine learning
artificial intelligence
sepsis research
clinical practice
predictive models
sepsis definitions
data preprocessing
sepsis epidemiology
immortal time bias
© American College of Chest Physicians®