Sufficient Scruples

Bioethics, healthcare policy, and related issues.

June 29, 2006

Are Transplant Centers Really Underperforming?

by @ 11:20 PM. Filed under Biotechnology, General, Medical Science, Provider Roles

The LA Times reports that 20% of licensed heart, lung, or liver transplant centers in the US fail to meet federal guidelines for patient survival or surgical volume, and that, collectively, they show higher-than-expected death rates. This is a very serious matter, but I think there is reason to suspect that the paper is misinterpreting or overreacting to the data.

Here is what they say the problem actually consists in:

About a fifth of federally funded transplant programs fail to meet the government’s minimum standards for patient survival or perform too few operations to ensure competency, a Los Angeles Times investigation has found.

The U.S. Centers for Medicare and Medicaid Services has allowed 48 heart, liver and lung transplant centers to continue operating despite sometimes glaring and repeated lapses, the newspaper’s review found. There are 236 approved centers nationwide.

Although many of the substandard programs treat small numbers of patients, their collective failings carry a significant toll.

Consider the latest available statistics, for transplants performed between 2002 and 2004. Nine lung programs failed to meet the minimum Medicare standards for survival, number of surgeries or both.

These hospitals accounted for 21 more deaths than would be expected, based on a government-funded analysis of how all patients fare nationwide within a year of surgery. It is adjusted for the condition of the patients and the organs.

Three dozen heart transplant programs didn’t meet federal standards for survival or volume. They accounted for 43 more deaths than expected.

Altogether, the programs examined by The Times had 71 more patients die than expected within a year of transplant.

Note that, to reach their 20% figure, they lump together programs failing either the patient-survival standard or the minimum-volume guideline – they then attribute to all such programs a cumulative total of deaths in excess of expectation. But only one of these standards is directly related to patient deaths (the patient-survival standard, of course). Centers with death rates above the acceptable maximum are, obviously, objectively in violation of a quality standard, but centers performing low volumes of a particular surgery are merely in violation of a training standard, which may or may not indicate low quality.

The minimum-volume standard is important, it’s true – it is well documented that patient recovery and survival varies directly with the frequency with which the center performs a given surgery, and there is obviously a minimum average frequency necessary for any acceptable overall survival rate. But that figure is only an average, and failing to meet it does not guarantee that a center will be performing badly on patient outcomes. There is no guarantee either that a death rate which is higher than the national average will also be higher than the maximal accepted standard. (After all, roughly half of all centers have a death rate above the national average, one presumes – but only some of them have an unacceptably high death rate.) Counting centers which violate minimum-volume standards and then citing deaths above the “expected” number merely inflates the total of supposedly substandard clinics while making it sound as if they are all suffering excessive deaths, which is likely not true. It is not good to have transplant clinics operating at low volumes, but there may sometimes be a justification for it, and it does not necessarily mean that those clinics have bad outcomes. A much closer look at the data is needed before any conclusions can be reached in individual cases.

Then there is the matter of the raw numbers of deaths reported: 1-year death rates above the expected numbers by a total of 43 deaths for heart transplants at 36 centers, 21 deaths for lung transplants at 9 centers, and (apparently) 7 deaths for liver transplants at 3 centers. In other words, these “substandard” centers each experienced an average of only 1 – 2 deaths more than expected over the period studied. The death rates at centers with low volumes of transplants were quite high in some cases, but often simply as a mathematical consequence of the small denominator (one hospital had a one-year death rate of 57%, but in raw terms that was only 4 out of 7 patients). There is no indication in the article whether the “excess” of 1 – 2 patient deaths they’ve noted is statistically significant or not, but it may well not be for at least some of these clinics.
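
To put the significance question in concrete terms, here is a minimal sketch in Python, using hypothetical per-center numbers (the article does not break its figures out this way) and a plain binomial model in place of the government’s actual risk-adjustment model:

    import math

    def prob_at_least(n, k, p):
        # P(X >= k) for X ~ Binomial(n, p): the chance of seeing k or
        # more deaths by luck alone, if a center's true death rate
        # equals the risk-adjusted expectation p.
        return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(k, n + 1))

    # Hypothetical small program: 10 transplants at a ~20% expected
    # 1-year mortality (2 expected deaths), with 3 observed -- i.e.,
    # one "excess" death.
    print(prob_at_least(10, 3, 0.20))   # ~0.32: consistent with chance

    # The extreme case quoted in the article: 4 deaths out of 7 patients.
    print(prob_at_least(7, 4, 0.20))    # ~0.03: suggestive even at n = 7

On these made-up numbers, a single excess death at a small center is indistinguishable from noise, though the worst reported case would raise eyebrows even with only seven patients.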

The total of 71 “excess” deaths comes to about 6 – 7% of all 1-year deaths after transplants of these organs:

Heart:
Annual Transplants: approx. 2,100
1-year death rate: approx. 15 – 20% (or about 315 – 420 of each year’s recipients)

Lung:
Annual Transplants: approx. 1,000
1-year death rate: approx. 20% (or about 200 of each year’s recipients)

Liver:
Annual Transplants: approx. 5,000
1-year death rate: approx. 12% (or about 500 – 600 of each year’s recipients)

At the same time, the total number of transplants performed, and the total number of deaths each year, vary by much more than that (see above link). So distinguishing the good centers from the bad statistically would likely require stronger evidence than a merely higher-than-expected death count.
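
A back-of-the-envelope check – setting roughly 70 excess deaths against a year’s 1,000 – 1,200 expected deaths, totals estimated above, not the article’s underlying data – suggests why:

    import math

    expected_deaths = 1100   # roughly 1,000 - 1,200 "ordinary" 1-year deaths
    excess = 71              # the article's total of excess deaths

    # If yearly death totals fluctuate roughly like a Poisson count,
    # ordinary year-to-year variation is about sqrt(expected).
    sd = math.sqrt(expected_deaths)
    print(round(sd))              # ~33 deaths of ordinary fluctuation
    print(round(excess / sd, 1))  # ~2.1 standard deviations

An excess of about two standard deviations is marginal to begin with, and the 71 deaths were tallied only at centers singled out after the fact, which inflates their apparent size relative to chance.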

In the end, then, they seem to have put their finger on an important problem, but come up short of data proving that the problem is a serious one. An excess of barely 70 deaths in a one-year period that will see over 8,000 procedures and 1,000 – 1,200 “ordinary” deaths is likely at the margin of statistical significance. That raw-number increase by itself hardly justifies talk of “substandard care”, or the conclusion that “The bottom line message is that there are too many programs in the United States that need to be shut down,” as Dr. Mark L. Barr, president of the International Society for Heart and Lung Transplantation, put it, or that “Congress needs to get together. . . . They need to sit down, they need to gut the system”, as a frustrated patient claimed.

There is good reason to wonder about these transplant programs and their success rates. The data in this article are a good first step toward monitoring and accurately evaluating problem programs. But we need much better data, much more careful analysis, and much clearer and more knowledgeable scientific reporting, to understand the situation properly. Sadly, the LA Times has not acquitted itself well in this instance.

UPDATE: Fixed one typo and one poor word choice; clarified the last quotation in the second-to-last paragraph.

2 Responses to “Are Transplant Centers Really Underperforming?”

  1. Dr. DeFACCto Says:

    I’d like to respond to this in some detail:

    1. Your first sentence has two flaws: surely you meant “higher-than-expected,” and that, to correctly apply to the entire 20% individually, the sentence would have to say something like “they show excess mortality or insufficient volume or both.” If the 20% of programs are taken collectively, the statement applies, but it leaves open to the reader the interpretation that each program has excess mortality, which is incorrect.

    2. Your emphasis on statistical significance is, I think, misplaced. This is not research trying to differentiate two populations, but an application of minimum performance standards. (“Officer, what was the p-value on your radar gun readings?”) I presume that these benchmarks were derived from previous outcomes studies that were subjected to statistical analysis (see example below) and likely agreed upon at consensus conferences representing the leading lights in transplantation medicine. Admittedly, the biggest voices will be those at the largest transplant centers and will have incentive to raise the bar on volume to protect their primacy, but I don’t think anyone can seriously argue that volume doesn’t matter in a field as complex as transplantation.

    Oops, you did. “But only one of these standards is directly related to patient deaths (the patient-survival standard, of course). …centers performing low volumes of a particular surgery are merely in violation of a training standard, which may or may not indicate low quality.” The next sentence is a direct contradiction: “…it is well documented that patient recovery and survival varies directly with the frequency with which the center performs a given surgery…” I agree with the latter statement, and here I have a representative cite: JAMA. 1994 Jun 15;271(23):1844-9, demonstrating program volume to be a strong predictor of survival in cardiac transplantation. Continuing, “But that figure is only an average, and failing to meet it does not guarantee that a center will be performing badly on patient outcomes.” I’m not sure what “only an average” means here; in the study above, mortality increased exponentially below the threshold of 9/year. Also, don’t wait for a guarantee of bad performance. You wouldn’t wait to reject a baby-sitter until you had a guarantee they were a child-molester. Quite the contrary, which is as it should be. The same with a transplant center; you’d want a guarantee of good performance.

    3. “There is no guarantee either that a death rate which is higher than the national average will also be higher than the maximal accepted standard.” True, but irrelevant. It appears that you introduced the higher-than-average idea. As far as I can tell, the article mentions only higher-than-predicted mortality, which is based on a model including patient’s age, comorbidities, etc. When the article reports excess deaths, it seems to be with respect to this kind of model, to avoid punishing programs for taking higher-risk patients.

    Next, I suspect, lacking firm evidence, that these measures were chosen as leading indicators, or tripwires. Surely nobody who transplants a heart, lung, or liver is interested only in one-year survival; five-year survival would be a better gauge of the overall success of the program. But that takes too long to measure. You wouldn’t want to wait for five-year survival data to go bad to shut down a program that was seriously underperforming in the short-term.

    Just as one-year survival predicts survival down the road, program volume acts as an advance warning. Imagine a well-functioning program whose volume gradually slipped below the minimum floor. Initially, they should be able to maintain the standards achieved when they were busier, but over time will likely lose their edge. Note that in the study referenced above, it’s pretty clear that virtually none of the low-volume centers had ever had sufficient volume. Based on this I would agree with you to the extent that the presence of low volumes raises a concern but leaves more wiggle room than high mortality would. However, standards are standards, and for good reason, so it’s entirely fair for the article to call a low-volume program substandard.

    4. “In the end, then, they seem to have put their fingers on an important problem, but come up short on data that proves that problem is of serious extent.” I must differ here. Would you want your loved one to receive a heart at the two-a-year program with its 57% mortality, or really any of the underperforming programs?

    “But we need much better data, much more careful analysis, and much clearer and more knowledgeable scientific reporting, to understand the situation carefully. Sadly, the LA Times has not acquitted itself well in this instance.” I suspect they made the most of what they had to work with from UNOS. You might want to look at their sources before making that conclusion. They’ve performed an important service in bringing to light a situation that the Feds have let go too long.

    I apologize, I didn’t come here to cut you or your piece to ribbons. But, compared with your other writings, this one jumped out at me. Maybe Orac would have some useful comments, being an academic surgeon.

  2. Kevin T. Keith Says:

    Doctor:

    Thanks for your comments. Please don’t apologize – my analysis stands or falls on its merits, and it’s certainly fair game for criticism. I appreciate your feedback.

    1. . . . surely you meant “higher-than-expected,” [not "higher-than-average"]

    Yes, thanks.

    to correctly apply to the entire 20% individually, the sentence would have to say something like “they show excess mortality or insufficient volume or both.”

    Yes, again. I overstated the case in my opening sentence. One of my accusations is that the tone of the article implies that all among that 20% had high death rates. Strictly speaking, the text does indicate that the 20% was composed of two sub-groups, but I do still think the article lumps them together further down.

    2. Your emphasis on statistical significance is, I think, misplaced. This is not research trying to differentiate two populations, but an application of minimum performance standards. . . . I presume that these benchmarks were derived from previous outcomes studies that were subjected to statistical analysis (see example below) and likely agreed upon at consensus conferences representing the leading lights in transplantation medicine.

    But aren’t they trying to differentiate populations? Specifically, the subgroup of centers that are showing objectively bad outcome statistics from those who are doing well? For that, they need to use outcome metrics that are robust enough to make that distinction. If many of these centers with “excess” deaths are in fact only one death over their expected outcome values, then, to confidently determine those centers are underperforming, they must have a measurement tool on which a variation of a single death is statistically significant at the margin (and with, in some cases, an “n” in the single digits). And I’m guessing they don’t. That seems to me a reasonable criticism.

    If I understand you correctly, you seem to be suggesting that the target goals were set as firm numbers – by reference to statistical predictions of acceptable outcome values – and it is now those hard targets themselves, and not some de novo statistical analysis of actual performance outcomes, which are the measurement standard. (So that if, say, a certain program was expected to have a volume of – picking arbitrary numbers – 20 patients per year, and the statistically predicted death rate at the minimum acceptable level would be 3 patients, then 3 becomes the hard and fast outcome limit for a given year. The program would then be in violation if it has 4 deaths, even if, say, the previous year it had only had 2, or if this year it had performed 30 procedures instead of 20.) The weakness of such a standard is obvious – it substitutes inflexible and idealized predictions for an actual analysis of performance data. From a public-policy perspective, this may actually be justifiable – it brings the benefits of clarity and ease of application – and I suppose I can accept that if in fact that is what they are doing. But adopting a bright-line rule that holds a specific level of performance to be the equivalent of incompetence is very far from saying that the claim of underperformance is true in fact.

    in the study above, mortality increased exponentially below the threshold of 9/year.

    I don’t have access to that study, but I presume that mortality did not increase at every center in the sample. I suspect that average mortality for centers with varying levels of activity increased as the level decreased – which is expected, but not the same as saying that it’s inevitable.

    Also, don’t wait for a guarantee of bad performance

    That’s another reasonable point. But, as above, saying that there are signs suggestive of bad performance is not the same as saying you have proven bad performance. The article discusses centers that have “failed to meet . . . minimum standards” or exhibit “glaring and repeated lapses” – both of which claims I think they fail to prove. If they had simply said “government moves proactively against centers with marginal performance”, I wouldn’t have an objection.

    “There is no guarantee either that a death rate which is higher than the national average will also be higher than the maximal accepted standard.” True, but irrelevant. It appears that you introduced the higher-than-average idea. As far as I can tell, the article mentions only higher-than-predicted mortality,

    This is another good point, and again you have caught me in sloppy language, as well as a hasty assumption. I thought that the “expected” levels were the statistically predicted death rates for a given level of activity (and that this would represent an average rate for all centers with that level, hence my introduction of “average”). Thus, it seemed to me, the “expected” levels would fall in the middle of a bell curve of possible outcome numbers for the entire (possibly hypothetical) group of centers with the same level of activity – but that that figure would be too low to use as the performance cutoff. The maximal acceptable death rate should be somewhere to the right of the mean, because there would surely exist some level of performance a bit worse than “expected” that was still “good enough”.

    But if the “expected” level is the maximal acceptable level, then exceeding that level is a problem. I just didn’t think that the expected level would be set that close to, or at, the absolute limit.
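
    To illustrate that worry with made-up numbers: if the “expected” count really were just the mean for an on-target program, a center performing exactly at the expected rate would still exceed its expected death count in a large fraction of years, purely by chance. A minimal sketch, assuming a hypothetical 20-transplant program:

        import math

        def prob_at_least(n, k, p):
            # P(X >= k) for X ~ Binomial(n, p)
            return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
                       for i in range(k, n + 1))

        # Hypothetical program: 20 transplants/yr at exactly the
        # "expected" 15% death rate, i.e., 3 expected deaths. How often
        # would it show 4 or more deaths -- an apparent "excess" --
        # by chance alone?
        print(prob_at_least(20, 4, 0.15))   # ~0.35

    A cutoff set right at the mean would thus flag over a third of perfectly on-target programs in any given year – which is why I assumed the actual standard must sit somewhere to the right of it.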

    Thanks again for your input. You make some good points, and you are right that I perhaps was trying to impose an idealized standard on a problem with more practical dimensions. That’s a good reminder to receive.
