Comments on: Are Transplant Centers Really Underperforming?

By: Keith Koepf

Keith Koepf — Mon, 25 Sep 2017 13:23:04 +0000

Say, you've got a nice blog. Really looking forward to reading more. Will read on…

By: Kevin T. Keith

Kevin T. Keith — Thu, 13 Jul 2006 23:11:48 +0000

Doctor: Thanks for your comments. Please don't apologize; my analysis stands or falls on its merits, and it's certainly fair game for criticism. I appreciate your feedback.

1. ". . . surely you meant 'higher-than-expected,'" [not "higher-than-average"]

Yes, thanks.

"...to correctly apply to the entire 20% individually, the sentence would have to say something like 'they show excess mortality or insufficient volume or both.'"

Yes, again. I overstated the case in my opening sentence. One of my accusations is that the tone of the article implies that all of that 20% had high death rates. Strictly speaking, the text does indicate that the 20% was composed of two subgroups, but I still think the article lumps them together further down.

2. "Your emphasis on statistical significance is, I think, misplaced. This is not research trying to differentiate two populations, but an application of minimum performance standards. . . . I presume that these benchmarks were derived from previous outcomes studies that were subjected to statistical analysis (see example below) and likely agreed upon at consensus conferences representing the leading lights in transplantation medicine."

But aren't they trying to differentiate populations? Specifically, the subgroup of centers showing objectively bad outcome statistics from those that are doing well? For that, they need outcome metrics robust enough to make the distinction. If many of these centers with "excess" deaths are in fact only one death over their expected outcome values, then to determine confidently that those centers are underperforming, they must have a measurement tool on which a variation of a single death is statistically significant at the margin (and, in some cases, with an "n" in the single digits). I'm guessing they don't. That seems to me a reasonable criticism.
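To put rough numbers on the single-excess-death point, here is a minimal sketch. It assumes, purely for illustration, that every procedure is an independent event with the same death probability, so a center's death count follows a binomial distribution. The figures (20 procedures with 3 expected deaths, and a small program with 8 procedures and 1 expected death) are hypothetical, not data from the article or UNOS, and real programs are judged against risk-adjusted models rather than a single flat rate.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more deaths among n independent procedures,
    each with death probability p (the binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical program performing exactly at its expected rate:
# 20 procedures, 3 expected deaths, so p = 3/20.
# How often would it record 4 or more deaths purely by chance?
mid_volume = prob_at_least(4, 20, 3 / 20)

# Hypothetical small program: 8 procedures, 1 expected death (p = 1/8).
# How often would it record 2 or more deaths purely by chance?
low_volume = prob_at_least(2, 8, 1 / 8)

print(f"20 procedures, 4+ deaths: {mid_volume:.2f}")
print(f" 8 procedures, 2+ deaths: {low_volume:.2f}")
```

Under these simplifying assumptions, a program performing exactly as expected would land at least one death over its expected count roughly a third of the time at 20 procedures and roughly a quarter of the time at 8, which is the sense in which a single excess death, taken alone, is weak evidence of underperformance.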
If I understand you correctly, you seem to be suggesting that the target goals were set as firm numbers, by reference to statistical predictions of acceptable outcome values, and that it is now those hard targets themselves, and not some de novo statistical analysis of actual performance outcomes, that are the measurement standard. (So if, say, a certain program was expected to have a volume of, picking arbitrary numbers, 20 patients per year, and the statistically predicted death rate at the minimum acceptable level would be 3 patients, then 3 becomes the hard-and-fast outcome limit for a given year. The program would then be in violation if it had 4 deaths, even if, say, it had only 2 the previous year, or if this year it had performed 30 procedures instead of 20.)

The weakness of such a standard is obvious: it substitutes inflexible and idealized predictions for an actual analysis of performance data. From a public-policy perspective, this may actually be justifiable, since it brings the benefits of clarity and ease of application, and I suppose I can accept that if that is in fact what they are doing. But adopting a bright-line rule that treats a specific level of performance as equivalent to incompetence is very far from showing that the claim of underperformance is true in fact.

"in the study above, mortality increased exponentially below the threshold of 9/year."

I don't have access to that study, but I presume that mortality did not increase at every center in the sample. I suspect that average mortality for centers at varying levels of activity increased as the level decreased, which is expected, but not the same as saying that it's inevitable.

"Also, don't wait for a guarantee of bad performance"

That's another reasonable point. But, as above, saying that there are signs suggestive of bad performance is not the same as saying you have proven bad performance. The article discusses centers that have "failed to meet . . . minimum standards" or exhibit "glaring and repeated lapses", both of which claims I think they fail to prove. If they had simply said "government moves proactively against centers with marginal performance", I wouldn't have an objection.

"'There is no guarantee either that a death rate which is higher than the national average will also be higher than the maximal accepted standard.' True, but irrelevant. It appears that you introduced the higher-than-average idea. As far as I can tell, the article mentions only higher-than-predicted mortality, . . ."

This is another good point, and again you have caught me in sloppy language, as well as a hasty assumption. I thought that the "expected" levels were the statistically predicted death rates for a given level of activity (and that this would represent an average rate for all centers at that level, hence my introduction of "average"). Thus, it seemed to me, the "expected" levels would fall in the middle of a bell curve of possible outcome numbers for the entire (possibly hypothetical) group of centers with the same level of activity, and such a figure would be too low to use as the performance cutoff. The maximal acceptable death rate should be somewhere to the right of the mean, because there would surely exist some level of performance a bit worse than "expected" that was still "good enough". But if the "expected" level is the maximal acceptable level, then exceeding that level is a problem. I just didn't think that the expected level would be set that close to, or at, the absolute limit.

Thanks again for your input. You make some good points, and you are right that perhaps I was trying to impose an idealized standard on a problem with more practical dimensions. That's a good reminder to receive.

By: Dr. DeFACCto

Dr. DeFACCto — Thu, 13 Jul 2006 10:38:00 +0000

I’d like to respond to this in some detail:

1. Your first sentence has two flaws: surely you meant “higher-than-expected,” and, to apply correctly to each of the 20% individually, the sentence would have to say something like “they show excess mortality or insufficient volume or both.” If the 20% of programs are taken collectively, the statement applies, but it leaves the reader free to conclude that each program has excess mortality, which is incorrect.

2. Your emphasis on statistical significance is, I think, misplaced. This is not research trying to differentiate two populations, but an application of minimum performance standards. (“Officer, what was the p-value on your radar gun readings?”) I presume that these benchmarks were derived from previous outcomes studies that were subjected to statistical analysis (see example below) and likely agreed upon at consensus conferences representing the leading lights in transplantation medicine. Admittedly, the biggest voices will be those at the largest transplant centers, who will have an incentive to raise the bar on volume to protect their primacy, but I don’t think anyone can seriously argue that volume doesn’t matter in a field as complex as transplantation.

Oops, you did. “But only one of these standards is directly related to patient deaths (the patient survival rate standard, of course). …centers with low rates of a particular surgery are merely in violation of a training standard, which may or may not indicate low quality.” The next sentence is a direct contradiction: “…it is well documented that patient recovery and survival varies directly with the frequency with which the center performs a given surgery…” I agree with the latter statement, and here I have a representative cite: JAMA. 1994 Jun 15;271(23):1844-9, demonstrating program volume to be a strong predictor of survival in cardiac transplantation. Continuing, “But that figure is only an average, and failing to meet it does not guarantee that a center will be performing badly on patient outcomes.” I’m not sure what “only an average” means here; in the study above, mortality increased exponentially below the threshold of 9/year. Also, don’t wait for a guarantee of bad performance. You wouldn’t wait to reject a babysitter until you had a guarantee they were a child molester. Quite the contrary, which is as it should be. The same goes for a transplant center; you’d want a guarantee of good performance.

3. “There is no guarantee either that a death rate which is higher than the national average will also be higher than the maximal accepted standard.” True, but irrelevant. It appears that you introduced the higher-than-average idea. As far as I can tell, the article mentions only higher-than-predicted mortality, which is based on a model including patients’ age, comorbidities, etc. When the article reports excess deaths, it seems to be with respect to this kind of model, to avoid punishing programs for taking higher-risk patients.
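As a rough illustration of how a risk-adjusted expectation differs from a raw death count, here is a minimal sketch. It assumes a hypothetical model has already assigned each patient a one-year death probability based on factors such as age and comorbidities; the program's expected deaths are the sum of those probabilities, and excess deaths are observed minus expected. The `Patient` type and all risk values are invented for illustration; this is not the model used by the article's sources.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    predicted_risk: float  # hypothetical model output: probability of death within one year
    died: bool             # observed outcome at one year


def expected_and_excess(patients: list[Patient]) -> tuple[float, int, float]:
    """Return (expected deaths, observed deaths, excess deaths) for one program."""
    expected = sum(p.predicted_risk for p in patients)
    observed = sum(1 for p in patients if p.died)
    return expected, observed, observed - expected


# Two hypothetical programs with the same raw outcome (2 deaths in 10 patients)
# but different case mixes, as scored by the hypothetical risk model.
low_risk_program = [Patient(0.05, False)] * 8 + [Patient(0.05, True)] * 2
high_risk_program = [Patient(0.20, False)] * 8 + [Patient(0.20, True)] * 2

for name, program in [("low-risk mix", low_risk_program), ("high-risk mix", high_risk_program)]:
    expected, observed, excess = expected_and_excess(program)
    print(f"{name}: expected {expected:.1f}, observed {observed}, excess {excess:+.1f}")
```

With identical raw death counts, the program treating sicker patients shows essentially no excess, while the one treating healthier patients shows 1.5 excess deaths; measuring against a per-patient prediction is what keeps a program from being penalized simply for taking harder cases.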

Next, I suspect, lacking firm evidence, that these measures were chosen as leading indicators, or tripwires. Surely nobody who transplants a heart, lung, or liver is interested only in one-year survival; five-year survival would be a better gauge of the overall success of the program. But that takes too long to measure. You wouldn’t want to wait for five-year survival data to go bad to shut down a program that was seriously underperforming in the short-term.

Just as one-year survival predicts survival down the road, program volume acts as an advance warning. Imagine a well-functioning program whose volume gradually slipped below the minimum floor. Initially, it should be able to maintain the standards achieved when it was busier, but over time it will likely lose its edge. Note that in the study referenced above, it’s pretty clear that virtually none of the low-volume centers had ever had sufficient volume. Based on this, I would agree with you to the extent that low volume raises a concern but leaves more wiggle room than high mortality would. However, standards are standards, and for good reason, so it’s entirely fair for the article to call a low-volume program substandard.

4. “In the end, then, they seem to have put their fingers on an important problem, but come up short on data that proves that problem is of serious extent.” I must differ here. Would you want your loved one to receive a heart at the two-a-year program with its 57% mortality, or really any of the underperforming programs?

“But we need much better data, much more careful analysis, and much clearer and more knowledgeable scientific reporting, to understand the situation carefully. Sadly, the LA Times has not acquitted itself well in this instance.” I suspect they made the most of what they had to work with from UNOS. You might want to look at their sources before drawing that conclusion. They’ve performed an important service in bringing to light a situation that the Feds have let slide for too long.

I apologize, I didn’t come here to cut you or your piece to ribbons. But, compared with your other writings, this one jumped out at me. Maybe Orac would have some useful comments, being an academic surgeon.