Crafty sample size change between registry and publication
The Warwick Arthroplasty Trial
Good to see orthopaedic surgeons doing randomised trials, but sad to report how quickly they are learning to fudge the discrepancies between trial registry and publication.
The Warwick Arthroplasty Trial, comparing hip replacement with resurfacing in young patients, was registered here in 2007, and published in the BMJ here this week. The authors claim no difference in outcomes at one year.
But despite a planned sample size of 172, the published version included only 126 participants (60 resurfacing; 66 total hip replacement). There’s no explanation for the difference in the BMJ paper, but the full protocol, published in 2010 here, reads:
“With an allowance for 10% drop-out, the total number of patients required will be 172. If recruitment proves to be problematic during the course of the trial, then with the agreement of the trial steering committee the target will be lowered and the more usual 80% power level will be considered sufficient. For this scenario, the total number of patients required will be 120 (including 10% for drop-out).”
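In passing, the arithmetic behind that fallback from 90% to 80% power is easy to sketch. The excerpt doesn’t state the assumed effect size or standard deviation, so the figures below are invented for illustration and won’t reproduce the 172 and 120 exactly; they simply show how the lower power level shrinks the required numbers.

```python
# Illustrative sketch only: the effect size (delta) and SD are invented, not the
# trial's assumptions, so the totals will not match the protocol's 172 and 120.
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.9):
    """Approximate n per group for a two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

for power in (0.90, 0.80):
    n = n_per_group(delta=5, sd=10, power=power)   # hypothetical inputs
    total = ceil(2 * n / 0.9)                      # allow for 10% drop-out
    print(f"{power:.0%} power: {n} per group, about {total} in total after drop-out allowance")
```

Whatever the effect size, the ratio between the two totals on these standard formulae is roughly ((1.96 + 0.84) / (1.96 + 1.28))² ≈ 0.75, so an 80% power target is always about a quarter smaller than a 90% one.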
The fallback itself is in principle OK, except that the protocol was published in January 2010, the same month the final participant was recruited! I know I’m a suspicious blighter, but had they already decided to stop early, and were they pretending it was a planned early finish? There is no mention of any of this in the BMJ paper.
Some might argue that since the trial was negative, at least they weren’t stopping because they had peeked at the data and found a positive result. But it does matter, because they are claiming equivalence in the short term, and it’s obvious that many surgeons believe nothing of the sort.
The BMJ correspondence also suggests that they used the wrong endpoints. Apparently both the Harris and Oxford scores have a ceiling effect: you get top marks if you can walk around a shopping mall, and no extra if you can climb Everest or sail solo round Cape Horn! I’ve no idea if this is true, but I do know that if you design a trial to have a reasonable chance of ruling out the minimum worthwhile treatment effect, and give up part way through having detected no difference, you have not shown equivalence.
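To put a number on that last point, with invented figures rather than the trial’s data: suppose the minimum worthwhile difference on the outcome score were 5 points and the standard deviation 15. An observed difference of zero only demonstrates equivalence if the 95% confidence interval is narrow enough to exclude 5 points either way, and whether it is depends directly on how many patients were actually randomised.

```python
# Invented numbers (not the trial's data): "no significant difference" from a
# reduced sample is not the same as demonstrating equivalence.
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)      # 1.96 for a 95% confidence interval
sd = 15.0                            # hypothetical SD of the outcome score
worthwhile = 5.0                     # hypothetical minimum worthwhile difference

# Roughly the achieved (~63 per group) and originally planned (~86 per group) sizes.
for n in (63, 86):
    half_width = z * sd * (2 / n) ** 0.5          # CI half-width for a difference in means
    verdict = "excludes" if half_width < worthwhile else "does NOT exclude"
    print(f"n = {n} per group: an observed difference of 0 has 95% CI +/- {half_width:.1f} points,"
          f" which {verdict} the {worthwhile}-point minimum worthwhile difference")
```

With these made-up numbers the truncated trial cannot rule out the minimum worthwhile effect even if the two groups come out identical, whereas the originally planned size just about could.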
There’s no reason why orthopaedic surgeons should appreciate the importance of all this, but the Warwick Clinical Trials Unit, fully registered with the NIHR, should know better.
Funding – Research for Patient Benefit scheme of the National Institute for Health Research, i.e. the government.
Jim Thornton
Of course sample size calculations are important before setting off on a research project, as otherwise much effort will be wasted, but they have become the holy grail and nothing can be started before a statistician (a bit like getting a sick certificate) has signed to say it is pukka.
However, you then enter the world of Kuhn (a bad thing) rather than the world of Popper (a good thing), because you cannot challenge the paradigm and all you are doing is fulfilling a self-fulfilling prophecy. Put simply, you say: this is what I think will happen; oh look, it has happened, or oh dear, it didn’t. Much better to say this is a type 2 error and just get on trying new things. Anyway, you are only interested in whether there is a real clinical difference, not a small statistical difference. Jeremy
Oh no! This has nothing to do with arguments between Popperian falsification and Kuhnian paradigms. It’s much simpler than that.
If you say you’re going to randomise 100 participants per group, count the dead bodies and compare the rates with a simple chi-square test, then your P value will truly represent the probability that the result you observed occurred by chance.
If you run a chi-square test after every 10 participants and stop if P < 0.05, or alter your primary outcome to “dead or rather sick”, or, worse, do both, your calculated P value will be wrong.
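Here is a minimal simulation sketch of the first problem (made-up design parameters, not the authors’ analysis): two treatments that are genuinely identical, analysed with a plain chi-square test after every 10 participants, stopping the moment P drops below 0.05. Each extra look is another chance for noise to cross the threshold, so the proportion of simulated “trials” declaring a difference comes out well above the nominal 5%.

```python
# Minimal simulation sketch (made-up parameters, not the trial's design):
# repeatedly testing and stopping at P < 0.05 inflates the false-positive rate.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
n_sims = 2000                 # number of simulated trials
max_n_per_arm = 100           # planned size: 100 participants per group
block = 5                     # look at the data after every 10 participants (5 per arm)
event_rate = 0.3              # identical in both arms, so any "difference" is noise

false_positives = 0
for _ in range(n_sims):
    a = rng.random(max_n_per_arm) < event_rate    # events in arm A
    b = rng.random(max_n_per_arm) < event_rate    # events in arm B
    for n in range(block, max_n_per_arm + 1, block):
        table = np.array([[a[:n].sum(), n - a[:n].sum()],
                          [b[:n].sum(), n - b[:n].sum()]])
        if table.min() == 0:
            continue                              # skip looks with an empty cell
        _, p, _, _ = chi2_contingency(table, correction=False)  # plain chi-square
        if p < 0.05:
            false_positives += 1
            break                                 # "stop early" and declare a difference

print(f"False-positive rate with peeking: {false_positives / n_sims:.1%} (nominal 5%)")
```

If instead you analyse once, at the planned sample size, the same set-up gives a rate close to the nominal 5%; that is the whole point of specifying the analysis in advance.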
It's as simple as that. It matters because we're talking about choosing the best treatment for real sick people. And it's time it stopped.