Teaching Philosophy in Primary Schools

July 14, 2015

tags: Education Endowment Foundation, maths, philosophy, primary schools, randomised trial, reading, SAPERE, teaching

A misleading randomised trial

Last week’s press was full of the news that teaching philosophy to primary school kids helped with their maths and reading. The BBC led with “Philosophy sessions ‘boost primary school results’” (here). The Mirror ran “Want to boost your child’s maths and reading skills? Then teach them about trust and kindness” (here), and specialists implied they’d known it all along: “It stands to reason that philosophy benefits learning”, Times Education Supplement (here). Only the Guardian put even a coded doubt into the final two words of its headline, “Philosophical discussions boost pupils’ maths and literacy progress, study finds” (here).

Sceptics need to take this seriously because, unusually for an education intervention, the evidence comes from a randomised controlled trial (RCT), in this case a trial of “Philosophy for Children” (P4C), a programme of weekly lessons. An outfit called the Society for the Advancement of Philosophical Enquiry and Reflection in Education (SAPERE) charges £4,000 to train and support teachers to deliver P4C in an average school for a year and claims to have signed up 600 schools in the UK alone. If they could persuade all 17,000 primary schools in the UK to join, they’d make a killing. So let’s take a critical look.

The trial report is here, or for those with access problems Philosophy_for_Children report. It has not been published in a peer-reviewed journal, although there are apparently plans to do so. It was peer reviewed by the organisation that commissioned the study, the Education Endowment Foundation, who have judged the findings of an improvement in reading and maths to have a moderate degree of security. The RCT was not registered on a trials database, but the protocol was publicly available here, or for those with access problems Philosophy_for_Children protocol.

Whole schools were randomised, a risky “cluster” design if individual inclusion can be altered by knowledge of the allocation, but one well-suited to education interventions; parents or teachers are unlikely to move children just because they are getting, or not getting, a weekly philosophy class, and all pupils do the outcome tests anyway. The 26 intervention schools got P4C implemented for 9 and 10 year olds for a period of somewhere between one and two years. The 22 control schools got “business as usual”; it’s not clear whether they gave an alternative lesson or sent the kids to the playground. Control schools got P4C at the end of the trial period so any effects could only be measured up to that time point. 3,159 children were included, and the groups were well balanced at baseline.
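Because whole schools were randomised, pupils within a school are not independent observations, and the usual back-of-envelope correction is the design effect, 1 + (m - 1) × ICC, where m is the mean number of pupils per school. The sketch below applies that formula to the 3,159 pupils in 48 schools. The intracluster correlation (ICC) values are assumptions chosen for illustration, not figures from the report, and the function names are my own.

```python
# Sketch: how a cluster design shrinks the effective sample size.
# The ICC values looped over below are ASSUMED for illustration only;
# the trial's actual ICC is not quoted in this post.

N_PUPILS = 3159          # pupils enrolled, from the post
N_SCHOOLS = 26 + 22      # intervention + control schools, from the post


def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n_pupils: int, n_schools: int, icc: float) -> float:
    """Number of independent pupils the clustered sample is 'worth'."""
    mean_cluster_size = n_pupils / n_schools
    return n_pupils / design_effect(mean_cluster_size, icc)


if __name__ == "__main__":
    for icc in (0.0, 0.05, 0.10, 0.20):  # hypothetical ICCs
        ess = effective_sample_size(N_PUPILS, N_SCHOOLS, icc)
        print(f"ICC={icc:.2f}  effective n = {ess:.0f}")
```

With an assumed ICC of 0.10, for example, the 3,159 pupils carry about as much information as a little over 400 independent ones, which is why an analysis that ignores clustering overstates its precision.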

The protocol listed two primary outcomes, namely the overall Key Stage 2 (KS2) and overall Cognitive Abilities Test (CAT4) scores at the end of the trial, both reported as means and standard deviations (SD). A high score is good. There were seven planned secondary outcomes including the three components of KS2 (reading, writing and maths), and the four components of the CAT (verbal, non-verbal, quantitative and spatial ability). The plan was to also do subgroup analyses by year group and by whether the pupil was eligible for free school meals or not.

The results are in tables 5 to 19 of the main report. The CAT scores were reported for 2,821 (89%) of the enrolled pupils but KS2 scores were only available for 1,529 (48%). For some reason the overall KS2, one of the primary outcomes, was not reported at all! The seven secondary subscale outcomes are all reported, although not the subgroup analysis by year group. The subgroup by eligibility for free school meals was only reported for the eligible pupils.

There aren’t actually any differences between the groups in the scores reported. Nearly all slightly favour the control group, but the differences are tiny fractions of a standard deviation. No significance tests are reported, but I guess if they had been done, and corrected for multiple testing, they would all have been non-significant (i.e. P>0.05). A negative trial.

But then the researchers got to work. They noticed that by chance the scores were slightly worse in the treatment group at study entry, so they decided to compare the change in score rather than the absolute scores. This manoeuvre was pre-specified in the protocol for the CAT score, but not for KS2. The authors openly admit that it was data driven.

“By the end the treatment group had narrowed this gap in all three subjects, especially for KS2 scores in reading and maths. For this reason, the key stage results are all presented as gain scores representing progress from KS1 to KS2.”

Unsurprisingly (because random variation tends to regress to the mean*) the results now favoured the intervention group. But it’s still a tiny effect.

Table 5 KS1-2 Reading. The difference in change scores = 0.11 (SD 1.0)
Table 6 KS1-2 Writing. The difference in change scores = 0.03 (SD 1.0)
Table 7 KS1-2 Maths. The difference in change scores = 0.08 (SD 1.0)
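The regression-to-the-mean point can be checked with a small simulation. The sketch below is not a model of the actual trial: it randomises pupils rather than schools, uses made-up group sizes and noise levels, and builds in no treatment effect at all. It then looks only at the simulated trials where the treatment group happened to start behind, and averages the difference in gain scores.

```python
import random
import statistics

# Sketch: null trials (no treatment effect) to illustrate regression to the mean.
# Group size, noise level and trial count are ASSUMED values, not from the report.


def simulate_trial(rng, n_per_group=100, noise_sd=1.0):
    """Return (baseline gap, gain gap), each as treatment minus control."""

    def group_means():
        baselines, gains = [], []
        for _ in range(n_per_group):
            ability = rng.gauss(0.0, 1.0)
            before = ability + rng.gauss(0.0, noise_sd)  # noisy baseline test
            after = ability + rng.gauss(0.0, noise_sd)   # noisy final test, no effect added
            baselines.append(before)
            gains.append(after - before)
        return statistics.fmean(baselines), statistics.fmean(gains)

    t_base, t_gain = group_means()
    c_base, c_gain = group_means()
    return t_base - c_base, t_gain - c_gain


def mean_gain_gap_when_behind(n_trials=1000, seed=0):
    """Average gain gap over trials where the treatment group started behind."""
    rng = random.Random(seed)
    results = (simulate_trial(rng) for _ in range(n_trials))
    gaps = [gain_gap for base_gap, gain_gap in results if base_gap < 0]
    return statistics.fmean(gaps)


if __name__ == "__main__":
    print(f"mean gain gap when treatment starts behind: {mean_gain_gap_when_behind():+.3f}")
```

Under these assumptions the average gain gap comes out positive, favouring the "treatment" group, even though the intervention does nothing. Choosing gain scores after spotting a chance baseline deficit tilts the comparison in the same direction.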

Still no tests of statistical significance (the lead author, Stephen Gorard, has some sort of principled objection to them) but their lack does not stop him concluding: “The results in tables 5 and 7 are unlikely to be due to chance.” On this basis the report’s first Key Conclusion, the primary finding of the trial, states:

“There is evidence that P4C had a positive impact on Key Stage 2 attainment. Overall, pupils using the approach made approximately two additional months’ progress on reading and maths.”

This sentence, plastered all over the Education Endowment Foundation website and press releases, led predictably to the breathless headlines.

But it’s wrong. The triallists pre-specified two primary outcomes but only reported one, which showed no difference. They pre-specified seven secondary outcomes, which showed no differences either. However, when they altered their analysis plan after seeing the data, they noticed that two of the secondary outcomes showed a tiny shift in mean change scores favouring the intervention. The effect size was about 10% of a standard deviation, and less than half the participants had the relevant scores measured, but who cares! Without any tests of statistical significance they declared that it was unlikely to have occurred by chance!
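To see why the number of outcomes matters, here is a rough calculation of the chance of at least one false positive across the nine pre-specified outcomes (two primary plus seven secondary), together with the Bonferroni-corrected threshold. It assumes the tests are independent, which these correlated school outcomes are not, so treat it as an illustration rather than an analysis of the trial.

```python
# Sketch: family-wise error rate across several outcomes.
# ASSUMES independent tests, which is a simplification for correlated outcomes.

N_OUTCOMES = 2 + 7  # pre-specified primary + secondary outcomes, from the post


def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test P value threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    print(f"chance of >=1 false positive: {familywise_error_rate(N_OUTCOMES):.3f}")
    print(f"Bonferroni per-test threshold: {bonferroni_threshold(N_OUTCOMES):.4f}")
```

On that assumption, nine outcomes tested at P<0.05 give a bit over a one-in-three chance of at least one spurious "hit" when nothing is going on.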

In an email to me Stephen Gorard wrote that he had no axe to grind. His research group Research Design & Evaluation (click here) had nothing to do with SAPERE or P4C; they had just been commissioned to evaluate the programme. He likened RD&E to a “taxi for hire”. Indeed so. Taxis get you where you want to go. RD&E gets you the results you want.

Jim Thornton

* Matthew Inglis @mjinglis (click here) makes the same point more elegantly with this graph of the three scores before and after the intervention.

Comments

ollieorange2

July 14, 2015 9:19 am

It’s not just Stephen Gorard, all of the people who use the Effect Size think that statistical significance testing is wrong.

nilsa

July 16, 2015 3:39 am

I concur with the concern addressed by Jim Thornton. The study does not seem to provide any empirical evidence of interventional effect (firstly a lack of an analysis plan and secondly a lack of even an appropriate post-hoc analysis). In addition, did they even get the calculation of the SD right? Well, hard to say because no details are provided. However, from their results it looks like they did not account for the cluster design and the dependency structure.
