By CK Williams
Published in 2005, On the Metro is one of CK Williams’ later poems. The subject is risky – an old man admiring a younger girl – but look at the care he takes, with each long line, to make his meaning clear. I think he succeeded. He died on Sept 20th.
On the metro, I have to ask a young woman to move the packages
……..beside her to make room for me;
she’s reading, her foot propped on the seat in front of her, and barely
……..looks up as she pulls them to her.
I sit, take out my own book—Cioran, The Temptation to Exist—and
……..notice her glancing up from hers
to take in the title of mine, and then, as Gombrowicz puts it, she
……..“affirms herself physically,” that is,
becomes present in a way she hadn’t been before: though she hasn’t
……..moved, she’s allowed herself
to come more sharply into focus, be more accessible to my sensual
……..perception, so I can’t help but remark
her strong figure and very tan skin—(how literally golden young
……..women can look at the end of summer.)
She leans back now, and as the train rocks and her arm brushes mine
……..she doesn’t pull it away;
she seems to be allowing our surfaces to unite: the fine hairs on both
……..our forearms, sensitive, alive,
achingly alive, bring news of someone touched, someone sensed, and
……..thus acknowledged, known.
I understand that in no way is she offering more than this, and in truth
……..I have no desire for more,
but it’s still enough for me to be taken by a surge, first of warmth then
……..of something like its opposite:
a memory—a girl I’d mooned for from afar, across the table from me
……..in the library in school now,
our feet I thought touching, touching even again, and then, with all I
……..craved that touch to mean,
my having to realize it wasn’t her flesh my flesh for that gleaming time
……..had pressed, but a table leg.
The young woman today removes her arm now, stands, swaying
……..against the lurch of the slowing train,
and crossing before me brushes my knee and does that thing again,
……..asserts her bodily being again,
(Gombrowicz again), then quickly moves to the door of the car and
……..descends, not once looking back,
(to my relief not looking back), and I allow myself the thought that
……..though I must be to her again
as senseless as that table of my youth, as wooden, as unfeeling, perhaps
……..there was a moment I was not.
Another wrong (about randomisation) educationalist
Jo Boaler, author, Stanford professor (click here), and founder of the educational website YouCubed (click here), is visiting the UK to persuade schools to take up her maths teaching ideas (click here). She objects to rote learning of times tables. According to the Times Educational Supplement (click here) she has said:
“Governments saying everybody has to memorise their times tables to 12 times 12 is absolutely disastrous.”
Blimey! But she has her reasons. She believes forcing weak children to learn tables makes them anxious about maths in general.
“What we know now is that when you give things to kids like a timed multiplication test, about a third of them develop anxiety. For those kids the working memory which holds maths facts is blocked and they can’t access it.”
“Some kids aren’t fast memorisers, and they decide from an early age that they can’t do maths because of the timed maths tests.”
Even bright kids are harmed:
“Other kids may be OK but see maths as a shallow subject which is about recall of facts and disengage. So [tables cause] huge damage”.
It all sounds plausible, her webpages cite whole libraries of academic papers, and she is obviously a charismatic educationalist. Judging by her twitter feed @joboaler many teachers adore her.
But others argue that learning tables is a vital early step in getting comfortable with mathematics. Click here for one. They also have theories and academic papers in support.
The research cited by each side is impenetrable to anyone uncommitted to the argument the author is advocating, and I’m certainly not qualified to judge it.
But I am qualified to say that in an area like maths teaching, where factors like innate ability, teacher enthusiasm and parental engagement almost certainly influence results, the only reliable way to judge who is right is a randomised controlled trial. But Jo Boaler cites none, and Google can’t find any.
I am also qualified to state that Jo Boaler doesn’t understand the limitations of observational data and the need for randomised trials in education. In a paper (click here) critiquing a US National Mathematics Advisory Panel that had advocated teaching tables in 2008, she wrote:
“When comparing teaching approaches to consider which is more effective, random or equal assignment may be thought of as presenting a research ideal. If students are assigned to random or equal groups and given different treatments, and one treatment results in better outcomes, then researchers have a strong case for making causal statements. Experiments such as these have emanated from medical research, and they lend themselves to the controlled conditions of laboratories. However, when researching learning in complicated places such as schools, such models become highly impractical and, some would say, implausible.”
“Researchers in mathematics education do not need to assign students to groups in quasi-experimental studies, taking control of their education, as they can employ statistical methods to control for differences in student characteristics. Using logistic regression analysis, for example, researchers can control for factors such as prior mathematics achievement, gender, and socioeconomic status. It could be argued that researchers cannot control for every variable that may affect a student in a population, but they can control for all those known to be reasonable […]”
That’s wrong, Jo Boaler. Other educationalists have expressed similar sentiments, and they are wrong too. Not just a bit wrong, but absolutely 100% wrong. The exact opposite of correct. Score gamma triple minus in the “education intervention evaluation exam”. Go to the back of the class, Jo Boaler.
It is the very complexity of education, the many unknown factors that influence outcomes, that justifies randomisation. “Prior mathematics achievement, gender, and socioeconomic status” aren’t the problem. Jo Boaler is right about that; we can measure them and, at least in principle, control for them using logistic regression analysis. But, by definition, no amount of fancy statistics can ever control for unknown factors; they are unknown factors. The only way to have any assurance that they are more or less equal between the two groups under study is to allocate the students at random.
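A small simulation makes the point concrete. It is only a sketch under invented assumptions: the hidden trait, the sample size and the self-selection rule are all hypothetical, and the programme is deliberately given no effect at all. Because the trait is never recorded, no regression model could adjust for it.

```python
import random
import statistics


def simulate(n_pupils=20000, true_effect=0.0, seed=1):
    """Compare a self-selected comparison with a randomised one.

    Every pupil has an unmeasured trait (hypothetical, e.g. motivation)
    that raises their test score. The programme itself has `true_effect`.
    """
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(n_pupils)]

    def score(trait, treated):
        # Outcome = hidden trait + programme effect + noise.
        return trait + true_effect * treated + rng.gauss(0, 1)

    def difference(assignments):
        treated = [score(h, 1) for h, a in zip(hidden, assignments) if a]
        control = [score(h, 0) for h, a in zip(hidden, assignments) if not a]
        return statistics.mean(treated) - statistics.mean(control)

    # Observational: pupils with more of the hidden trait are more likely
    # to end up in the new programme (assumed selection rule).
    self_selected = [h + rng.gauss(0, 1) > 0 for h in hidden]
    # Randomised: a coin toss decides, so the hidden trait cannot
    # influence who gets the programme.
    randomised = [rng.random() < 0.5 for _ in hidden]

    return difference(self_selected), difference(randomised)


observational, randomised = simulate()
print(f"observational estimate: {observational:+.2f}")
print(f"randomised estimate:    {randomised:+.2f}")
```

In this toy setup the self-selected comparison reports a sizeable "effect" for a programme that does nothing, while the randomised comparison stays close to zero.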
This misunderstanding about randomisation causes trouble in two ways. If Jo Boaler is wrong, her ideas are condemning thousands, maybe millions, of children to not learning their tables, and to a lifetime of innumeracy.
But what if she is right? In that case her failure to test her ideas against well-conducted randomised trials allows governments all over the world to continue forcing children to learn their tables by rote and condemn even more to a lifetime of fear of mathematics.
What a pity no-one taught Jo Boaler how to evaluate educational interventions properly.
What happened next
Luke’s version is a story of redemption. A wastrel loses his inheritance “with riotous living”, admits his error, “I […] am no more worthy to be called thy son”, but his father forgives him, kills the fatted calf, and tells his resentful older brother to rejoice “for this my son was dead, and is alive again; he was lost, and is found”.
In Graham Greene’s Monsignor Quixote* the communist Mayor recounts the parable before dinner. In his version the son from a bourgeois family objects to inherited wealth and, in a Tolstoyan gesture of solidarity with the poor, gives his share away and lives as a peasant until, his courage failing, he returns to his father for forgiveness. But then he is disgusted for a second time, pines for the hard earth floor and broods on the saying of a wise peasant – Lenin’s words – that capitalism is a machine invented by capitalists to keep the working class in subjection. As the travellers enter Botin’s restaurant, the Mayor calls for suckling pig and a bottle of the Marques de Murrietta’s red wine, and Greene exposes his hypocrisy.
“I am surprised that you favour the aristocracy” says Monsignor Quixote, referring to the wine.
The mayor splutters the conventional communist excuses, and the priest admits he only eats horse steaks at home.
Rudyard Kipling also sends the prodigal son back to poverty for a second time. In this case to escape his stifling family, and especially his sanctimonious elder brother.
The Prodigal Son
Here come I to my own again,
Fed, forgiven and known again,
Claimed by bone of my bone again
And cheered by flesh of my flesh.
The fatted calf is dressed for me,
But the husks have greater zest for me,
I think my pigs will be best for me,
So I’m off to the Yards afresh.
I never was very refined, you see,
(And it weighs on my brother’s mind, you see)
But there’s no reproach among swine, d’you see,
For being a bit of a swine.
So I’m off with wallet and staff to eat
The bread that is three parts chaff to wheat,
But glory be! – there’s a laugh to it,
Which isn’t the case when we dine.
My father glooms and advises me,
My brother sulks and despises me,
And Mother catechises me
Till I want to go out and swear.
And, in spite of the butler’s gravity,
I know that the servants have it I
Am a monster of moral depravity,
And I’m damned if I think it’s fair!
I wasted my substance, I know I did,
On riotous living, so I did,
But there’s nothing on record to show I did
Worse than my betters have done.
They talk of the money I spent out there –
They hint at the pace that I went out there –
But they all forget I was sent out there
Alone as a rich man’s son.
So I was a mark for plunder at once,
And lost my cash (can you wonder?) at once,
But I didn’t give up and knock under at once,
I worked in the Yards, for a spell,
Where I spent my nights and my days with hogs.
And shared their milk and maize with hogs,
Till, I guess, I have learned what pays with hogs
And – I have that knowledge to sell!
So back I go to my job again,
Not so easy to rob again,
Or quite so ready to sob again
On any neck that’s around.
I’m leaving, Pater. Good-bye to you!
God bless you, Mater! I’ll write to you!
I wouldn’t be impolite to you,
But, Brother, you are a hound!
Greene made a political point, but Kipling got to the heart of the story. It’s not sufficient to kill the fatted calf. We need to shed our smug piety at the sinner’s misfortune.
* Graham Greene. Monsignor Quixote. The Bodley Head, London. 1982. p 38.
The gardens and park at Lanhydrock (click here) are superb. But the man on the gate struggled to remember any famous associations with the house itself; Gladstone planted a tree and there were mutterings about Poldark, but that was it. So I skipped the house and, late in the day, discovered this.
A tiny stream flows through and the water looks clean, albeit dark and weedy. I wasn’t concerned by the sign, rather the reverse, but I had no swimmers, young people were around, and while I contemplated it started to rain, and my courage failed me. I’m kicking myself, but it’s there for another day.
Meeting Point by Louis MacNeice and Wincher’s Stance by John Clinch
In 1938 MacNeice, who was still on good terms with his ex-wife Mary, and whose affair with Nancy, the illustrator of I Crossed the Minch, was drawing to a close, met the writer and political activist Eleanor Clark on a US lecture tour. The following year he engineered a job at Cornell University to be with her.
I couldn’t resist juxtaposing the poem of their meeting, presumably at New York docks, with the late John Clinch’s sculpture, Wincher’s Stance, at Glasgow bus station. Both are accessible and deservedly popular.
Time was away and somewhere else,
There were two glasses and two chairs
And two people with the one pulse
(Somebody stopped the moving stairs)
Time was away and somewhere else.
And they were neither up nor down;
The stream’s music did not stop
Flowing through heather, limpid brown,
Although they sat in a coffee shop
And they were neither up nor down.
The bell was silent in the air
Holding its inverted poise –
Between the clang and clang a flower,
A brazen calyx of no noise:
The bell was silent in the air.
The camels crossed the miles of sand
That stretched around the cups and plates;
The desert was their own, they planned
To portion out the stars and dates:
The camels crossed the miles of sand.
Time was away and somewhere else.
The waiter did not come, the clock
Forgot them and the radio waltz
Came out like water from a rock:
Time was away and somewhere else.
Her fingers flicked away the ash
That bloomed again in tropic trees:
Not caring if the markets crash
When they had forests such as these,
Her fingers flicked away the ash.
God or whatever means the Good
Be praised that time can stop like this,
That what the heart has understood
Can verify in the body’s peace
God or whatever means the Good.
Time was away and she was here
And life no longer what it was,
The bell was silent in the air
And all the room one glow because
Time was away and she was here.
— Louis MacNeice
Chapel cliff natural swimming pool
The picturesque Cornish fishing village is no place for swimmers.
Only the brave would get in among the boats, and the small NE facing beach just outside the harbour is in shade from midday onwards. But take the SW coast path a few hundred yards to Chapel cliff, where at low tide steps lead down to a natural sea-swimming pool.
Luke and I went at high tide. The pool was flooded, and there was a slight swell. Access directly off the rocks would have been tricky, but the steps made it easy. A great swimming spot.
The P4C trial security rating
Last week I criticised a trial which the authors claimed had shown that a programme of philosophy teaching (P4C) in primary schools improved pupils’ literacy and maths (click here). The organisation which ran the trial, a semi-independent, largely government-funded charity, the Education Endowment Foundation (EEF), has now defended their work (click here) without responding to any of the substantive issues, namely imbalance at baseline, negative primary results, >50% attrition on one primary endpoint and inappropriate cherry picking among data-driven secondary endpoints. Instead they defend a side issue, the lead researcher Stephen Gorard’s failure to report statistical significance. They also insist that the trial was evaluated independently using the EEF’s padlock rating scheme. With three padlocks out of five awarded by the EEF evaluators, that scheme is supposed to indicate that the results have a moderate degree of security.
The padlock scheme is described here. The criteria, as described by the EEF, are as follows.
1. Design: The quality of the design used to create a comparison group of pupils with which to determine an unbiased measure of the impact on attainment.
2. Power: The minimum detectable effect that the trial was powered to achieve at randomisation, which is heavily influenced by sample size.
Implementation (thresholds that could be over-ridden by criteria 4, ‘Balance’, in exceptional circumstances)
3. Attrition: The level of overall drop-out from the evaluation treatment and control groups, which could potentially bias the findings.
Analysis and interpretation (judgement required)
4. Balance: The final amount of balance achieved at the baseline on observable characteristics in the primary analysis.
5. Threats to validity: How well-defined and consistently delivered the intervention was, and whether the findings could be explained by anything other than the intervention.
The final padlock rating is derived by rating design, power and attrition, taking the lowest, and adjusting it up or down according to the presence or absence of balance at baseline and other threats to validity. The final rating cannot be higher than the lowest rating for design or power.
The EEF evaluator’s rating is given in appendix 2 of the main P4C trial report available here. I’ve reproduced the relevant table here:
Their justification was as follows:
“This evaluation was designed as a randomised controlled trial. The sample size was designed to detect a MDES of less than 0.4, by design, reducing the security rating to 3. At the unit of randomisation (school), there was zero attrition, and extremely low attrition at the pupil level also. The post-tests were administered by the schools by teachers who were aware of the treatment allocation, but with invigilation from the independent evaluators. Balance at baseline was high, and there were no substantial threats to validity.”
Let’s review these judgments.
Design
This was a cluster randomised trial with 48 schools and 3,159 pupils. Five padlocks is correct.
Power
I don’t know how to judge this without significance testing. But it’s a pretty large trial, albeit one which will lose some power from the cluster design. The EEF evaluators rated it three padlocks, which seems reasonable.
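To give a rough feel for how much power a cluster design can lose, here is a sketch using the standard design-effect approximation, 1 + (m − 1) × ICC, where m is the mean cluster size. The intra-cluster correlation (ICC) values are my assumptions, not figures from the report, and the multiplier of 2.8 is the conventional one for a two-sided 5% significance level with 80% power, a framework the trial’s authors did not use. It ignores any gain from adjusting for pre-test scores, so treat it as an illustration rather than the EEF’s calculation.

```python
import math


def design_effect(mean_cluster_size, icc):
    """Variance inflation from randomising whole schools instead of pupils."""
    return 1 + (mean_cluster_size - 1) * icc


def approx_mdes(n_pupils, n_clusters, icc, multiplier=2.8):
    """Approximate minimum detectable effect size in SD units.

    Assumes two equal arms, equal cluster sizes and no covariate adjustment.
    """
    mean_cluster_size = n_pupils / n_clusters
    effective_n = n_pupils / design_effect(mean_cluster_size, icc)
    return multiplier * math.sqrt(4 / effective_n)


# Trial size from the report; the ICC values are purely hypothetical.
for assumed_icc in (0.0, 0.05, 0.1, 0.2):
    mdes = approx_mdes(3159, 48, assumed_icc)
    print(f"assumed ICC {assumed_icc:.2f}: approximate MDES {mdes:.2f} SD")
```

The detectable effect grows quickly as the assumed ICC rises, which is the sense in which a trial of this size is less powerful than its pupil count suggests.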
Attrition
The EEF evaluators judged attrition at the cluster level only, which is correct according to the EEF guide, because that was the level at which randomisation occurred. All randomised schools were followed up, so they rated 5 padlocks on this criterion.
The evaluators noted that pupil level attrition was very low, which suggests they somehow missed the 52% attrition on KS2 score. But unless they had reduced the attrition rating to less than three padlocks this would not alter the final rating. The reason is that at this point the evaluator is supposed to allocate an interim padlock rating based on the lowest of the above three marks. The EEF guide reads:
“At this point the overall security rating for the evaluation will be determined by the minimum rating across the above three criteria. The minimum of first two criteria (planned design and power) determine the maximum security rating for the evaluation.”
So, however we interpret attrition, the overall security rating at this stage is three padlocks.
The EEF guide then says that this interim rating should be adjusted up or down depending on the final two criteria, balance and other threats to validity.
Balance
The evaluators judged that the groups were well balanced at baseline, on the basis of table 4 (baseline characteristics) in the report. But they ignored, or did not notice, the baseline imbalance of nearly 0.2 SD in KS1 reading and mathematics (total KS1 baseline scores are not reported) and of between 0.05 and 0.1 SD in CAT score. We can forgive the evaluators because these imbalances are not reported in table 4. They appear in the third columns of tables 5, 7 and 11 in the main report.
Since the Key Stage (KS) and Cognitive Ability Test (CAT) scores are the trial’s two primary outcomes, the evaluators made a mistake here, albeit a forgivable one. The EEF guide (Table 3 below) suggests that in the presence of a baseline difference of >0.1 on a key characteristic the evaluators should drop two padlocks.
Three minus two = one. The interim rating should now be one padlock.
Other threats to validity
The EEF guide lists six other potential threats, namely 1. Insufficient description of the intervention, 2. Diffusion (or ‘contamination’), 3. Compensation rivalry or resentful demoralisation, 4. Evaluator or developer bias, 5. Testing bias, and 6. Selection bias. It suggests that adjustment should be made as follows:
“If any of the above issues are identified as a cause for concern some judgement should be used in adjusting the security rating to account for any issues identified. The following are some suggested rules:
- If there is evidence of any one or two threats the rating should drop 1 padlock
- If there is evidence of more than two threats the rating should drop 2 padlocks”
The EEF evaluators did not detect any threats. But in my opinion there are two unambiguous ones. No 1 because it was not made clear what teaching the control pupils got during the P4C lessons, and no 6 because the outcomes on which the conclusions were based were change scores selected post hoc, and susceptible to regression to the mean. A critical reviewer might also argue that the selective choice of outcomes suggests evaluator bias (threat 4). But this seems a bit circular so I’m giving them the benefit of the doubt on that.
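Regression to the mean in change scores is easy to demonstrate. The sketch below uses invented data, not the trial’s: every pupil has a stable ability, each test adds independent noise, and nobody receives any intervention. The sample size and noise level are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(7)
n_pupils = 10000

# Stable underlying ability, measured twice with independent test noise.
ability = [rng.gauss(0, 1) for _ in range(n_pupils)]
baseline = [a + rng.gauss(0, 1) for a in ability]
followup = [a + rng.gauss(0, 1) for a in ability]
change = [f - b for b, f in zip(baseline, followup)]

# Split pupils by whether their baseline score was below the median.
median_baseline = statistics.median(baseline)
low_gain = statistics.mean(
    c for b, c in zip(baseline, change) if b < median_baseline
)
high_gain = statistics.mean(
    c for b, c in zip(baseline, change) if b >= median_baseline
)

print(f"mean change, low baseline group:  {low_gain:+.2f}")
print(f"mean change, high baseline group: {high_gain:+.2f}")
```

With no intervention at all, the pupils who started lower appear to gain and those who started higher appear to lose. Any comparison of change scores between groups that differ at baseline inherits this artefact.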
Even if the teaching that control pupils got was recorded somewhere else, the problem of selecting change scores post hoc, and their susceptibility to regression to the mean, is a definite threat to validity. So at best the final rating should drop by a further padlock. One minus one = zero. A final padlock rating of zero out of five.
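The arithmetic can be laid out as a short sketch. The function names are mine, not the EEF’s, and two simplifications are assumptions: I model only downward adjustments, and I assume a rating cannot fall below zero.

```python
def threat_penalty(n_threats):
    """Suggested EEF rule: one or two threats cost 1 padlock, more cost 2."""
    if n_threats == 0:
        return 0
    return 1 if n_threats <= 2 else 2


def final_rating(design, power, attrition, balance_penalty, n_threats):
    """Interim rating is the lowest of the first three criteria, then
    padlocks are removed for baseline imbalance and threats to validity.
    Flooring at zero is an assumption."""
    interim = min(design, power, attrition)
    return max(0, interim - balance_penalty - threat_penalty(n_threats))


# The EEF evaluators' judgments: good balance, no threats.
evaluators_rating = final_rating(5, 3, 5, balance_penalty=0, n_threats=0)
# My reassessment: two padlocks off for imbalance, at least one threat.
reassessed_rating = final_rating(5, 3, 5, balance_penalty=2, n_threats=1)

print("EEF evaluators:", evaluators_rating)
print("Reassessed:    ", reassessed_rating)
```

Counting both of the threats I identified instead of one gives the same answer, because one or two threats carry the same one-padlock penalty.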
According to the EEF zero padlocks mean the P4C trial “adds little or nothing to the evidence base”.
I’d be delighted to learn if I’ve made a mistake in the above. If not, the EEF may wish to look for new evaluators.