This time last week, Gavin Williamson was sticking to his guns. The algorithm had worked. There would be no U-turns.
Within 48 hours – hours that had been excruciatingly painful for tens of thousands of disappointed students, their parents, teachers and universities – Williamson had changed tack.
The algorithm hadn’t worked. He had ordered a U-turn and was apologising to all the young people who had suffered as a result.
In politics, about-turns don’t come much bigger than this.
But if Williamson had thought his considerable embarrassment would end there, he was wrong – again.
His explanation that he had only become aware of the scale of the problems “over the Saturday and Sunday” is under intense scrutiny – it now appears both he and his department were warned at least four times in July and August that the algorithm-based system for calculating A-level and GCSE grades in England was seriously flawed.
So how did this blow up in his face?
The roots of the crisis date back to March, at the height of the pandemic. While countries such as Germany went ahead and held school exams despite the coronavirus, the nations of Britain did not, each deciding to award grades by algorithm.
On 31 March, Williamson hurriedly issued an instruction to the exam regulator, Ofqual, insisting “the distribution of grades follows a similar profile to that in previous years”. Few paid much attention. In the months that followed the main issue was how to reopen schools, with Williamson focusing on fighting with teachers’ unions over the issue.
But as the pandemic eased, the worries about the exam system began to mount. Early in July, the education select committee complained the approach meant there were “potential risks of bias” affecting disadvantaged groups.
That same month a worried former senior official, Sir Jon Coles, approached the schools minister, Nick Gibb.
Privately, Coles warned him “it would disadvantage particularly children from poorer backgrounds” – and would only be 75% accurate.
Huy Duong, a former medical statistician, with a son facing exam results of his own, calculated – presciently – that 39% of A* to D grades would be lower than teacher assessments.
The findings led to a front-page story in the Guardian, six days before England’s A-level results were due.
“It should not have taken a genius to work out that any model that was only 60% or 75% accurate would have led to too many individual injustices,” says Lord Kerslake, a former head of the civil service. “Was there any algorithm that could have done the job ministers were looking for?”
There were other warning signs, and they were flashing red.
Scotland was embroiled in its own exam storm. On 4 August, the Scottish Qualifications Authority downgraded nearly a quarter of all the recommended results – 124,000 – on the basis of its own algorithm. A deluge of complaints followed.
Poorer pupils had been hardest hit in Scotland, but journalists were reassured that England would be different and that the algorithm about to be unleashed on thousands of young people was better designed. It turned out not to be the case.
The Scottish model could not survive contact with political reality. The SNP education secretary, John Swinney, scrapped all the downgrades a week after they were announced, and allowed pupils to receive the higher teacher-assessed grades. “It’s deeply regrettable we got this wrong,” Swinney said.
The English A-level results were far more dramatic than anybody expected. While 39.1% of the 700,000 teacher assessments submitted were downgraded, it was Ofqual’s insistence on benchmarking results against the prior three years’ attainment in the same subject in that school that caused the most serious problems.
Yet small classes, of the type favoured in private schools, were curiously exempt. That helped fee-paying schools record 4.7 percentage points more A or A* grades. For comprehensives the figure was 2 points, below the overall 2.3 average in England.
Downgrades for some students were dramatic – with schools reporting some pupils unexpectedly receiving a U.
One of the causes was that the algorithm decided that poor grades in the past meant somebody had to be marked down sharply again.
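To see how benchmarking against a school's past results can hand a U to a pupil whose teacher predicted a pass, consider the rough sketch below. It is an illustration only, not Ofqual's actual model: the function name, the ranked list of teacher grades, the historical shares and the small-class cut-off of five are all assumptions made for the example. Pupils are listed from strongest to weakest, and each is given the grade that a pupil in that position would have received under the school's historical grade distribution. Classes at or below the cut-off simply keep their teacher-assessed grades.

```python
from typing import Dict, List

GRADES = ["A*", "A", "B", "C", "D", "E", "U"]  # best to worst
SMALL_CLASS_THRESHOLD = 5  # hypothetical cut-off, not taken from the article


def benchmark_grades(
    teacher_grades: List[str],
    historical_share: Dict[str, float],
    small_class_threshold: int = SMALL_CLASS_THRESHOLD,
) -> List[str]:
    """Return grades for a class listed from strongest to weakest pupil.

    Illustrative only: large classes are fitted to the school's historical
    grade shares; small classes keep their teacher-assessed grades.
    """
    n = len(teacher_grades)
    if n <= small_class_threshold:
        # Small classes are exempt from benchmarking in this sketch.
        return list(teacher_grades)

    # Build cumulative cut-offs, e.g. top 10% get A, next 30% get B, ...
    boundaries = []
    cumulative = 0.0
    for grade in GRADES:
        cumulative += historical_share.get(grade, 0.0)
        boundaries.append((cumulative, grade))

    awarded = []
    for position in range(n):
        # Mid-point of this pupil's slice of the class ranking.
        percentile = (position + 0.5) / n
        grade = next(
            (g for cut, g in boundaries if percentile <= cut + 1e-9),
            GRADES[-1],
        )
        awarded.append(grade)
    return awarded


# Hypothetical school where 10% of past entries were graded U.
history = {"A": 0.1, "B": 0.3, "C": 0.4, "D": 0.1, "U": 0.1}
large_class = ["A", "A", "B", "B", "B", "C", "C", "C", "C", "C"]
small_class = ["A", "A", "B"]

large_result = benchmark_grades(large_class, history)
small_result = benchmark_grades(small_class, history)
print(large_result)
print(small_result)
```

In this made-up class of 10, the lowest-ranked pupil is awarded a U despite a teacher-assessed C, purely because a tenth of the school's past entries were ungraded. The class of three keeps its teacher grades untouched.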
Protests were immediate and stories of upset and frustrated students dominated the media. On Monday, the day of the climbdown, a spirited group marched on the Department for Education with handmade placards, chanting “fuck the algorithm”.
Sam Freedman, a former adviser to Michael Gove at the DfE, said ministers had few good choices when the pandemic struck, but added: “Why couldn’t schools have seen the grades days or weeks in advance for verification, so they could say ‘this looks out of whack’?”
Instead, such were the suspicions about teacher assessment that schools were told only the day before, just as if tens of thousands of exam papers had been marked. There was no safety valve in the system to protect students, or even ministers.
As the crisis hit, Williamson became erratic. On the Saturday after the results were out, he told the Times “this is it”: there would be “no U-turn, no change”.
Protests grew, and a complete U-turn was announced on Monday afternoon, allowing teacher assessments to replace algorithm-based grades. “Over the weekend it became apparent,” Williamson belatedly said, “there were real concerns about what … [grades] a large number of students were getting”.
It was accompanied by a picture of Williamson, a former chief whip, sitting at his desk with a riding whip and a red book at the front. “It was there when I arrived, so I made sure it got in the picture,” the photographer, Stefan Rousseau, said.
The unsubtle message was not lost on anybody, coming from a figure who was once Theresa May’s key enforcer before she sacked him and he switched to help run Boris Johnson’s leadership campaign.
The current prime minister, on holiday in Scotland, has already shown himself to be a reluctant sacker, and Williamson appears insulated for now. But few in Westminster believe the education secretary can survive much longer if there are problems with schools reopening in September.
Meanwhile, for a government that won dozens of working-class seats from Labour at the election, the episode was politically damaging. A former Tory aide who worked in government before the election said: “Why have Dominic Cummings if he or people close to him don’t raise the alarm in situations like this?”
Duong too is sure Williamson should have stopped the process in its tracks. “If a software failure causes a train crash, the boss can’t say, ‘I only saw the lines of code after the crash’. He should have known that in testing it failed 25% of the time and scrapped that software.”