
Opponents of pen-and-paper standardised exams have been making their case for decades. Still, standardised assessment has proven to be the fairest and most efficient way to assess students at scale.
The promise of AI has brought new advocates for alternatives ranging from adaptive testing to fully personalised exams. Meanwhile, the rapid uptake of AI among students is destroying trust in take-home assessments and homework as a means of measuring student progress.
Schools have responded by seeking ways to measure student achievement securely, fairly and accurately while preserving efficiency, given that curating, marking and reporting remain time-intensive and expensive.
Over the next few months, we're on a mission to share insights about the modern exam and where it comes from. Most of the features of the modern exam evolved to solve well-understood problems. The exam developed in stages, each responding to scale, fairness, security, or efficiency.
Still, in most of the world, the standardised exam is historically very new, and it has kept evolving throughout its short history. Understanding that evolution helps us see the current digital transition in context, and perhaps understand what’s coming next.
The earliest ancestors of the modern standardised exam emerged in Imperial China more than 1,400 years ago. These civil service examinations introduced many features that still shape assessment today: centrally set papers, anonymous marking, strict security, tiered progression, and advancement based on performance rather than patronage. The need was clear: to filter very large numbers of candidates fairly and consistently across a vast system.
That is where much of the basic architecture of modern assessment began. Standardisation, anonymity, and scalability were not incidental features. They were the point.
In medieval Europe, standardised written exams had not yet become the norm. Universities instead relied on oral disputations, with students publicly defending propositions while masters challenged and tested their reasoning. These assessments could last hours or even days.
They were intellectually rigorous, but they were also difficult to scale. As universities grew, oral testing became increasingly impractical, especially in subjects such as mathematics, where written working was essential to show complex, multi-step reasoning. As student numbers rose and administration became more demanding, written scripts allowed centralised marking and greater consistency. The silent exam hall gradually replaced the public debate.
By the mid-19th century, Britain had formalised the adoption of written competitive examinations for the civil service through the Northcote–Trevelyan reforms. The goal was to curb patronage and establish meritocracy. Written, anonymous, standardised exams became a central mechanism for doing so.
From there, the model spread. Cambridge and Oxford local examinations extended across Britain and then across the empire, and Australia inherited much of this structure through colonial education systems. By the late 19th century, the architecture of Australian schooling and public exams closely resembled Britain’s: timed written papers, centralised marking, and competitive ranking. Over time, the standardised exam became deeply embedded in schooling culture.
Another major structural change arrived in the early 20th century through the United States. As enrolments expanded, efficiency became more important. Frederick J. Kelly developed the multiple-choice format in 1914, and during World War I the Army Alpha tests applied it at massive scale.
The appeal was obvious: objective scoring, rapid marking, statistical comparability, and reduced examiner variability. Optical mark recognition systems followed, making it possible to process millions of scripts. Multiple choice did not replace essays, but it introduced a new lens: exams as data systems. Psychometrics, reliability analysis, and large-scale standardisation became increasingly central to assessment design.
For much of the 20th century, calculators were banned from exam rooms. Slide rules and log tables were permitted, but electronic devices were not. When calculators were first introduced in the 1970s, there was concern they would undermine mathematical understanding.
Over time, that view shifted. As calculators became normal in the workplace and in broader life, exam boards adapted. The focus moved from arithmetic execution to problem-solving and reasoning. Technology once seen as a threat was gradually absorbed into the design of assessment itself.
That pattern is worth remembering. Assessment formats often resist technological change at first, then later reorganise around it.
Even at its peak, the traditional written exam was always a compromise. It worked relatively well for some disciplines, but less well for others.
Language exams often required listening and speaking. Music and drama depended on performance. Vocational training relied on practical demonstration of competency. Art required portfolios and execution. More recently, software engineering has increasingly required executable solutions rather than inefficient handwritten pseudocode.
The one-size-fits-all handwritten script was never a perfect fit across all subjects. It was simply the most workable format available at scale for a long period of time.
Today, the evolution of exams is continuing through the move from pen-and-paper to digital delivery. According to the carousel, more than 100 jurisdictions are making this transition. This is not just a change in medium. It reflects a new set of constraints and opportunities.
Digital assessment can better support multimedia, new answer types, listening components, automated scoring for some formats, faster turnaround of results, data-driven moderation, security monitoring, and closer alignment with digital modes of learning.
In other words, the move to digital is not a break from the history of exams. It is the latest chapter in it.
Each major structural shift in assessment has followed a pressure point. Oral gave way to written because systems needed scale. Essays were joined by multiple choice because systems needed efficiency and comparability. Calculators were integrated because technology had become normalised. Digital is emerging now because the demands placed on assessment have changed once again.
Exams today still carry the structural DNA of Imperial China, 19th-century Britain, and early 20th-century American psychometrics. They were built to solve specific problems, and many of those problems remain important. Fairness still matters. Security still matters. Efficiency still matters.
But if the underlying conditions change, the format must eventually change too.
Modern exams look the way they do because they were designed for the needs of another era. The question now is not whether assessment will change again. It is whether systems will shape that change deliberately, or be forced into it later.
