Blog post

Brokering experimental knowledge for educational policy and practice

Adrian Simpson, Professor at Durham University

In recent years, governments have moved social policy towards being ‘evidence-based’. To give just two examples: in the US, the Every Student Succeeds Act (ESSA) restricts funding to ‘evidence-based’ programmes and defines the strongest evidence as experimental studies; in the UK, the government-funded Education Endowment Foundation (EEF) aims to improve teaching and learning through the use of evidence, where the strongest evidence is interpreted as randomised controlled trials (RCTs) – the latter prompting Innes (2024) to argue that the EEF functions ‘as a gatekeeper, deciding whose knowledge counts’ (p. 12).

This blog post explores the ‘knowledge which counts’, which appears to be knowledge brokered from the two distinct forms of RCT underpinning ‘evidence-based education’: field and laboratory experiments. In a recent article published in the British Educational Research Journal, a colleague and I contend that neither forms a strong basis for policy or practice (see Rowlandson & Simpson, 2024). Ideal RCTs provide strong evidence of causes within the studies: a sufficiently large difference in outcome between groups randomly assigned to different treatments is strong evidence that treatment allocation played a causal role for at least some participants.

Teachers need to rely on knowledge brokers to translate studies into policy and practice recommendations. For example, one of the EEF’s guidance reports cites its field experiment on 1stClass@Number – an intervention providing intensive support for pupils struggling with maths – as evidence for recommending ‘high quality, structured teaching assistant support’ (p. 32). Meanwhile, the influential 2015 Deans for Impact report cited laboratory experiments to recommend that teachers encourage ‘students to identify and label the substeps required for solving a problem’ (Deans for Impact, 2015, p. 4; see also, for example, Catrambone, 1996).

Field experiments

Field experiments evaluate interventions in realistic contexts. Compared to the control group, the intervention group in the 1stClass@Number evaluation had additional lessons, which were highly scripted, delivered by trained teaching assistants (TAs) and designed to be fun. While RCTs provide strong evidence of a cause at play in the study, knowledge brokers aim to make claims about causes which might be relied upon for practice elsewhere. However, knowledge brokers often transport causal claims by naïve induction: it happened there (in the study), so it should happen here (in your classroom).

This is highly problematic for many reasons (Joyce & Cartwright, 2020). For one, a causal role can only be ascribed to the whole ‘blob of causes’ that distinguishes the groups. In 1stClass@Number, the difference between the groups did include high-quality teaching assistants, but also additional teaching, tight lesson scripting and lessons designed to be fun. That is, despite the EEF’s guidance report using 1stClass@Number to exemplify the value of high-quality, structured TAs, it is possible that the TAs were irrelevant. The effect might have been at least as high had the intervention been delivered by teachers, parents, older pupils or by some other means: nothing in the RCT’s logic ascribes a particular causal role to the TAs.

Laboratory experiments

Laboratory experiments are less obviously prone to ‘blob of causes’ problems. In one experiment, Catrambone’s participants (inevitably psychology undergraduates) learned to solve unfamiliar mathematics problems and were randomly assigned to have subgoals in the procedure labelled or not. Here, it is not cherry-picking from a blob of causes to claim that labelled subgoals played a positive causal role for at least some participants. Nonetheless, we argue that causes identified in laboratory experiments don’t transport to policy any more easily (Rowlandson & Simpson, 2024).

The Deans for Impact report transports Catrambone’s work directly to a practice recommendation via naïve induction – and also gets it wrong! Nothing in the experiments involved participants doing the labelling (labels were provided for them), and it was subgoals (groups of steps which together form a procedural chunk), rather than individual steps, that were labelled. However, even a more accurate naïve policy induction (‘subgoals should be labelled’) is poorly justified directly from Catrambone’s experiments. Experimental results need qualifying clauses: for instance, ‘in the right circumstances, labelling subgoals results in improved procedural performance’. Sequences of laboratory experiments may identify some of those circumstances, but one qualification cannot be removed: ‘provided no competing causes are present’. The clinical nature of laboratories helps screen off competing causes, but classrooms are noisy, riddled with myriad potentially competing causal features.

Conclusion

I am not claiming that labelling subgoals or using TAs will not work in your classroom: only that claims for their classroom effectiveness should have no privileged status simply because they are grounded in experiments. Both forms of experiment are rigorous in clinching internal validity, but if neither provides direct grounds for policy, what does? One approach might be accepting that even multiple pieces of ‘evidence of’ a cause at work cannot combine to create ‘evidence for’ a policy: we may need to combine multiple types of evidence. Weaving together even somewhat weaker knowledge about mechanisms, contexts and outcomes (Pawson & Tilley, 1997) may produce stronger cloth for creating policy.

This blog post is based on the article ‘Brokering knowledge from laboratory experiments in evidence-based education: The case of interleaving’ by Paul Rowlandson and Adrian Simpson, published in the British Educational Research Journal.