Nudging After the Replication Crisis
On Uncertain Effects of Behavioral Governance and the Way Forward
Not so long ago, nudging seemed to many to be the governance tool of the future. Behavioral interventions, like reminders or information about other people’s behavior, come at low cost, help their addressees make better choices, and do not hamper their addressees’ autonomy. Consequently, so-called ‘nudge units’ in the US, the UK, Australia, and the EU, among others, guide governments on how to put ‘behavioral insights’ to use. Areas of application include pressing topics such as vaccination decisions and energy consumption. Seven years ago, Verfassungsblog’s milestone symposium and the accompanying volume documented the optimism around nudging at the time, with the critical debate focusing on whether nudging is paternalistic and whether such paternalism can be justified.
While such concerns certainly remain legitimate, this blogpost argues that nudging faces an even more fundamental problem (for a longer version of my argument in German, see here). The replication crisis has shaken the behavioral sciences, famous studies have been retracted due to data fraud, and, more generally, the very effectiveness of nudging has been called into question. This calls for a much closer look at how ‘behavioral insights’ are generated in the first place. From a legal perspective, uncertainty about both the effects and the side effects of behavioral interventions bears on the compatibility of nudges with fundamental rights. Information interventions, it seems, are not as gentle as commonly assumed.
Nudging, a short introduction
What do we mean when we talk of ‘nudging’? Advocates of ‘soft’ paternalism define a nudge as “any aspect of the choice architecture that alters people’s behavior in a predictable way without forbidding any options or significantly changing their economic incentives” (Thaler/Sunstein 2008, 6). Such interventions include changes to ‘default’ rules, reminders, or information about risks or (social) norms. Nudges are meant to encourage enlightened action on the part of their addressees. As nudging avoids hard mandates, it can be an attractive governance tool for regulation sceptics or in situations of political deadlock, when administrations are not backed by a political majority. And while nudging is now also employed in Europe, it is certainly no surprise that the movement has its roots in the USA. As Chris McCrudden and Jeff King have pointed out, nudging fits into a political agenda of economic liberalism and deregulation.
The still-ongoing replication crisis
Methodologically, nudging is grounded in the behavioral sciences. These, however, have been in turmoil for a while now. Large-scale replication projects have revealed that many results of experimental studies cannot be replicated. In 2015, a prominent collaborative replication project concluded that this concerned more than half of all published psychological effects. The collective behind this study had attempted to replicate one hundred experimental studies from leading journals, with considerably larger numbers of participants than the original studies. While 97 percent of these original studies had reported statistically significant results, only 36 percent of the replications did. Effect sizes were also markedly smaller. These findings led to a broad debate on how to improve research practices.
The central problem is publication bias. Publication bias stems from the practice that top scientific journals typically only publish studies with statistically significant results. A result counts as significant if, according to a statistical test and the conventional threshold, the observed effect is unlikely to be due to mere chance. Editors of top journals want to publish results that have a wide impact, and statistically significant results promise more citations than so-called null results, which show no statistically significant effect. This selection behavior, in turn, shapes the behavior of researchers: authors of scientific studies have an incentive to look for statistically significant results. This has important implications. As null results are not considered worth writing up, they “remain in the drawer”, and other researchers never learn about them. Not only does this waste resources, as research designs are tested that have already been tried; it also creates the risk that chance findings are overrepresented in the published literature.
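To make this mechanism concrete, here is a minimal simulation sketch. It is not drawn from this post or from any of the studies mentioned; all numbers are invented for illustration. The point is simply that if only significant results are written up, the published effects are a selected, and therefore inflated, sample of all effects actually measured.

```python
# Minimal simulation of publication bias (all numbers invented for illustration):
# many small studies test the same true effect, but only the statistically
# significant ones are "published".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.1      # small true effect, in standard-deviation units
n_per_group = 50       # participants per experimental group
n_studies = 5000       # hypothetical studies run across the field

published = []
for _ in range(n_studies):
    treatment = rng.normal(true_effect, 1.0, n_per_group)
    control = rng.normal(0.0, 1.0, n_per_group)
    t, p = stats.ttest_ind(treatment, control)
    if p < 0.05 and t > 0:                     # the significance filter of a journal
        published.append(treatment.mean() - control.mean())

print(f"true effect:                  {true_effect:.2f}")
print(f"share of studies 'published': {len(published) / n_studies:.0%}")
print(f"mean 'published' effect:      {np.mean(published):.2f}")
```

In such a setup, only a small fraction of studies passes the significance filter, and their average effect overstates the true effect several times over, which is exactly the pattern the replication projects keep finding.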
The uncertain effectiveness of nudging
The problems outlined above are highly relevant when it comes to evaluating the effectiveness of nudges.
First, there is the question of replicability, as a high-profile example illustrates. A prominent study had claimed to show that people were more honest if they signed an honesty statement before making their claims. The initial publication included a laboratory experiment and a field experiment, conducted in cooperation with a car insurer, that confirmed the laboratory findings. However, the laboratory experiment, originally run with 101 participants, could not be replicated when the number of subjects was increased more than tenfold, to 1,235. In an example of good practice, the group published this failed replication attempt themselves. But the case also showcases the allure of sensational findings. After a tip from a researcher, the statistics blog “Data Colada” uncovered irregularities in the data of the second part of the original study, that is, the field experiment with the car insurer. There is much to suggest that the data from this field experiment were manipulated to produce the desired finding. The evidence uncovered by “Data Colada” was so convincing that the original study has now been officially retracted.
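A second, related point is statistical power. The following sketch is only an illustration: the per-group sizes of 50 and 600 roughly approximate the reported totals of 101 and 1,235 participants under the simplifying assumption of two equal groups, and the effect size d = 0.2 is a conventional benchmark for a ‘small’ effect, not the study’s own estimate.

```python
# Hypothetical power comparison (assumed group sizes and effect size, see above):
# a small sample reliably detects only large effects, which is one reason why
# well-powered replications so often fail to reproduce a "significant" finding.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for label, n_per_group in [("original-sized sample", 50), ("replication-sized sample", 600)]:
    power = analysis.power(effect_size=0.2, nobs1=n_per_group, alpha=0.05, ratio=1.0)
    print(f"{label}: probability of detecting a small effect (d = 0.2) is roughly {power:.0%}")
```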
Further, publication bias poses a challenge for the conduct of meta-studies, as a recent controversy illustrates. A recent meta-analysis concluded that nudges are effective on average, especially when they change defaults, with effect sizes in the small-to-medium range. For the critics, this was overclaiming, as the study did not correct for publication bias. The basic problem with such meta-analyses is that they can only draw on the effects published in scientific studies. Against this background, a recent study by Stefano DellaVigna and Elizabeth Linos is particularly interesting. It compares the effect sizes of randomized controlled trials (RCTs) published in scientific journals with those of trials conducted by nudge units. The special feature is that the authors had access to all studies of two leading nudge units based in the USA. This allows them both to circumvent publication bias and to make it visible through the comparison with the journal studies. The striking result is that the nudge units’ trials also show statistically significant effects, but these amount to only about one-sixth of the effects found in the scientific publications. In their estimation, DellaVigna and Linos attribute most of the difference to publication bias.
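Whether a body of published effects is distorted in this way can be probed with standard diagnostics. The published critiques rely on more sophisticated corrections, but a simple funnel plot already conveys the idea; the sketch below uses invented data, not the meta-analysis’ actual sample.

```python
# Hypothetical funnel-plot diagnostic (all data invented): simulate studies of
# varying size, keep only the significant ones, and plot effect size against
# standard error. Selective publication leaves the funnel visibly asymmetric.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(3)
true_effect = 0.1

effects, ses = [], []
for _ in range(3000):
    n = int(rng.integers(20, 400))             # studies differ in sample size
    treated = rng.normal(true_effect, 1.0, n)
    control = rng.normal(0.0, 1.0, n)
    t, p = stats.ttest_ind(treated, control)
    if p < 0.05 and t > 0:                     # only significant studies are "published"
        effects.append(treated.mean() - control.mean())
        ses.append(np.sqrt(2 / n))             # approximate standard error of the difference

plt.scatter(effects, ses, s=8)
plt.axvline(true_effect, linestyle="--")       # dashed line marks the true effect
plt.gca().invert_yaxis()                       # convention: more precise studies at the top
plt.xlabel("estimated effect")
plt.ylabel("standard error")
plt.title("Funnel plot of 'published' simulated studies")
plt.show()
```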
Who looks at side effects?
A further question, which is fortunately gaining traction in the behavioral sciences, concerns the side effects of behavioral interventions. Most studies evaluate behavioral interventions in terms of their effects in the target domain. In other words, they ask whether a particular reminder, deadline, or other piece of information leads to a particular individually or socially desirable behavior. What these studies regularly leave out is whether such an intervention might have side effects outside the target domain. A recent study vividly illustrates the point:
Roadside information campaigns use display boards to provide information about the number of traffic fatalities. The goal is to encourage motorists to drive carefully. But a new study using Texas traffic data shows that these boards have considerable side effects. Jonathan D. Hall and Joshua M. Madsen were able to exploit the fact that the permanently installed display boards, which are otherwise used as dynamic traffic signs, only showed information about traffic fatalities during one week per month. Combining this with accident statistics, they were able to demonstrate that these messages causally increased the number of accidents. They estimate that the likelihood of an accident on the ten kilometers of road following a display board showing traffic fatality statistics increases by 4.5%. The explanation is straightforward: worrying about a fatal accident increases drivers’ ‘cognitive load’.
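The identification strategy behind this finding can be sketched in a few lines of stylized code. Everything below is invented for illustration (the variable names, the data-generating process, the size of the effect); it is not the authors’ specification, but it shows the basic comparison: crashes downstream of a given board during fatality-message weeks versus crashes downstream of the same board during the other weeks.

```python
# Stylized week-on/week-off comparison (invented data, not the authors' code):
# regress downstream crash counts on a dummy for fatality-message weeks,
# holding the individual display board constant via fixed effects.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_obs = 4000
df = pd.DataFrame({
    "message_week": rng.integers(0, 2, n_obs),  # 1 = fatality statistics on display
    "board_id": rng.integers(0, 50, n_obs),     # which board the road segment follows
})
# invented data-generating process: slightly more crashes during message weeks
df["crashes"] = rng.poisson(1.0 + 0.045 * df["message_week"])

model = smf.ols("crashes ~ message_week + C(board_id)", data=df).fit()
print(model.params["message_week"])             # recovers the extra crashes per message week
```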
A further important question is whether behavioral interventions reach those addressees who matter most for achieving a given policy goal. To give just two examples: short messages may boost vaccination uptake, but apparently not among those who are skeptical of vaccines; on the contrary, such reminders may induce reactance. Similarly, messages about energy-saving measures have no effect on households that are simply not interested in their energy consumption.
Policy implications
The uncertainties about the effects and side effects of behavioral interventions have implications for law and policy-making. Legal discourse locates the cost of ‘soft’ interventions primarily in the fact that they are paternalistic; interventions that rely on information are therefore seen as gentler. This perspective should be broadened. Nudges in the form of reminders or warning signals may have effects, but these effects are likely to be smaller than the studies published in scientific journals suggest. In addition, possible side effects must be taken into account, especially in the case of informational solutions. A changed cost-benefit analysis may also matter for a fundamental-rights proportionality test. A small effect size might no longer justify a given reduction in autonomy and thus render an intervention disproportionate. Side effects could also influence the proportionality test if the intervention has undesirable consequences for some addressees that cannot be normatively offset by its effect on the remaining addressees. Further, informational interventions raise another problem. In a digitalized society, there are ever more demands on the attention of addressees, or consumers for that matter, so cognitive resources are already under strain. This growing demand for attention hits some groups harder than others, as poverty exacerbates the problem of scarce cognitive resources.
Way forward
Turning findings of behavioral science into effective interventions is complex. Much work remains to be done in the area of cost-benefit analysis. Dealing with ‘cognitive load’ will increasingly become a challenge; in a complex world, less is often more. In order to have a sound basis for behavioral interventions, however, we need to reconsider research practices. Again, much can be done in this area, and the open-science movement has already brought many solutions to the table, such as pre-registering hypotheses or submitting research designs to journals before data collection, to name just two. More generally, we should be aware that the way we incentivize and organize knowledge production matters. The quest for ‘fast-impact’ science may not deliver the robust findings society actually needs, and we should be careful when confronted with findings – or regulatory strategies – that sound “too good to be true”.
I believe the problem of the replication crisis is mostly due to institutionalized pressure on scientists to publish ever more results that match a certain public tendency, and to the general attitude that negative results are not worth publishing. Especially the latter is surprising from a Popperian viewpoint.
This can essentially be addressed by (1) focusing on the quality of the research done rather than the volume published, and (2) not being afraid to publish negative results, even if the concept in question is “beloved” by the community at large. Considering how the current academic landscape works, I think it will take a major identity crisis for the field of social psychology to straighten itself out.