Wow. As a former social scientist with an axe to grind this hits hard.
I'd like to provide the HN community with some context as to what this means.
There are some 300 “research” departments in each of the major social sciences: psychology, sociology, economics and anthropology. If you believe what they say, about half of their mission is teaching and the other half is research. That’s a lot, tens of billions of dollars.
The nudge findings were among the few to not only reach the level of public knowledge but, more importantly, to directly influence public policy. To use the one I'm most familiar with: the so-called default for defined-contribution retirement plans, e.g. the 401(k). The governing regulations assumed, for good reason, that maximizing contributions was in the public interest. Based on the nudge findings, after much debate and effort, they were updated so that the maximum-contribution option was pre-selected on enrollment forms, in the belief that more individuals would opt for it rather than contribute zero.
So far so good, right? In fact, nudge has become a canonical example in introductory public policy courses of how this research can, in some sense, make things better.
This meta-analytic finding turns on the authors’ method for measuring publication bias. Because I accept that, I must believe that this entire body of research, probably the signal behavioral economics work, is essentially worthless! Thus, all that effort has not only been wasted but the credibility of social science in general is damaged.
Adding this to the well-known gamesmanship in peer review, debates over tenure, etc. means it's past time to reform a large chunk of academia.
Isn't this just one specific analysis using a very narrow definition of "nudge", one that doesn't even begin to encompass the work being done at those "300 'research' departments"?
Further, isn't this using the same data as the original meta-analysis, which did find an effect "of small to medium size"[0]?
Why would this, alone, undo decades of research and clear, bright-line conclusions such as the ones cited in my sibling comments? In other words, why is this letter the final word on the topic of "nudge", to you, and not the original meta-analysis? Sounds like you think everyone should pack it up and go home, all because of one letter using an alternative set of definitions and analysis.
Just seems like an overreaction on your part, especially given how vocal and... you-sounding (for lack of a better term) the "anti-nudge" crowd often is.
To take a wider view, a comment like yours is a more malicious form of nerd-sniping[1], especially on HN. Claim to have relevant credentials, voice a contrarian-but-popular-here opinion, and make a wild conclusion to give those reading it a feeling of "inside baseball."
I think social scientists have lost the right to the benefit of the doubt. They don't preregister their trials. They don't publish their negative results. They're notoriously bad at statistics, notoriously let their political beliefs distort their conclusions, notoriously scatter their work over thousands of arbitrary, hard-to-compare small-sample-size studies instead of concentrating their resources. I don't think it matters if this criticism is right or not. The fact that it's even possible to make such a criticism is already a condemnation of the scientific incompetence of social scientists.
First of all, preregistration is not a requirement for the scientific method, which has functioned well for centuries. That is a recent trend in response to the overflowing amount of haphazardly published science.
Second, it is up to the individual scientist to decide to preregister or not. Some social scientists may preregister.
Third, small sample sizes may be a fair critique; however that overlooks how difficult it is to collect such data.
You've made a lot of generalizations here that amount to, "social scientists aren't as rigorous as other areas of science, therefore we should only believe studies that disagree with their results". I don't think throwing the baby out with the bath water is helpful. You can take results of studies with small sample sizes with a grain of salt, watch for replication, etc. Lambasting the field as a whole doesn't make sense to me.
Finally, readers should note that this isn't a new argument. People have been making this claim about social science for 120 years, if not longer, but at least since Freud and contemporaries began publishing.
> You've made a lot of generalizations here that amount to, "social scientists aren't as rigorous as other areas of science, therefore we should only believe studies that disagree with their results".
I think it’s more “ignore them completely.” It brings “science” into disrepute to let social science associate with the other sciences.
> I don't think throwing the baby out with the bath water is helpful.
There is no baby!
> People have been making this claim about social science for 120 years, if not longer, but at least since Freud and contemporaries began publishing.
Doesn’t that prove the point? It wasn’t science then and isn’t science today.
> It brings “science” into disrepute to let social science associate with the other sciences.
So you just want it renamed to "social studies" or what? What is your proposal, that nobody research this topic, or that they be separated in journals etc? I doubt that will have much impact on whether it makes the news. If you want that to change, you may need to get yourself onto the board of a journal you care about.
> There is no baby!
That's reductive. Just because you don't see the baby doesn't mean it doesn't exist.
> Doesn’t that prove the point? It wasn’t science then and isn’t science today.
No, it just proves it's an old disagreement, like nature vs nurture.
There is plenty of work in social science that contributes to humanity. It will always have smaller sample sizes due to the nature of collecting the data. The work can be considered useful nonetheless.
> So you just want it renamed to "social studies" or what? What is your proposal, that nobody research this topic, or that they be separated in journals etc? I doubt that will have much impact on whether it makes the news.
Or maybe people just want to shine light on the fact that social science is harder than other science for a bunch of different reasons, bring social scientists' attention towards the tools that help mitigate this, and bring the journals that seek profit over reliable results into disrepute?
Most of the papers published in current social science journals are not science, and this has been a problem for those 120 years precisely because the techniques used in chemistry or physics are inadequate for the problem domain, so applying them blindly does not produce scientific outcomes.
> So you just want it renamed to "social studies" or what? What is your proposal, that nobody research this topic, or that they be separated in journals etc
Yes. And that the rest of us stop treating it as science, citing it as science, and relying on it as science.
For example, there is a major trend in the law of treating social sciences as having truth value the way real sciences do. That’s the kind of thing we need to stop doing.
I found the hard/soft science discussion here [1] to be informative.
It seems unlikely that all of social science will one day be declared as "not science". Aristotle's methods, for example, did not require a certain sample size.
You're applying far too strict of a definition to science. Basic forms of science can be practiced by a child at home. Journals publish more in-depth analyses, and it's up to them what to publish, at the risk or reward of gains and losses of readership.
> You're applying far too strict of a definition to science.
This isn’t a fun theoretical exercise. In the public sphere, “science” tends to get invoked with dispositive weight. And it should in many cases. But for that to work, “science” must meet the level of rigor people associate with “science.” To be called “science” it should be like physics in terms of providing truth value, not psychology.
Aristotle didn’t do “science.” He was a philosopher. His ideas were precursors to science, but weren’t science.
> To be called “science” it should be like physics in terms of providing truth value, not psychology.
It's possible to form a truth about human behavior, for example, I did X because Y. If someone points my behavior out to me, then in the future I may do Z in response to Y. Human behavior can change upon observation [1]. That doesn't make the study of human behavior "not science" in my opinion.
> Aristotle didn’t do “science.” He was a philosopher. His ideas were precursors to science, but weren’t science.
In that case, I suppose you will acknowledge that Galileo did science, despite not having a lot of data points. I think Aristotle did too, because I draw the line at hypothesis, observation and conclusion, which may or may not result in some ultimate truth that remains constant.
I suppose you would also say that the double-slit experiment is science. Yet that result changes, and there is no truth value that we can explain, except by noting that the result changes when the experiment is observed.
> science: the intellectual and practical activity encompassing the systematic study of the structure and behaviour of the physical and natural world through observation and experiment. [1]
Given the definition, why wouldn't you consider observing human behavior to be science?
It's not useful. In fact, it's actively harmful. The constant churn of findings and retractions undermines the public's belief in science. When people stop believing in science, you get things like flat earthers, global warming deniers, and vaccine skeptics.
Humanity is observable. It's just hard to collect the data. I think you can make the case that some science isn't as rigorous, or that the jury is still out, but to say that it isn't science at all is wrong in my opinion. Even Feynman acknowledges that conclusions may be drawn later.
Aristotle philosophized quite a bit and is considered an early contributor to the scientific method. More food for thought:
>First of all, preregistration is not a requirement for the scientific method, which has functioned well for centuries.
As has been said before, the problem is that the scientific method is only right eventually. It can and often does get stuck for decades at a time, if someone with, shall we say, durable beliefs gets tenure, amasses political power and shoves their rivals out of a field. The amyloid hypothesis is just the most recent example.
Modern metascience practices (preregistration, blinding, banning "garden of forking paths" subgroup analyses, demanding stricter p-value thresholds and larger n) don't replace the scientific method, they're supposed to speed it up! But, by definition, these are all political issues, so they attract political arguments.
> It can and often does get stuck for decades at a time,
I think this is just more data. If we're all wrong for a longer time, then the impact will be more clear.
I agree that modern additions like preregistration are helpful. I only wanted to remark that it is not a prerequisite for science.
> by definition, these are all political issues, so they attract political arguments.
People are good at gaming systems. We are naturals at recognizing patterns and will adjust our behavior to meet our goals. In that sense, social science may be targeting a moving object, almost like the difference between observing and not observing the particles in a double-slit experiment.
As hard as it may be in social science, the process of hypothesizing, observing and forming conclusions is still science. For some, it appears that is not science because a definite conclusion never arrives.
Which viewpoint is correct? I think it's up to you to decide. And, when you don't grant people that choice, you get an anti-science response, because people naturally reject being told what to think. Science, for me, is about asking questions, not necessarily arriving at a definitive result.
> however that overlooks how difficult it is to collect such data.
It is very difficult in physics as well. Do you know how hard it is and how much effort is involved in building the LHC? Or LIGO? Or the JWST? Or ITER? They take billions of dollars, thousands of scientists, and decades to plan and build before you even get science data. Science is hard! You need to put the work and effort in, because otherwise you can't say anything about the nature of things.
>> I don't think it matters if this criticism is right or not.
> Okay, you and I care about very different things.
Clearly it is being suggested that it doesn't matter with respect to the social studies departments being in dire need of drastic reform. If you don't care about that either way, why are you commenting on this thread? Do you actually care one way and are just "point scoring" to further the argument? Something else? Am I misreading something that I think I'm reading clearly?
So are social studies departments and funding in dire need of reform in your opinion? How does the "correctness" in your view of the original submission affect that need or non-need?
and yet here you are responding in a thread with the conclusion clearly stated:
>means it’s past time to reform a large chunk of academia.
And you're taking exception to it, but now just claiming you're actually not, that these one-sentence "you're wrong" responses are really something much more modest.
Out of interest do you work in the field? Have ties as a graduate to one of the departments? Or are you completely disinterested when assessing the research?
If I have an interest, it's that quality research is performed that advances human knowledge with some kind of efficiency in the resources spent, i.e. fund something that is being done properly and well to some effect, over something that has been shown to be run by those who seem to be utterly incompetent or fraudulent shysters, or something else that engenders zero confidence.
Hey that's just what I said to you! Great minds etc.
Hitch your wagon to Cass Sunstein et al. by all means... Famously successful and successfully famous. What else do you need to know? ESP might be a thing too, nobody has proved it isn't. But we have shut down the research departments at universities involved in that BS...
I agree. I have worked at 5 separate tech companies and have conducted hundreds of statistically significant experiments changing what people select by default. These methods have been effective at helping hundreds of millions of people improve their choices. It's common practice, think: pricing on Amazon, default tips on Uber, default purchase price in a video game, etc.
I don’t see how this would make me reinterpret all those successful results. Maybe I don’t understand what this is saying.
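For concreteness, here is roughly the shape of the default-change tests I mean. This is only a sketch: the opt-in counts are invented, and it's a plain two-proportion z-test, not any particular company's numbers or tooling.

    # Rough sketch of a default-change A/B test; counts are invented.
    from statistics import NormalDist

    def two_proportion_ztest(success_a, n_a, success_b, n_b):
        """Two-sided z-test for the difference between two proportions."""
        p_a, p_b = success_a / n_a, success_b / n_b
        p_pool = (success_a + success_b) / (n_a + n_b)
        se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
        z = (p_a - p_b) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        return p_a - p_b, z, p_value

    # Treatment: option preselected by default; control: no preselection.
    lift, z, p = two_proportion_ztest(4300, 10_000, 3900, 10_000)
    print(f"lift = {lift:.3f}, z = {z:.2f}, p = {p:.2g}")

With made-up counts like these, the lift is obviously "significant"; the interesting question upthread is whether such results generalize beyond each individual product.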
How much of this is “nudging” vs. “clearly explaining trade-offs”?
I’ve not done any rigorous research, but I’ve participated in projects that resulted in dramatic shifts towards customers choosing what the dev team thought was the “best” outcome, just by altering wording, or making “dangerous” choices harder (such as by requiring more clicks to enable).
One interpretation is that it is very hard to extract value from the nudge literature. When reading research articles, one must estimate and adjust for biases. This adjustment shifts the p-values by unclear amounts. So a positive result may just be a fluke.
As for your own AB tests, you have seen the processes that go into them and do not need to adjust for unknown biases. So when they demonstrate a nudge effect, you can believe it.
In the example you cited I don't think that's a nudge? Or is it?
I ask because I am sure that changing defaults DEFINITELY works, especially if the user does not have a strong existing preference.
You're not really changing user behaviour most of the time, you're changing the outcome of what they're trying to do, which is to reduce their cognitive load by ignoring as much as they possibly can.
Another poster said they must have a pretty specific definition of nudge, because it defies credulity that defaults don't change outcomes, if only because half the time I don't read the defaults or know where to find them.
I mean, I just found out two weeks ago you could change the Hacker News banner color. Are you telling me I'm in a statistically insignificant minority of Hacker News users?
Also how many settings are there in the average application, you can’t tell me most users go through all of those settings to get exactly what they want.
Defaults very clearly work in matters such as consent to organ donation. In countries where you need to opt out of organ donation, few people bother to do so.
Another question is whether this increases the total amount of successful donations. I was looking around for studies and found this one [1], which basically says "in some countries, yes".
I've heard people argue that that effect isn't a nudge, it's deceit.
That is, all you're doing is tricking people who didn't read carefully. People don't know they've opted in and would opt out if you called and told them that they checked the box.
I find it generally plausible that defaults don't matter much for what people consider very important decisions. I have minimal experience in this area, though.
There is also some research suggesting that defaults in organ donation (so called presumed consent) may decrease rates of actual donations in those countries. I can't find the original podcast where I heard about it (I assume related to either Planet Money or Freakonomics), but found this source:
That's not a conspiracy. When two companies, or countries, or individuals have business dealings on many different levels, a lot of things can be negotiated at the same time.
It seems quite possible for two things to be true: 1) The common sense notion that manipulation works; and 2) Social science couldn't find the signal above the noise.
> Some other poster posted that they must have a pretty specific definition of nudge, because it defies credulity that defaults don’t change outcome if only because half the time I don’t read the defaults or know where to find them.
My pet theory is that these results hinge on "does it scale?"
Like, yes, you can do nudges and see behavioral changes. But what about when everyone is doing it constantly? Then people will get fatigued and form countermeasures.
Imagine this dynamic in another context:
“Guys, guys check this out, people are guaranteed to buy your product if you show arguments for it to random people!”
But, oops, centuries of marketing later, advertising isn’t automatically effective enough to cover its costs, people don’t automatically believe the ads.
I'm guilty of not reading this paper in any detail but it feels that the default setting "nudge" idea should work as described. So if you e.g. nudge people by setting up a pension plan by default (that they can opt out of) does that seriously fail to cause more people to have a pension? Or is this claiming something else?
Another similar example is jurisdictions that switched to assuming an individual consents to organs donation when they die, rather than having an opt-in system, see much higher rates of organ donation.
The hacker news banner color doesn't matter and few have ever wanted to change it. But your financial position and needs, what % of your salary you can afford to money-hole until retirement, does matter and is pretty individual. It doesn't defy credulity to me that generally people would make a choice about this (when can I retire?), and that the default doesn't influence it.
I grant that it would be surprising if it had no influence at all, but I think the effect is more the social signal that you should want to save the max, that your neighbors probably do (it's the default after all), etc., rather than people completely ignoring/missing it.
I believe you. If you know it influenced you though, that means you didn't ignore it or not even realize you could change it, which is the idea I was replying to.
Again, it would be surprising if it didn't matter at all, but not unimaginable. What you're saying is that almost everybody in your company would have contributed a lesser amount if not for the default. It means you can all afford to give up $20k or whatever in income this year. There are other factors.
There is no truth to the matter of "whether defaults change behavior". This thread started about 401ks and was then taken into color preferences on the web. If someone with a gun to their head is asked whether they want to die, I'm sure we'll agree that whatever the default is doesn't matter. Whether defaults do anything depends on what we're talking about. Nudges might work in web UX but not economics; why is that so hard to believe?
I'm not sure it is useful to lump nudging on decisions users don't want to or don't know how to make in with nudging on decisions users either want to or have to make.
It's a no brainer that defaults will alter outcomes for users who aren't willing or capable of making a selection for the choice in question.
Yeah, I am also confused by the statement that nudge theory doesn't replicate, and I'm afraid that statement won't replicate haha. Or rather, there are basic, indisputable findings with mind-boggling effect sizes showing that countries with different defaults for organ donors have different donation rates.
Now, you can say all day long that those aren’t causal studies, but there is just no way that confounding factors like different cultures explain it, because cultures just aren’t sufficiently different, or rather cultures that are otherwise pretty similar have vastly different donation rates.
A lot of the replication crisis, imo, is just realizing that landmark studies were underpowered. That is, they don't prove what they meant to prove, but that is very different from whether the effect exists; an effect may exist yet be hard to prove, and social scientists are rarely rigorous in study design, both from training and from the inherent difficulty.
Nudges are often imagined as just how choices are presented, but yes the default option is considered part of nudge theory. As also is social proof ("Your friends picked this choice").
also, the question is how much structural elements influence outcomes (not merely decisions), not whether they do or not. that’s the extra complexity of a social system built atop a biological system built atop a chemical environment built atop a physical one. we’re complicated. physics is nigh child’s play in comparison.
This seems to be the standard response to anything that seeks to debunk nudge. Any time you say ‘This example of a nudge doesn’t work / isn’t replicable / isn’t actually socially helpful’ someone will say ‘Ah but that’s not really nudge tactics.’
Also the other way will come up in these kinds of "arguments" without doubt.
"How can these horrible critics say nudges don't work? Have they never been nudged with a loaded revolver? Can they not imagine that working?"
Those of us who don't follow the controversy closely in the field, but are interested in what has actually been found out and understood about the world with some degree of confidence across many fields of study, are left scratching our heads, unable to see through the viewing window for all the mud getting flung.
401k is a good example. I have had one for my whole career, but if there had been a form at any point asking me how much of my pay I wanted to contribute, I would have said 0, because I prefer cash in hand over cash someday, and all the b.s. health insurance is already taking a lot. But the 401k doesn't bother me enough to change the default, so I leave it be as some kind of rainy day fund. I didn't like paying the penalty to withdraw it; unless I turn 65, it will always be worth significantly less than it says on paper, and I am not even convinced it is beating inflation. My point is, just because people don't change the default, it does not mean they have accepted it or like it; that is an incorrect conclusion.
Food is another example, I like cheese sometimes but when there is an option for it I take it out of the food most times but I won't go out of my way to ask for its removal otherwise, this has a real health impact.
If you're worried about beating inflation, have a long time horizon, and don't mind some risk, you might look into investing in total market index funds. An index fund for the S&P 500 has averaged ~10% returns when looking back 30 years [1].
If you're just concerned about inflation, don't like risk, and don't mind locking your money up for a little bit, Treasury Inflation Protected Securities [2] are also a thing. Their returns are tied to the official measurement of inflation (CPI).
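For a rough sense of scale, a small worked example of the compounding involved. The rates and starting amount are illustrative assumptions only, not advice or a forecast:

    # Illustrative compounding only: nominal vs. inflation-adjusted growth
    # of $10,000 over 30 years at assumed average rates.
    principal = 10_000
    years = 30
    nominal_rate = 0.10   # assumed average nominal return
    inflation = 0.03      # assumed average inflation

    nominal_value = principal * (1 + nominal_rate) ** years
    real_rate = (1 + nominal_rate) / (1 + inflation) - 1   # ~6.8% real
    real_value = principal * (1 + real_rate) ** years

    print(f"nominal: ${nominal_value:,.0f}  in today's dollars: ${real_value:,.0f}")
    # roughly $174k nominal, about $72k after inflation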
> This meta-analytic finding turns on the authors’ method for measuring publication bias. Because I accept that, I must believe that this entire body of research, probably the signal behavioral economics work, is essentially worthless!
I strongly disagree with this statement, even as someone who believes “nudge” effects are wildly overblown.
It means “these studies failed to find evidence” - NOT that there is nothing to find.
The distinction is important because, as it turns out, the policies that the research influenced did work, in many cases. 401k contributions did go up, in many cases. More people became organ donors. More Europeans got stronger privacy protections.
“The power of defaults” is such a cliche because, in many cases, it works.
The problem with these studies is overstating the effect - not spewing worthless BS.
Let me defend and elucidate. Worthlessness I would define here relative to the amount of public and policy attention the nudge findings have received versus the net value added, as modified by these results.
Perhaps due to the PR efforts of leading researchers, it was much more than "set defaults intelligently." The interpretations were more like: we can use social science to shape people's behavior at the margins. Further, these marginal changes would accumulate into substantive and lasting societal improvement.
On reflection, it seems to me that the value of this paper stems from its attempt to measure or quantify publication bias. In this case, the bias was positive, in the direction of studies confirming nudge effects.
Taking that a step further implies that the actual net nudge effects across published and unpublished studies were statistically and therefore substantively insignificant. Hence the use of the term worthless, i.e. non-findings.
To say that it is costless to implement a nudge scheme in the behavioral economics sense is simply untrue. In the retirement case it required a lengthy ethical and legal debate; study and political argument as to the best outcome, which is in part a redistributive question; hard costs associated with revising or developing messages and other materials; etc.
Worse, I believe, is the damage done from attention and action predicated on now seemingly faulty social science. What could've been done instead, and what will happen the next time a social scientist claims an "easy" way to make things better, are costs.
> statistically and therefore substantively insignificant
This is not what statistical significance implies. This misunderstanding, and its inverse, leads to the very errors for which you criticize the "nudge" papers.
More to the broader point, "set defaults intelligently" in fact implies the ability to "shape peoples' behavior at the margins." Otherwise, why bother thinking about them?
That's why what is actually at issue with "nudges" is effect size & context: how much of a difference can we have, and where?
And to that question, this paper provides little insight. It aggregates too much & ignores real-world policy evidence.
Now, it's still a good paper - people have gone WAY overboard with nudges in silly places - it just needs to be understood as "let's rein in expectations" and not "this field is bunk"
I think the issue here is using science / evidence to push for policy changes when there isn't actually sound science or evidence. That can be done with sound policies that work just as well as it can be done with bad policies. But we should always be concerned when unsound science gets used. It can be used to shut down valid policy debates. And eventually, on a long enough time line, it will get abused by bad actors.
Can these effects be explained without inventing a new term? Because if they can then these studies didn’t really find anything did they?
Whenever I see a new term being introduced as an explanation I am hesitant to accept it, as it may turn out to be like explaining the planetary motions with epicycles, when the motions can be more easily explained by moving the sun to the center of the solar system instead of the earth.
Not very scientific, but isn't it just laziness? Most people (including me) are too lazy to think about all the choices they could make, so they just stay with the default choice most of the time. Not because they actually prefer it, mainly because they never even read it.
Indeed. Your alternative theory doesn’t require an extra construct, and instead uses a pretty established cognitive behavior (the tendency of inaction) to explain the same phenomenon. I would say your explanation has the advantage of Occam’s razor, whereas Nudge Theory doesn’t.
That we need to “create” the idea of a “nudge effect” when it’s clear people take on commonly encountered social behaviors is bizarre.
Cognitive experience is a for loop with memory; for time spent in situation X, memory forms at rate Y. Social science solved.
Social science derives all its conclusions by studying the same old physical world as physical science. It's a restatement of science customized to cultural tradition. It's cultural tradition to overhype our specialness, selling books and big ideas, when the math is the same everywhere. Creating cultural objects out of obvious math is a commodity now.
Except that's not what the study says. Quoting a comment below, "The linked study (and the Mertens study it's built on top of) classifies defaults as 'structural' interventions. In the linked meta-analysis, after stripping out estimates of publication bias, structural interventions have the most 'robust' evidence left (95% CI of 0.00-0.43)"
I left psychology around the time that nudging was gaining traction and I haven’t really been following it. But it seems to have a couple of red flags:
First, the definition:
> A nudge is a function of (condition I) any attempt at influencing people’s judgment, choice or behavior in a predictable way (condition a) that is motivated because of cognitive boundaries, biases, routines, and habits in individual and social decision-making posing barriers for people to perform rationally in their own self-declared interests, and which (condition b) works by making use of those boundaries, biases, routines, and habits as integral parts of such attempts.
I find this definition overly permissive and overly reliant on unnecessary cognitive terms (like judgment and choice, which can be shortened to behavior) and economic terms (like rationality and self-interest). As a fan of behaviorism, this feels like an attempt to introduce epicycles into a theory that doesn't need them. This effect—if it exists—can probably be adequately explained with good old classical conditioning and conditioned reinforcement. This is the first red flag. That is not to say we can't look for specific cognitive functions which make some reinforcement contingencies more effective than others, but nudge feels a bit too general to actually be of any use in a model. It in fact reminds me of Albert Bandura's theory of self-efficacy, a theory that seems to have reached a dead end at this point.
The second red flag is the economic presuppositions. When I skim through the literature, it feels like they are putting a band-aid on the thoroughly debunked notion of Homo economicus (the belief that human individuals always behave in a rational way optimized for their own self-interest). So instead of recognizing the fact that human behavior is more complicated, what they try to do is invent a new term to counteract the instances where biases are "preventing" such a behavior pattern. I find such an effort to be doomed to fail, as—despite the persistence of economists—rational behavior means a different thing for each individual, and there is no "patch" for what economists call "biases".
How does a meta-analysis of something like this avoid, I don't know what it would be called, something like regression to the mean? A "nudge" isn't a singular thing; it's a very diverse process requiring a competent administrator. My gut says that when you average all those out, you'd see no effect, because you're experimenting: some work, some don't, some backfire. It seems like you'd have to do a meta-analysis on a specific nudge, not on groups of nudges.
They aren’t summing effects. An effect is not cancelled by an inverse effect or as you put it, backfire.
The methodology should (I haven’t investigated theirs in detail) not be susceptible to this, and I doubt a mean of effects would make it through peer review for reasons including the ones you’ve mentioned.
> Thus, all that effort has not only been wasted but the credibility of social science in general is damaged.
I don't think that's entirely true. If anything, this just highlights how complex behavioural science really is, as they're dealing with surprisingly complex humans and their surprisingly complex lives. Behavioural science is a young field.
Hah. Reform academia. Good one. When people have tenure, they'll be there teaching their version of this stuff for a long time and it's all but impossible to reform them without shutting down the departments.
In many schools, these social science departments are a favorite for the weaker students who don't really do so well with math. They're usually filled with athletes. They love to absorb pop psych results like Amy Cuddy's Power Pose and so they don't want to listen to anyone question their results with lots of meta analysis. They want some basic ideas from class in between lots of time on the playing field.
I'm afraid that their demands will far outweigh any desire to force the fields to search for absolute truth.
> This meta-analytic finding turns on the authors’ method for measuring publication bias. Because I accept that, I must believe that this entire body of research, probably the signal behavioral economics work, is essentially worthless!
This is a pretty short article, how are you confident of such a broad conclusion? What makes you that confident that this meta-analysis is decisive?
I think the worst offense of the social sciences recently is their quest to correct bias, and with it the creation of idiots who believe themselves able to do that.
Because now you have said idiots running around screaming about how terrible that bias is, completely neglecting the fact that everyone is subject to it.
Yes, nudging is an extremely well established concept, all the way from theory to policy - there's a Nobel (memorial) prize for the theory, and the UK government explicitly established a 'nudge unit' (the Behavioural Insights Team) to turn it into policy.
> Based on the nudge findings, after much debate and effort, they were updated so that the maximum-contribution option was pre-selected on enrollment forms, in the belief that more individuals would opt for it rather than contribute zero.
I would be shocked if that wasn't true though. Is there any evidence it's not true in that specific case, that pre-selecting the max options causes more individuals to opt for that? Have individuals opting for that indeed gone up since this was done?
No offense, but the research in social sciences has very little credibility. From the replication crisis to the blatant inability to research things that go against what the professors want, it's just untrustworthy. Academia needs to fix itself.
Reading this, I don't get how you can take all "nudging" and declare "No evidence". Surely "nudging" encompasses a whole range of different actual actions. Some nudges work, some don't. You can't just average across all of them.
I'm probably totally misunderstanding, but it sounds similar to saying "there is no evidence for medicine" because you've averaged all the papers describing medical interventions that work and those that don't.
I thought the point of "nudges" is that they are so cheap to implement you can easily afford to try many. Most won't work, some will.
This is my interpretation as well. Also weird to think about publication bias in this way: "these studies about the effectiveness of snake oil as a drug weren't published, so we must be overestimating how effective drugs are".
The authors do mention that there is likely to be heterogeneity in (real) effect sizes, but somehow still go with this title/abstract.
Maybe there is a valid conclusion that some of the many nudge studies are probably claiming effects that don't exist. That could be interesting in itself. But rejecting the whole field based on this kind of argument seems wrong.
While the headline might be there to catch the reader, I don't think it is wrong. The larger problem is that we had policies implemented to nudge people in certain directions. Apart from the ethical question, there needs to be hard evidence before we employ authoritarianism like that. So the headline should be pardoned, but not those who employ nudging, for the time being.
I have the same concern. Nudging is an umbrella term for a vast number of very different activities. For example, nudging is a term used for motivating more carefully designed road markings. I find it hard to believe that some of these newer designs don't "work" better than the old ones, some of them are quite ingenious. At the same time "nudging" is also used for all sorts of public policy framing issues that are more questionable and have probably harder to measure effects. As you say, each "nudge" needs to be evaluated individually.
Agree with you. Nudging is a type of user experience design. UX designers nudge with every design decision they make, and the effectiveness of those decisions is quantifiable. So it's hard to argue that all nudges are ineffective, just like one can't argue that all UX design is ineffective.
There are a number of posts that address this issue, as well as issues raised by responses to this post. I posted "no true Scotsman" somewhere--I'm not claiming my post was enlightening in itself, but the post I was responding to (and the entire thread) was, IMO, enlightening.
As for averaging, yes, you can: if a nudge is ineffective, then its result will be random, and a bunch of ineffective nudges will average zero effectiveness. The effective nudges will then push the overall average above zero. We don't see that. (The same would be true for medical interventions, unless some cause harm.)
As for being able to try lots of them: in some circumstances, maybe. But when a government is trying to nudge people towards some desired behavior (vaccination, say, to take a random example), they don't try sending out a bunch to different groups of people, then polling each of those groups to see which groups--and therefore which nudges--moved. And it's not always practical, anyway (and the vaccination example is a case where it's almost certainly not practical).
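A toy simulation of that averaging argument, with invented numbers: mix studies of truly null nudges with studies of a few genuinely effective ones, and the pooled mean of all the estimates should still sit above zero.

    # Toy illustration of the averaging argument; all numbers are invented.
    import random

    random.seed(0)
    n_studies = 1_000
    share_effective = 0.2   # assume 20% of nudges truly work
    true_effect = 0.3       # standardized effect size for the effective ones
    noise_sd = 0.5          # per-study sampling noise

    estimates = []
    for _ in range(n_studies):
        true = true_effect if random.random() < share_effective else 0.0
        estimates.append(random.gauss(true, noise_sd))

    pooled_mean = sum(estimates) / len(estimates)
    print(f"pooled mean effect = {pooled_mean:.3f}")
    # expected to land near share_effective * true_effect = 0.06, i.e. above zero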
> if a nudge is ineffective, then its result will be random
Ineffective and random are different. Ineffective means that the effect size is smaller than required.
For example, if you read "Paracetamol was ineffective for pain relief after surgery" it doesn't necessarily mean that the effect of paracetamol is unpredictable, inconsistent, immeasurable, or that it had no effect or negative effect. You would most likely interpret it to mean that the paracetamol did have an effect but it was insufficient - the patient was still in too much pain.
Similarly, if a nudge intervention was ineffective, it doesn't tell you how it failed, only that it didn't reach the threshold for success. And it certainly doesn't tell you anything about how well an aggregate of some effective and some ineffective results would perform.
I agree with other commenters that it's unlikely nudges never have an impact.
We should also be wary of high-profile debunkings, now that they're increasingly in fashion due to the replication crisis and the general dour mood. It's easy to p-hack a result into significance, but you can just as easily hack results into insignificance.
These days, both findings and debunkings need a skeptical eye.
That's different from the existence of the phenomenon.
The same thing happened to Daniel Kahneman and his 2011 book Thinking, Fast and Slow. He acknowledges that several pieces of evidence he presents in the book have not held up and can't be replicated.
He still thinks he is right, he just admits that he does not have strong evidence anymore.
What is left is a theory with less and less evidence supporting it.
I just think it's implausible that there's been no good research showing solid evidence of choice architecture mattering. Could be, I'm not an expert, but I'd like to see where things stand after a couple more years of research and debate.
I'm a UI designer, and my experience of implementing 'nudges' is that sometimes they work, and sometimes they don't.
The reality is that the way people make decisions is stupidly complex, because people have stupidly complex lives. Some tweak will work great for one project, and do nothing on the next one. It's hard to even say if it was the nudge that worked the first time.
I really view nudge theory as one of many ideas of things you can try, a tool in a toolkit. But the only tool I really feel confident works is the design-test-iterate loop.
I was working on a fintech project (gonna be vague as it's not yet released).
The legal team told us we couldn't use default choices anywhere, as it could count as giving financial advice. Fair enough. So we designed the onboarding, and there was this choice the user had to make before we could create their account.
During testing, we found people were getting really stuck on this choice, to the point of giving up. The choice actually had quite low impact, but it was really technical - a lot of people just didn't understand it. Which makes sense: our users weren't financial experts, and that was exactly our target user. This choice was a new concept for the market, so we couldn't relate it to other products they might know. The options inside also had quite a lot of detail when you started digging into them, detail we had to provide if somebody went looking for it. Our testers would hit this choice, get stuck, feel the urge to research the decision, get overwhelmed, and give up.
We spent so long trying to reframe this choice, explaining it better in a nice succinct way, we even tried to get this feature removed entirely - but nothing stuck.
Eventually after lots of discussion with legal we were allowed to have a 'base' choice, which the user could optionally change. We tested the new design, and it made a significant difference in conversion rates.
Huzzah for nudge theory! Right? Well, maybe. I think it's a bit more complicated.
- The new design was faster. There were fewer screens with simpler choices. It went from 'pick one of 5' to 'here's the default, would you like to change it?'. Was it just the speed that made a difference?
- The user was not a financial expert, and the company behind the product was. In some sense was the user just thinking 'these guys probably know more than me I'll leave it at that'. Imagine trying to implement this exact change on something the user is an expert in - say like your meal choice in an airplane. I imagine most people would think "How rude choosing for me! I'm an expert in what I feel like eating I want to see all the options".
- It had less of a cognitive load. Like the whole onboarding flow was already really complicated, just reducing the overall mental strain to make an account may have just improved the whole experience. E.g. if we had removed decisions earlier in the flow, would this one still have been as big of an issue? We never had time to test it, so I can't say for sure.
- Lack of confusion == confidence. For the users who didn't look at the options and took the default, did they just feel more in control and confident because they weren't exposed to unfamiliar terms and choices? They never experienced the urge to research.
Like at the surface level this new design worked great, so job done. But it's hard to say definitively it was because of nudge theory. I don't think you can really blindly say "oh yeah defaults == always good" and slap them on every problem - which is why the design-test-iterate loop is so important.
I think all of those things you listed are kinds of nudges. They are changes in the "choice architecture" that steer the user to an action.
In the context of government, a nudge means influencing people to choose something desirable, while still leaving open the option for people to choose what they want (hence preserving liberty). In contrast, a non-nudge solution would be a law or regulation that forces people into the desirable option or perhaps a tax on a certain choice.
In your UI, an example of a non-nudge solution would be removing the other options, effectively forcing their decision. Another example of a non-nudge would be charging different fees depending on their decision.
"A nudge, as we will use the term, is any aspect of the choice architecture that alters people's behavior in a predictable way without forbidding any options or significantly changing their economic incentives. To count as a mere nudge, the intervention must be easy and cheap to avoid. Nudges are not mandates. Putting fruit at eye level counts as a nudge. Banning junk food does not."
A nudge has to push the user towards one certain decision over another, that's the whole point. It's opinionated. The factors I listed aren't inherently opinionated, we could've tried to improve them without pushing the user to a specific choice:
E.g. speed: We could've removed earlier parts of the onboarding to make the overall experience shorter, or compacted the UI so it was visually easier to skim the choices.
Expertise: we could've assured the user before the choice that all the options were good, because we're the experts and we wouldn't give you a bad option - so don't agonise.
Cognitive load: We could've reduced the info we showed about each option, or hidden it away behind a modal, or re-written it in plain English. The legal team told us we had to use the legal descriptions of the choices, which included technical language.
Confusion: We could've made a visualisation of the impact of their choice that changed as they swapped between each option, showing them a more tangible outcome of their choice. It was a complicated concept to get, so the addition of a visual aid instead of just written descriptions might've helped.
To be clear - I'd be surprised if these things alone would've worked, and I'm certain setting a default made a difference. The point I'm making is that I don't know for sure how much of a difference. The change to implement the default, to my eyes, also improved the overall design in these other ways. We didn't isolate it down to exactly what made the improvement, we were just happy it happened.
The point I'm making is you could quickly skim read this story of a team stuck on a problem, who after implementing defaults found their conversion rates jumped 11x holy shiiiiiiii- and it sounds like it's all thanks to nudge theory. It's exactly like a case study you'd see in a co-design agency's portfolio.
But in the actual real messy world of designing interfaces, it's just always a bit more complicated than that. No change is truly isolated, tested in a controlled, academic fashion. You just design your best shot each time and see what works. Because of this, it's hard to truly definitively say an improvement was because of a nudge. Best I can do is, "I mean probably" haha.
There are some good points you raise, and I think you're really testing the nuance of what nudging is. But nudges are perhaps more basic than you're thinking.
Nudges don't need to steer the user to a specific choice, just a behaviour change. Sticking with a conversion flow counts as a behaviour change.
Nudges don't need to be simple or understandable. They can be a set of complex changes where causation isn't clear. They just need to get results.
The only really hard requirement that would rule out a nudge is if you forced a choice or used financial incentives.
If you read the Nudge book you'll see that it's a political book, really. The authors introduce nudges as an alternative to hard regulation. Instead they propose that governments consider influencing behaviour in a softer way, but still leave the escape hatch open for people with strong preferences to choose what they want. This strikes a balance between state involvement and principles of liberty. (Or at least that's their argument.)
Because of this framing a nudge is defined mostly by what it isn't. It's not a nudge if it forces a user to a choice; a nudge is anything you do that changes what users do without forcing them.
This is what you've done with your series of changes that resulted in increased conversion. You've left all the choices open still, so users have as much freedom as before, but you've managed to predictably change user behaviour in a way that aligns with your goals. In other words, you've nudged them.
Oh interesting, I had no idea it had such a wide scope, thanks for the explanation. I learnt about nudges via a uni course, it sounds like parts were lost in translation. I should check out the original book.
I've always understood this part of the description to be more than a single one-off choice, so none of that around the decision point in the financial product would count.
> The new design was faster. There were fewer screens with simpler choices. It went from 'pick one of 5' to 'here's the default, would you like to change it?'. Was it just the speed that made a difference?
If you're just going from "pick one of 5" to "pick one of 5 but there's a default", I wouldn't expect one or the other to be "faster". Was the new design more different than that?
As for the rest, I think the beneficial features of the design are predicted by nudge theory. "Providing a credible default reduces cognitive load and confusion on the path to a decision, as the user can just trust the defaults have been set up reasonably" has always been the theory for why nudges work.
The first version was a screen with 5 choices, and detail about each choice that you'd have to scroll through. The second version was a simple "We've set this up for you" screen with two options, continue or customise. If you hit customise, you'd get shown the original five choice screen.
What I mean by it being faster is you could get to the next step of the process with both reading less text, and seeing less choices (just two buttons not 5). Cause if you just slapped the continue button (which most people did), you'd skip the whole explanation of all the choices.
> However, all intervention categories and domains apart from “finance” show evidence for heterogeneity, which implies that some nudges might be effective, even when there is evidence against the mean effect.
I think some people careen from fully trusting one thing to fully trusting the opposite thing. If you're not one of those people, you'll never understand dismissing things on the basis of "you should be critical of this, because not everything that people say is true."
I do feel like that, even though being critical is something we should always do, that in cases where
1) the only reason you started paying attention to something was an intuitive hunch that it could matter, and
2) the only reason you started treating that hunch as established science is because you did experiments that had significant results, then
3) later you found that significance could be entirely accounted for by the file-drawer effect,
you need to adjust your expectation that there actually is an effect to lower than your expectation was at step 1). It isn't that the theory hasn't been tested (although you can argue it hasn't been tested for ingeniously enough yet), it's that it has been tested and no effect has been shown.
If you allow the existence of interest in a theory (represented in amount of ink spilled and number of experiments done) to raise your expectation that the theory is true, despite experimental indications to the contrary, you're not really doing science, you're just throwing good money after bad, probably motivated by a desire to protect the researchers and institutions that are heavily committed to the truth of the theory and/or the desire to protect other theories that depend on the one that hasn't shown results.
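To put the updating argument in rough Bayesian terms (the probabilities here are invented purely for illustration, not measured):

    # Tiny Bayes sketch of the updating argument above; numbers are invented.
    prior_effect = 0.5          # step 1: the hunch that the effect is real
    p_data_if_effect = 0.3      # chance of seeing only file-drawer-level evidence if real
    p_data_if_null = 0.7        # chance of seeing that evidence if there is no effect

    posterior = (p_data_if_effect * prior_effect) / (
        p_data_if_effect * prior_effect + p_data_if_null * (1 - prior_effect))
    print(f"P(effect | data) = {posterior:.2f}")   # 0.30, lower than the 0.50 prior

Whatever numbers you plug in, if the observed significance is better explained by the file-drawer effect than by a real effect, the posterior has to come out below where your hunch started.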
There's a lot of junk behavioral science out there, but things like "people often go with a default or recommended option so they can move on with their day" seem so obvious to me that I become suspicious of this debunking for debunking too much.
That's just refusing to be convinced by evidence, though. It's good to have hunches, but it's good to let them go after you've done the experiments. Come up with a new hunch and a new experiment that shows why the expected effects weren't seen, and you're right back in there.
"Refusing to be convinced by evidence" is a simplistic false dichotomy. The evidence is interesting but I have several reasons not to immediately take it as definitive.
Are you really certain that a big debunking in PNAS, surfing a wave of other celebrated debunkings, should be taken as definitive, when a good deal of the research being debunked was published to similar fanfare in PNAS back when a different kind of research was fashionable?
I take neither the original research nor the debunking as particularly credible. Without technical expertise, I'm left to educated guess. It's just my guess.
The idea that we should always be convinced by evidence regardless of context is a vast overgeneralization, impossible ("the evidence" overall rarely points only one way, even if the latest chunk of new evidence does), and in contradiction with Bayesian epistemology.
My problem isn't that I think people should be credulous of everything, it's that I don't think "it's just obvious" is a proper counter to experiments that show nothing. If the effect is so obvious, it should be obvious how to design an experiment that would show it.
I don't even know what you're defending here other than believing your first impulse above any subsequent evidence. Nobody is preventing anyone from proving an effect, in fact they poured money into the attempt.
The linked study (and the Mertens study it's built on top of) classifies defaults as "structural" interventions. In the linked meta-analysis, after stripping out estimates of publication bias, structural interventions have the most "robust" evidence left (95% CI of 0.00-0.43), and as the paper text says, "whereas the evidence is undecided for 'structure' interventions". Other structural interventions include making it easier to select the desired outcome (or harder to switch away from the desired outcome), changing the range of options to facilitate evaluation, or trying to compensate for biases and loss aversion in choice structure. As you can see, this is a broad range of interventions.
A little bit further on they say "However, all intervention categories and domains apart from “finance” show evidence for heterogeneity, which implies that some nudges might be effective, even when there is evidence against the mean effect", which makes sense. People generally understand stakes, and will likely apply different care/effort in different contexts, modifying the context-specific effect of any given intervention.
I think the paper makes a reasonable argument:
1. There is significant publication bias in nudging studies
2. The effect of providing additional information at time of selection, or providing reminders/affirmations for self control is basically non-existent
3. The effect of modifying choice structure is inconclusive. Likely we'll find that some structural modifications have strong effects in some contexts, but others have little or no effect in other contexts.
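For readers wondering where numbers like that 95% CI and the heterogeneity claim come from, here is a bare-bones sketch of meta-analytic pooling with made-up per-study effects. It's a simple fixed-effect inverse-variance pool plus Cochran's Q and I^2, not the exact model the paper fits:

    # Bare-bones meta-analysis sketch; effect sizes and SEs are hypothetical.
    effects = [0.40, 0.05, -0.02, 0.35, 0.10, 0.02]   # per-study effect sizes
    ses     = [0.15, 0.08,  0.10, 0.12, 0.09, 0.07]   # per-study standard errors

    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) * 100   # % of variation beyond chance

    print(f"pooled effect = {pooled:.2f}, Q = {q:.1f} on {df} df, I^2 = {i_squared:.0f}%")
    # A sizable I^2 is the "heterogeneity" referred to above: individual
    # nudges can differ a lot even when the pooled mean is small.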
There's a lot of confusion here about what this article is talking about.
Nudges aren't just defaults. We've known for over a century that people are influenced by defaults. Nudges also aren't anchoring, where choices influence one another. Kahneman won a Nobel prize for that and other behavioral economics ideas (work done with Tversky) a decade before the idea of nudges.
Nudges are a bigger idea: that many small changes lead to huge behavioral changes. Like providing a social reference point (see the average electricity use of your neighbors), surfacing hidden information (a red light to remind you to change your A/C filter), changing the financial effort involved in something (deposit your drinking money into an account that you lose when you drink again; health plans that pay you to stay healthy), changing the physical effort of making bad choices (a motorcycle license for people who don't want to wear helmets that is much harder to get), changing the consequences of options (pay a teenager $1/day to not get pregnant), providing reminders (check if an email is rude and have someone confirm they want to send it), making public commitments (saying you are doing X makes you more likely to do X), etc.
There are various examples of each of these working to some extent in specific circumstances.
But we have a lot of other tools for changing people's behavior. We have education campaigns. We have fines. We have taxes. We have tax breaks. The idea behind nudges is that they're an easy replacement for many of these other tools.
But the meta-analysis shows that nudges aren't a general-purpose tool that leads to significant changes in people's behavior. The behavioral changes are small, the same as we get from a fine, a tax, or an education campaign.
Aside from specific circumstances, nudges don't work any better than (and may work much worse than) our usual tools for getting people to behave.
I'm not a social scientist, so please help me understand this. The way you have defined nudges, it seems like a very broad category. Some nudges are passive (defaults), some are active (red lights), and there are a whole lot of others.
If a category covers such a broad range of phenomena, then shouldn't we be analysing individual phenomena instead of the category as a whole? For example: defaults may work and the red-light thing may not work. Why place them both under the same bucket at all? Why not study them in isolation?
> For example: defaults may work and the red-light thing may not work. Why place them both under the same bucket at all? Why not study them in isolation?
And that's exactly how these meta analyses work! If you look in figure 1, they break down nudges both by the kind of intervention and by the domain. Maybe some types of nudges are much better than others. And maybe nudges work much better for say food vs finance.
Yes. Defaults have an effect, most other nudge types don't. But the domain doesn't matter much it seems.
Reading through the article, it seems the actual claim is much, much weaker than the title:
> However, all intervention categories and domains apart from “finance” show evidence for heterogeneity, which implies that some nudges might be effective, even when there is evidence against the mean effect
So the article is saying that when you look at studies of all "nudges" as a whole and adjust for publication bias[0], there isn't evidence for nudges as a whole. Of course individual nudges could still have a positive impact.
Maybe I'm misinterpreting what it's saying, but as an analogy, that would be like doing a study of all "diets", determining that when you combine data for all studies on diets there isn't a positive effect, then writing an article with the claim "no evidence for diets". There's no way you could reasonably make the claim that no diets work.
[0] If you look at the studies on "nudges", studies with smaller sample sizes detected a larger positive effect size. This is because smaller studies that get a positive effect are more likely to be published than smaller studies with a negative effect. The article uses this to analyze just how strong the publication bias is and adjust for it.
You are misinterpreting what it is saying. It's like multiple-comparisons bias: if you do a bunch of comparisons and find one significant at 0.05, you could have called it significant had you looked at it alone. But when you consider all of the experiments you are running, you would expect to find one that looks significant just by random chance, even if there is no real effect.
No evidence for nudging =/= nudging doesn't exist.
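To put rough numbers on that multiple-comparisons point, here is a back-of-the-envelope sketch (my own illustration, assuming independent tests at alpha = 0.05, not anything from the paper):

    # Chance of at least one "significant" result among k independent null tests
    # at alpha = 0.05: 1 - (1 - 0.05)^k. Nothing real needs to be going on.
    for k in (1, 5, 14, 20):
        print(f"{k:2d} tests -> P(at least one false positive) = {1 - 0.95**k:.2f}")
    # 1 -> 0.05, 5 -> 0.23, 14 -> 0.51, 20 -> 0.64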
I'm fairly sure anyone who has done A/B testing at scale has plenty of evidence that nudging works. Perhaps not up to the standard of science, but there are literally people who manipulate choice architecture for a living and I'm fairly convinced a lot of that stuff actually works.
"... evidence that nudging works. Perhaps not up to the standard of science..." That's pretty close to saying it doesn't work. The point of this meta-study was precisely to show that the evidence claimed to support nudging was probably attributable to random variation + unnatural selection, where the unnatural selection was publication choice: either the researchers who got negative (null) results chose not to bother writing it up and submitting it, or papers that reported negative were rejected by publishers.
There are lots of people who do X for a living, but where X doesn't work: palm readers, fortune tellers, horoscope writers, and so on. I'm not even sure that fund managers reliably obtain results much above random.
I think what’s not clear is what’s in those papers, what exactly they have to say about nudging, and what definition they’re using. It strains credulity to think that changing defaults in software doesn’t change behavior, if only because most users aren’t technically savvy enough to change their settings.
On the other hand, the dream of nudge theory is something like a study done in the UK suggesting that adding the line “most of your fellow citizens pay their taxes” will increase the likelihood that people pay taxes. Here I’d be more inclined to believe the benefits are unclear and, more importantly, difficult to replicate across time and culture.
It seems that trying to do a meta-analysis on all of nudge theory (or large categories of it) would indeed show no impact. It’s not like you’re testing one thing; you’re comparing well-designed programs with ones that aren’t.
To say things a different way, I don't think this study will change anything for people actually doing choice architecture in applied settings. They have results that speak for themselves.
This is exactly how a midwife explained to me why she uses magic crystals. She told me that there's science, and there's results, and that she's seen the crystals work.
Obviously they don't work by magical vibration, but are you sure they don't work at all? If the midwife feels and acts more confident from having that tool or the mother feels more relaxed because she thinks they will make the process easier, then the crystals do, in fact, work. They just don't work through the mechanism those individuals think they do.
I mean, yeah, if she has solid RCT data on thousands to millions of childbirths and has found a statistically significant impact from using the magic crystals, I would support their use. A/B testing rests on the same basis as scientific research.
The issue is that in fact the midwife will not have such data. The comparison being made is that A/B testing, if run competently, is pretty close to scientific research, in particular for research related to nudging.
I wonder how many engineers crack open a statistics book to find the correct test versus just plotting box plots and saying "see looks pretty different"
"I don't think this study will change anything for people actually doing choice architecture in applied settings." Probably true, but then evidence that horoscopes etc. don't work, doesn't prevent people from drawing horoscopes, or other people from relying on their horoscope to plan out their day.
"They have results that speak for themselves." Let me put my point differently. Suppose that nudges don't have any effect at all (null hypothesis). More concretely--and just to take a random number--suppose that 50% of the time when a nudge is used, the nudgees happen to behave in the direction that the nudge was intended to move them, and 50% of the time they don't move, or they move in the opposite direction. And suppose there are a number of nudgers, maybe 100. Then some nudgers will get better than random results, while others will get no result, or negative results. The former nudgers will have results that appear to speak for themselves, even if the nudges actually have no effect whatsoever.
This is the same as asking if a fair coin is tossed ten times, what is the probability that you'll get at least 7 heads. The probability of such a number of heads in a single run is ~17%. So 17% of those nudgers could be getting apparently significant results, even if their results are actually random.
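For anyone who wants to check the ~17% figure, it's just a binomial tail probability; a quick sketch in Python:

    # P(at least 7 heads in 10 tosses of a fair coin)
    from math import comb
    p = sum(comb(10, k) for k in range(7, 11)) / 2**10
    print(f"{p:.3f}")  # 0.172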
I think gp and you probably see eye to eye, but gp has a problem with your phrasing.
If the effect does not live up to scientific rigour, that (more or less) implies that the effect is roughly indistinguishable from randomness.
If folks have results that speak for themselves, then the effect is more than likely testable with scientific rigour. It may already have been tested - by those very results.
Seriously, what about that kind of publication bias: A/B tests don’t get published.
If you run a useful system where it would be meaningful and interesting to know whether a social science theory actually applied, you might run an A/B test to see if it works. If it works, it is adopted—but it is almost never published. And that is for two reasons: 1. no incentive to publish and 2. major incentive not to publish. #2 is recent (post Facebook experiment) and it is specifically because a large portion of the educated public accepts invisible A/B testing but recoils with moral indignation at the use of A/B testing results in published science. Too bad: Facebook keeps testing social science theories, but no longer publishes the results.
The standards for accepting the result of an A/B test are less stringent than those for publication for the advancement of knowledge. For publication, the goal is to determine whether a model is accurate. For A/B testing, the goal is to select the best design/intervention. The difference is that for scientific testing "inconclusive" means that there isn't enough evidence to consider it a solved problem and it should have more research, while in A/B testing "inconclusive" means that any effect is small so you should pick an option and move on.
As an example, suppose I flip a coin 1000 times and get heads 525 times. The 95% confidence interval for the probability of heads is [0.494, 0.556], so from a scientific standpoint I cannot conclude that the coin is biased. If, however, I am performing an A/B test, I would conclude that I'll bet on heads, because it is at worst equivalent to tails.
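(For the curious, that interval is just the normal-approximation confidence interval for a binomial proportion; the arithmetic is easy to reproduce:)

    # 525 heads in 1000 flips: 95% CI for P(heads) via the normal approximation
    from math import sqrt
    n, heads = 1000, 525
    p_hat = heads / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    print(f"[{p_hat - 1.96*se:.3f}, {p_hat + 1.96*se:.3f}]")  # [0.494, 0.556]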
I think you are missing the point. With academic publication bias, sometimes an unbiased coin gets heads 600 times by chance. Those studies get published. But, if you ran the test again, you might only get 525. That study won’t get published.
And, in opposition to your assumption: there is nothing to prevent A/B tests being published with high academic standards— like a low p value and tons of n. In an academic context, that’s just fine— it’s a small but significant effect.
A/B tests are simply controlled experiments—which are the gold standard of scientific evidence generation in psychology. My point is that the main generators of this evidence are only permitted to use this evidence to inform commerce not public knowledge. That is a loss for science and public policy, in my opinion.
They note that there is no evidence for nudging as being generally effective. So any individual nudge could be effective (except in finance in which they found that none are effective).
From what I’ve seen there is even more incentive to focus on positive A/B tests. It’s the way you get credit for your work at a company. A negative test is counted as barely anything. So your incentive is to run tons of tests, then cherry pick only the positive ones and announce them widely. Another strategy is to track multiple metrics for each test and not adjust for that when computing p values. But then at the end you only report the one metric that was positive.
What exactly is "nudging" here? For example, it has been shown that for organ donation, if the default is to be a donor (an opt-out system, where you have to actively opt out), then organ donors double https://www.science.org/doi/10.1126/science.1091721 . I think this is one of the "nudge" examples in pop-science books.
> A nudge, according to Thaler and Sunstein is any form of choice architecture that alters people's behaviour in a predictable way without restricting options or significantly changing their economic incentives. To count as a mere nudge, the intervention must require minimal intervention and must be cheap.
Thaler and Sunstein wrote the book on nudges, quite literally. So their definition counts, and it's the one from the article. The opt-in/out decision you mention isn't a nudge in this sense. You're not asked what you prefer, you have to be aware that you can opt-in/out and then actively pursue that option.
In the case of organ donation, Thaler has written that he actually prefers mandated choice--i.e. there is no default but you have to either opt in or opt out--in this case. [1] But I'm not sure why a system where the government creates a default of either opt-in or opt-out (which you can change) wouldn't be a nudge.
The system is that you're by default in some register. The choice has already been made. Many people aren't even aware of it, or only remotely. You have to undertake action to change it. That's not a "choice architecture" in the nudge sense. That would require that you are presented with both options simultaneously, and are forced to choose. A nudge, e.g. on a web form, would then be to have one option already checked.
>A nudge, e.g. on a web form, would then be to have one option already checked.
Yes. But an alternative is to present choices on a web form without one being pre-selected but with a choice mandatory. Which is essentially what I understand Thaler to be arguing for.
>The choice has already been made.
I'd say that still is a default but one which requires more effort to change than a pre-selected option on a webform. And arguably sufficient effort that it may no longer be reasonable to default to organ donation in that manner.
Wouldn't it not be a nudge because it outright changes the unspoken costs? If the public is largely apathetic about what happens to their bodies after death, then having to take an action (versus going with inertia) is an added cost. That could make the decision feel "not worth it" for such a low reward, quite apart from people preferring not to think about the possibility of their own demise.
A nudge is naive, if not circularly defined, since it presumes at least two permanently distinct classes of humans: informed humans who can architect nudges and learn about them, and other humans who must respond the same way every time and cannot do this meta-learning.
I think that's more narrowly defined as "status quo bias" - people tend to take the lowest energy path, so generally accept default choices. The definition of nudging that I could determine from the original book's Wikipedia page includes that, as well as other forms of nudges. I wonder if separating out these nudges by type would result in different results in this metaanalysis. But that is also analogous to p-hacking, isn't it?
I'm not very familiar with how "nudging" is defined in behavioral economics, and perhaps someone can enlighten me, but personally I find it hard to believe that the way a choice is presented plays no role in one's decision. The Goldilocks principle is well-known: most people instinctively choose the middle option when given three things to choose from.
Does this study imply that choice architecture plays no role in our decisions? Or am I mis-understanding it?
Defaults are one example of a nudge. One of Thaler's examples is having some default 401(k) contribution for new employees that's greater than zero. While I'm sure there are cases where defaults are less powerful than in others, the idea that defaults don't really matter certainly flies in the face of everything I know about the world.
You give another example of choice architecture, though I'm not sure if that's a nudge in the literature or not.
Absence of evidence is evidence of absence. It's just not proof of absence.
For example, if I search your entire house for drugs, using drug sniffing dogs and so on and I don't find any at all, that's pretty good evidence that you don't have any. It's not proof though - you might have just stashed them really well.
Similarly, if people have been looking for nudge effects for ages, doing loads of studies on it for years, and none of them have found any effect, then that's pretty good evidence that the effects don't exist. It's not proof though; they might just have not been very good experiments.
Well said. I'd just change your first sentence to "Absence of evidence can be (but isn't necessarily) evidence of absence.", which is more in line with the rest of your post.
OTOH, people have been looking for evidence of nudging, and didn't find it. Since a more-than-marginal effect of nudging is a priori unlikely, we can conclude that it's much more likely that nudging doesn't "work" than that it does.
Obama would have appointed Sunstein no matter what book he wrote, because he was one of UChicago's superstar professors (and was at the same time being aggressively courted by schools around the country; he ended up, of course, at Harvard). His appointment was notable not so much for "nudges" as for the fact that Sunstein was probably one of the most conservative appointments Obama made.
Your comment makes it seem as if the Nudge work had qualified him for OIRA, when in fact he was probably one of the country's most obvious lay-up candidates for that role. As I think you know, given his own background, it would have been weirder if Obama hadn't found a role in the government for Sunstein.
He released a book called “Nudge” the year before his appointment. I’m not saying he wasn’t qualified for OIRA without it but “making society better through subconscious manipulation” was definitely pitched as part of his shtick at the time.
Reading this article, it is good that the UK COVID policy wasn't based on behavioural nudging /sarcasm. The UK COVID policy heavily relied on this, and one of the unwanted side effects was scaring a certain section of the population into submission. Although that may have been effective during COVID, it made it a lot harder for that segment of society to return to normal.
Isn't locking down the obvious departure from a "nudge policy"?
I've seen the claim a lot but it all goes back to documents like this one which discusses strategies for communicating to increase compliance with lockdowns.
Yes, the UK government received such shocking insights as "Messaging needs to emphasise and explain the duty to protect others", and "Messaging about actions need to be framed positively in terms of protecting oneself and the community, and increase confidence that they will be effective".
Of course, the government did pick and choose what to follow, so it would be absurd to say the entire COVID policy was "based on behavioural nudging". The UK's adherence to isolating after positive tests was thought to be one of the lowest of any country. When SPI-B pointed out that financial support would increase adherence to isolating, no reaction from the government. https://www.instituteforgovernment.org.uk/blog/government-su...
The first one is from before the first lockdown. The strategy was completely replaced when lockdown happened. It was unrecognisable from then on.
The second one is about "deploying fear, shame and scapegoating" which the document I linked specifically calls out as a communication strategy with more downsides than any of the others they mentioned. However, Priti Patel just can't resist such activities.
At this point, seeing mask usage e.g. outdoors on a hiking trail is a little disturbing, because people are thinking they are fighting the good fight but are now on the other side of the evidence (which says you are pretty safe outdoors, or in a big room, or while merely interacting briefly in passing with people, as one does with strangers in public). I wonder what the messaging will be, given that this supposedly "scientifically minded" mask-wearing subset of the population is no longer listening to the science.
I hope this doesn't lead to weakened immunity overall in the population. If you wear a mask every time you go out into the world, that doesn't give you much of a chance to build up acquired immunity to all the other bugs that are out there. There are stories from the early 1900s of Native Americans coming out of the woods and joining western society. They, of course, had spent decades in isolation rather than just two years, but that was enough for them to end up perennially sick and in poor health when they were actually integrated into western society, and eventually die young of common diseases. A lack of acquired immunity is what killed Ishi: https://en.wikipedia.org/wiki/Ishi
If publication bias is the exclusion from publication of results that don't support your hypothesis, how are they taking that into account?
If I’m interpreting this correctly (and I by no means am sure that I am), I infer that they are saying that in a fair publishing environment you’d expect to see more results that are less decisive; therefore the current set of results is likely biased.
Couldn’t this bias also happen in the other direction? It sounds like they’re saying the results are too good and don’t match other scientific patterns of publishing results.
The easiest-to-understand diagnostic used to measure publication bias is the funnel plot. Suppose the true effect of interest is theta = 0.2. Then the observed effects in studies should be centered at 0.2; some will be higher, some will be lower. Assuming no systematic error, the degree to which study results vary around 0.2 should be inversely related to the precision of the study (think sampling error given a sampling design). A hypothetical study of an infinitely large meta-population would produce an effect estimate of exactly 0.2, infinitely precisely. A series of very small studies will likely show quite divergent results, just on the basis of precision.
A funnel plot plots effect sizes on the x axis and precision on the y axis. The most precise studies should be tightly grouped around the meta-analytic average effect; the least precise studies should be spread more widely. This forms a triangular, funnel shape. If no publication bias exists, the spread of studies below the magnitude of the average effect should be comparable to the spread of studies above the magnitude of the average effect.
If there is publication bias, then the points that would form the left (without loss of generality; right if negative effect size) portion of the funnel will not be observable.
There are issues with funnel plots and there are other diagnostics but I hope this provides insight into one of the tools used. Notably, as a diagnostic, funnel plots work whether the true effect is positive, negative, or null; they assume only that the underlying assumptions of meta-analysis are true (that studies represent a sample of the same, true underlying effect -- other diagnostics and corrections exist when this is violated)
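If it helps to see the intuition in code, here is a minimal toy simulation (my own sketch, not the diagnostics or RoBMA model the authors actually use): generate studies of a true null effect, "publish" only the positive significant ones, and the funnel asymmetry shows up as the smallest studies reporting the biggest effects.

    # Toy illustration of publication bias and funnel-plot asymmetry.
    # True effect is zero; only positive, "significant" studies get published.
    import numpy as np

    rng = np.random.default_rng(0)
    n_studies = 2000
    sample_sizes = rng.integers(20, 500, size=n_studies)
    std_errors = 1 / np.sqrt(sample_sizes)            # rough per-study standard error
    observed = rng.normal(0.0, std_errors)            # each study's estimated effect
    z = observed / std_errors

    published = observed[z > 1.96]                    # survivors of the file drawer
    published_se = std_errors[z > 1.96]

    print(f"mean effect, all studies:    {observed.mean():+.3f}")   # ~ 0.000
    print(f"mean effect, published only: {published.mean():+.3f}")  # clearly inflated
    # On a funnel plot (effect on x, precision 1/SE on y), the published points all
    # sit to the right of zero, and the least precise studies show the largest
    # effects -- exactly the asymmetry the bias diagnostics look for.
    print(f"corr(effect, SE), published: {np.corrcoef(published, published_se)[0, 1]:+.2f}")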
Interesting -- the problem may be a misapplication of funnel plots for metaanalysis.
I'm not sure what theta is representing and I only skimmed the paper, but especially in social scenarios and across social-science papers, it seems unrealistic to assume the same distributions and parameters across tasks & populations. Sometimes it comes down to 'is there any effect??' and sometimes to a precise notion of effect size in a lucky/clever specific scenario. Likewise, social science is one of the hardest fields in which to set up a good experiment, and few publications accept negative results, so mostly only 'good' p-value ranges getting published seems normal. The Wikipedia page on funnel plots shows, afaict, the same criticism of the technique.
Whether about the effect size or how it is reported, funnel plots seem an inappropriate choice for debunking something as general as 'nudges' across heterogeneous studies. Skimming made the meta-analysis feel rather lazy (lack of cross-validation, interpretation, ...).
Not my field, but I would have had to do some digging before accepting this meta-analysis in review, and my default would be 'not ready'.
Both the authors of this study and meta-analysts generally have followup responses and other diagnostics and models that work under degraded assumptions, so I don't think it makes a ton of sense to respond to "this is the absolute most introductory example of why this kind of intuition works" with "it doesn't work in <XYZ> case". There are a variety of ways to think through what data looks like under publication bias or not irrespective of what the "correct" effect would be.
I'd also add that this is a paper responding to an existing meta-analysis, so the claim that it's impossible to get a quantity of interest for a meta-analysis because the constituent studies are being inappropriately aggregated is itself an argument in favour of this paper's rejection of the original paper's finding.
I personally just rejected an m-a at a social science journal on thursday because I thought it suffered from unaddressed garbage-in garbage-out along the lines you mention (non-experimental data, no attempt to pin down causal identification, inadequate qualitative discussion of risks of publication bias, unclear QOI), so I am sympathetic to the criticisms, but just know that there are next steps. :)
it's one thing to reject a meta-analysis, and i'm rarely surprised at that. i'm not a fan in general, as i've seen many break down when you dig in, even in seemingly 'nice' areas like whether medicine x works on disease y.
it's another to reject the underlying phenomena because of a meta-analysis.
"Couldn’t this bias also happen in the other direction?" The general notion is that positive results ("we tried nudging and succeeded in getting people to behave more in X way") are more publishable than negative results ("we tried nudging, and nothing much happened"). It is a common and well-attested problem in many areas of science, but probably particularly in behavioral sciences; I have not heard of cases where publication of negative results is more likely than publication of positive results, although there are obviously heated debates over some results.
Whether publication bias is the explanation in this example, I don't pretend to know.
There's also an implicit assumption with nudges/defaults that you're nudging people towards a reasonable place for some combination of policy and preference reasons.
But imagine that a company would just as soon not pay out more 401k matching than it has to, so it makes the default zero. (Which of course is often the norm for different reasons.) That's as valid a nudge as anything but we shouldn't be surprised if a lot of people don't go with the default.
We probably also shouldn't be surprised if a lot of people maybe wouldn't go with a maxed out default.
Defaults wouldn't be nearly so powerful if they weren't typically chosen to be fairly reasonable for the average person in the target audience.
But I don’t think the goal is “most people can be nudged” so much as “a nudge is a cheap way to increase a behavior by 5-10%”, which is probably quite significant in policy circles. A low-single-digit percentage increase in the number of people who pay their taxes would be huge.
I guess the question would be: could results be missing for another reason, like the effects being harder to test for, so that the data looks better than it should because the published results are better in aggregate? But yeah, this is probably unlikely.
There's a little-known theory (whose name eludes me) that states: any outcome in behaviour is highly dependent on the immediate and unpredictable interplay of various environmental variables and their real and perceived effect on the person. This alone cancels the efficacy of any nudges (but some variables may push in the same direction as the nudge--hence the original but misguided nudge research: they were fooled by randomness). I have seen married women who were loyal to their husbands (and who had no idea they would fall for a guy who practiced "seduction" on them) become bewildered and surprised by their own behaviour, even though the behaviour went against their firmly-held opinions about themselves (that is: I love my husband and I am loyal to him). The environmental variables used by the person (who is a marketer) were too strong for their opinions to hold out against. As an example, he would invite them to his studio, which he had decorated (and cleaned) and made so homely and snug and comfy that the first lines of defense were broken before they had a chance to realize what trap they were in. The marketer also tried to brainwash me (but failed) because I knew the power of variables, and this knowledge alone saved me--even though he seemed irresistibly charming in the capacity of a father figure I never had.
Wait until you start reading about genotype by environment interactions and realizing the implications that has on just about everything in biology and society
The issue with this kind of meta analysis is that, as the author, you get to decide what the groupings are. No two studies will be identical, so you can invent bucketing strategies until you find that some buckets have the results you want and focus on those.
In addition, they don't seem to have shown that the technique they're applying actually works for modelling the distributions that they're analyzing.
Keep in mind that the people who publish a correction like this have a lot to gain from getting a lot of attention.
They succeed when lots of people say “what?! This broadly accepted idea is wrong?”
It’s the equivalent of a Buzzfeed headline, even if backed by thoughtful research. The new research may be correct in invalidating the prior experiment’s evidence but the reality is that we all know the “nudge” idea is useful at a practical level. If I ask my two-year old son if he wants milk, the odds that he is drinking milk 5 minutes later skyrocket. The same principle applies to people making all kinds of choices - from buying insurance to picking a University.
And boy am I struggling - I am amazed it's even possible to group all of these studies under the same umbrella, unless that umbrella is "misc".
Claiming that how people choose to treat their cancer, portion sizes at restaurants, rural Kenyan maternal health, and Dutch children's vegetable choices are even in the same field seems - incredible.
Maybe I am agreeing with the study in a roundabout way. If all of these things are under the heading "nudge", then it is too broad a heading. It's probably impossible to say one way or another that nudging works because you can never unpick all the confounding factors. Did the Dutch children have a popular TV show about vegetables while the Kenyan media ran months-long articles about unsanitary hospital conditions?
With my cynical hat on, Nudge is a way for politicians to try something even when the real fix is intractable. I don't oppose "do something positive" - I just oppose abusing power, violence for political gain, and all the other reasons why we can agree on a nudge in the right direction but cannot agree on a structural fix in the right direction.
I guess if they worked then they would solve the problems without structural change and so would defeat the forces that benefit from the status quo. so yeah. it does not work.
looks like we will have to go back to the old politics and revolt.
I'm one of the authors of the reply and it was very interesting reading so many diverse thoughts and comments. I would love to respond to all of them, but it would take ages. Luckily, Stuart Ritchie (@StuartJRitchie) wrote an awesome post on his substack (https://stuartritchie.substack.com/p/nudge-meta) that goes much deeper and addresses many questions and the fair critique raised here.
Also, note that there is only a limited amount of information and nuance you can fit into a strict 500-word reply limit in PNAS, which is the reason we focus only on one aspect of the original meta-analysis -- publication bias.
Anyone who has ever run an installer with an annoying setting hidden in a collapsed-by-default "advanced" section knows that nudges do work. Perhaps nudges do not work on you, but when you see your parents' computer with all the desktop icons and browser toolbars of days gone by, you will see the evidence of them working.
This is why I think nudging must work on some level. What are trends, advertising, fads, etc. if not nudges? Is the existence of an intelligent nudger required or can the hive nudge itself?
I'm just in the process of reading 'Nudge - the final edition' - definitely worth a read - it's thought-provoking, funny, insightful and enjoyable. It would be a shame if it all turned out to be bullshit as the examples given in the book seem straightforward and plausibly effective.
> A newly proposed bias correction technique, robust Bayesian metaanalysis (RoBMA) (6), avoids an all-or-none debate over whether or not publication bias is “severe.”
Absence of evidence doesn't mean it's not true. It doesn't even imply it.
It seems like what they are saying is: here are some statistics suggesting there's a bunch of missing data that would show that nudging doesn't have an effect. Like most science, it seems like the conclusion should be "go do more research to see if we're right", not "you should conclude that nudging doesn't work".
They're implying that some such research has already been done, but wasn't published because of publication bias against negative results. So the data exists, but is missing from publication. (I suspect it's also possible that more skeptical researchers have been dissuaded from even doing the research, for fear they'd be wasting their time and research budget because after they did the research, it wouldn't get published.)
Not a great journal if you are trying to publish something with potentially large import. It's reasonable to guess that something is seriously wrong with the study for it not to get into a good journal. This publication does not move my opinion on the matter.
Whether 'nudging' works or not, the concept is unacceptable to me.
First, the term 'nudging' is a misnomer. Let's call it what it is - manipulation. Manipulating the options or defaults to some other set in order to achieve a better outcome for someone...
Well, who is that someone? The government?
Who says that their values align with mine? I wouldn't have responded as the government did to the pandemic, but their nudge units went into overdrive nudging people into vaccinations, etc. Is preventing access to bank accounts for protesting government actions (as in Canada) a 'nudge'?
Can I challenge the promoted values? If the state apparatus has its own values and agenda, how do I get to state mine - where is the values/ethics discussion being had, and how do I get my say? I find the promoted values Orwellian, communistic, overly progressive - one for all, but not all for one... is that opinion fair to hold? Or must I be nudged over the cliff?
Aren't we really just talking about soft-sell authoritarianism here? Weren't we just meant to vote for people, not have a perpetual nanny state guiding us?
Your vote says their values align with yours, or at least most people's. Governments have to do things all the time that step on a few toes for the overall good.
What about all other public health campaigns? Drink driving, cancer screening, anti-smoking? Not everyone will want what they're pushing but we mostly agree that's a good thing for people, and that's why we let the government promote those things.
I can’t help but see social science as humans attempting to modernize memory of imperialism and religious belief embedded by prior experience.
Think of how popular it became as a field in the last 50-100 years as the populace became less religious. The US adult population recently crossed a threshold where fewer than 50% believe in a higher power now.
No science gives social scientists higher powers of forecasting the human future, yet we took their ideas and applied them with the same conviction with which some believe in gods, and in the same way: a network of randos spreading their gossip, wrapping it in technical jargon biased by past ignorance.
Consider how much of this work was being leveraged against an ignorant public with no opt out button, via print and TV. How is that informed consent?
Social media comes along, upends those forms of media, and creates a new meta-awareness that we lived in a society policed by high-minded but normal people. That awareness means we can opt out of being influenced by intentional nudges, the same as we can opt out of believing the intentional nudges that tell us to abide by higher powers.
Social science “worked” when the masses were unaware it was happening to them. As the public has become more aware of how it works, it’s all Soylent green; just people.