Hacker News
Replication Controversy in Psychology (slate.com)
73 points by nkurz on Aug 2, 2014 | 66 comments


Now, I'm just a layman, so I could be completely off the mark about this, but this is how it seems to me.

Doesn't every single generation of psychologists pretty much completely reject the findings of the previous generation? All of Freud's stuff is now considered total bullshit. All of the stuff psychologists said in the 50s about homosexuality being a mental illness is now considered bullshit.

In other fields of science, like chemistry or physics, this has never happened since the introduction of the scientific method. Scientists have been wrong lots of times, but never has the majority of the body of scientific knowledge been thrown out. Usually the changes are something small, e.g. "Einstein was wrong about hidden variable theory in some respect." Even the most drastic changes in scientific thought are usually just the narrowing down from a number of existing theories to a smaller number of theories. In most fields of science, we seem to converge on the correct result over time, with very small deviations.

Psychology doesn't seem to be converging at all. Am I wrong? If not, why is this?


Psychology is a very young branch of science (Freud lived less than 200 years ago) and its subject matter is vast and complex: from low-level neural mechanisms to perception to higher cognitive functions to mental illnesses to human cognitive development to behavioral psychology to decision making. Take, for example, schizophrenia and bipolar disorder. Even if we don't understand them at a low level and don't properly understand their treatments, we have become much better at diagnosing them early enough to keep people from ruining their lives. That's a huge step forward.

Or at the lower levels, consider human perceptual systems. We are starting to understand how the early layers of vision work, even if we don't yet have a good grasp of how the higher levels of the visual cognitive system work.


As a fun aside: Freud died in 1939, just 75 years ago. I know HN's demographic skews young, but I'll bet there are people reading HN who were alive at the time.


> All of Freud's stuff is now considered total bullshit.

This isn't true. Freud's enduring contribution to psychology isn't stuff like penis envy in women or interpretation of dreams, but that thing which you now likely consider a fact: recognition of the existence of the subconscious.

The impact of the subconscious, and the impact of one's psychological history on later behaviour (and the fact that recognizing that impact can assist treatment), was not a recognized process before he promulgated it.


You shouldn't confuse the subconscious with the unconscious. The former is part of Freud's legacy and rejected by most mainstream psychology, while the latter is widely recognized as an important concept.


Fair enough; a confusion of terms. But isn't it still true that Freud was the first, or among the first, to propose that the unconscious mind influences the conscious one? And also that one can learn of someone's unconscious processes by conversing with them?


But a whole lot of scientists (including psychologists) consider the subconscious total bullshit.


The problem with your premise is that all of Freud's stuff isn't considered total bullshit. The mass media, science magazines, and materialists like to present it that way, but psychoanalytic and psychodynamic traditions (depth psychology) are still alive and well. It is, perhaps, not quite as prominent in experimental psychology (which I'm guessing is what you're talking about), but it's certainly still a force in clinical work. Humanistic, positive psychology, cognitive-behavioral, gestalt, and most other approaches were inspired by, informed by, and still use parts of depth psychology in their approaches.

I think you're mixing up clinical psychology and experimental psychology. The aims for both will vary, and I tend to agree with you that presenting psychology as a hard science is ludicrous (but unfortunately the only way to get funding). You're also right about the political component of psychology (your homosexuality anecdote); psychology can be used as an instrument by the leaders of society for political, social, and economic gains and control. That doesn't mean it has no contribution to man's knowledge of the world.

Just because the newer approaches appear to reject older theories does not mean the older theories aren't useful. Newtonian mechanics explains phenomena quite well for everyday living. Relativity was a revolution in the field, but there's still a place for Newtonian physics (i.e. you don't have to erase the classical theory). The way knowledge evolves in these long-standing fields isn't as mechanical as I think you're claiming it to be. Science is much more complex than that.


I don't think anyone is denying the overarching influence of Freud, which is enormous, but popularity doesn't confirm or deny the validity of his claims. Astrology is popular, but no one considers it to be science.


Freud wasn't a scientist; he was a philosopher and clinician who talked about things that are now considered part of psychology. If you limit things to psychologists who were clearly scientists, then things have pretty much been converging. There have been a few big shakeups, particularly with regard to the rise and fall of behaviorism, but it is a bit of a stretch to call them "pretty much completely reject[ing] the findings of the previous generation", and similarly large shifts have occurred in other fields (cf. phlogiston theory).


I'm guessing here as well, but I think the greatest minds are more captivated by solving the puzzles of the universe than by poking and prodding the human mind, which is much more tedious and doesn't require that level of imagination/abstraction. Neurology might, but a lack of tools prevented significant advances.

So the smartest people aren't doing it, and on top of that far fewer people in general go into it. I have no idea about this, but maybe there are fewer grants for psychology, and so fewer people are encouraged to go into research?


An additional problem with research into things like social priming (and the attempts to replicate it) is that they all use a sample population of American undergraduates.

This paper goes into interesting detail about why this is a terrible methodology: http://hci.ucsd.edu/102b/readings/WeirdestPeople.pdf


Using subjects from the same population should make replication easier. If the research were only valid for Westerners, that would still be a step up.


Thanks. I've been wondering for a while: how much of what we know about human psychology is based on studies performed only on 20-year-olds?


...well, quite a lot actually, because 20 year olds are more like other humans than different.

Humans don't have life stages. We don't wrap ourselves in cocoons and change from grubs into butterflies from ages 16-20. For most of human existence, 20-25 year olds were the substantial power holders.

There's a branch of psychology called developmental psychology. It has to do with how humans change over time. If developmental psychologists only studied undergraduates, we wouldn't have a good grasp of the area, but they don't. It's pretty easy to get kids and elderly people to study. It's even pretty easy to get adults to study. It's more expensive than undergraduates, but it's doable.

Beyond that, what sort of differences do you expect to see in areas like, say, input attention, working memory, fluid intelligence, social behavior, perception, motor control, etc.? 18-22 year olds are pretty much normal adults in all of those regards.

The thing that makes this really insignificant is that we know so little about how minds work. Psychology has only really existed since Skinner and behaviorism made it a real science. If I spent 10 seconds I could think of nice, clever-sounding retorts as to why the perception system of a 22 year old is vastly different from the perception system of a 32 year old, but in reality we know almost nothing, and learning anything is good. As Asimov said, it's wrong to think the world is a sphere, but it's much better than thinking the world is flat.

These discussions about replication always sadden me, because they miss the point by so much. Psychology is one of the least respected and yet most vital sciences. The problem is that everyone is a lay expert, because everyone has a mind and thinks they understand how minds work. Psychology has taught us so many valuable, horrible and beautiful things about ourselves.

If you're reasonably intelligent, you can come up with reasons not to believe anything. It's easy to discredit things. It's hard to build things.

(It's also false that psychology studies are only performed on undergrads; usually undergrads are a good pilot-testing base, and then you move on to externally recruited populations, unless there's really no reason to based on the field.)


> If you're reasonably intelligent, you can come up with reasons not to believe anything. It's easy to discredit things. It's hard to build things.

This comment is remarkably troubling to me. There is intrinsic value in replication studies. Showing that an effect does not exist or is far weaker than (or, alternatively, exists and is just as strong as) initially reported is just as much "building" our understanding of scientific truth.

No, scratch that, it is far more important than the original study. The original is more like a sketch, and as the results are confirmed or disconfirmed over the course of many subsequent studies the solid building takes its true form.
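One way to make that "building" concrete is to treat each study as a noisy test and update the probability that the effect is real as results accumulate. The prior, power, and alpha below are illustrative assumptions, not numbers from any study discussed here:

```python
def posterior_real(prior, power=0.8, alpha=0.05, results=()):
    """Update P(effect is real) after a sequence of significant (True)
    or non-significant (False) study outcomes, via Bayes' rule."""
    p = prior
    for significant in results:
        if significant:
            # P(significant | real) = power, P(significant | null) = alpha
            p = power * p / (power * p + alpha * (1 - p))
        else:
            p = (1 - power) * p / ((1 - power) * p + (1 - alpha) * (1 - p))
    return p

# Start from a 10% prior that a surprising hypothesis is true.
after_original    = posterior_real(0.10, results=(True,))
after_replication = posterior_real(0.10, results=(True, True))
after_failed_rep  = posterior_real(0.10, results=(True, False))
print(round(after_original, 2), round(after_replication, 2), round(after_failed_rep, 2))
# -> 0.64 0.97 0.27
```

On these assumptions, a single significant result leaves real doubt, a successful replication nearly settles the question, and a failed replication drags belief back toward the prior.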

These discussions about replication always sadden me, because far too many researchers miss the point of the scientific method.


...you're responding to a point I didn't make. That comment is about the fallacy of psychology "testing everything on undergrads."

Please read what you respond to.


I apologize if I misrepresented your point. I went back and re-read your comment, and I guess I'm still confused. It looks like you pivot in the middle of your comment with the statement: "These discussions about replication always sadden me..."

What did you mean by "you can come up with reasons not to believe anything" regarding testing on undergrads? That sounds to me like a broader complaint about replication generally.

How is it that discussions about replication "miss the point"? I thought the point was that replication is fundamental to science, so it seems like these discussions are usually spot on.


This may be a "get off my lawn" comment, but my understanding is that pediatric psychiatrists and psychologists will treat patients up to 25 years old. The boundary between childhood and adulthood is labile and varies from person to person. From my own experience, re-reading books like "100 Years of Solitude" or "The Unbearable Lightness of Being" at ten-year intervals shows me how much I have changed.


...but the optic nerve of a 20 year old is, from a scientific perspective, the same in almost all respects as the optic nerve of a 30 year old. Maybe if that optic nerve signaled some horribly deep work of literature like 100 Years of Solitude, the older brain behind it would interpret it differently, but the nerve is the same.

When you remember the amazing swelling feelings of being ~adult~ you had when you last read 100 years of solitude, your semantic and episodic memory systems are, functionally, identical to what they were when you read it the first time. Maybe your fluid intelligence is a little less fluid, but that doesn't really start until you've aged a lot.

I'm sure you've changed as a person. I'm sure you have a wealth of new experiences and are entirely different from your past self. This is irrelevant to the physics, chemistry, and psychology of your body.


In general I agree with you. It is important to separate experience from capability (there's probably a better word here).

Much of the social psychology may not fit easily into that division though. To pick on a common whipping boy in these arguments, Milgram's obedience study, what is the role of experience in any set of choices like those given to the participants. The asymmetric responsibilities when there is a power dynamic is something that some people learn somewhere along the line.

I'm rambling a little but I feel that some of the observed effects in these experiments may be things for which experience may provide a learned immunity


Milgram's obedience study was tested on adult men, between ages 30-40 IIRC. There were 19 later experiments; you wouldn't call them replications because they weren't exactly the same, but they were all consistent, and they sampled diverse (for the '60s) populations.

If you didn't know that, and were assuming it was tested on undergrads exclusively, you should recognize that your assumption was wrong, and propagate that through your belief graph.


I agree that human brains continue to develop through the 20s.

In the UK, CAMHS or CYPS (Child and Adolescent Mental Health Services, or Children and Young People's Services) have a hard cutoff at 18.

Someone who is 17 years 11 months old will be placed with a CYPS team, and into a CYPS inpatient unit if needed, and someone who is 18 years and 1 day old will go to an adult team and adult inpatient units.

There are considerable problems at the moment with:

Transitioning from CYPS to adult services

Having enough inpatient beds for young people.

Young people either have to travel hundreds of miles to get a bed or they have to go into an adult unit. That's a serious breach of commissioning and regulators get involved.


Except that generations differ from each other in both behaviour and values. Moreover cultures differ from each other in both behavior and values.

When you run a test on American 18-20-year-olds, you may be learning something universal about humans, or you may be learning something particular to the products of the current American educational system. The perception and social behavior of current American 18-20-year-olds may seem universal to someone who is one, or is close to them, but not necessarily when you come from a different culture.

Many of those studies try to measure how much you are influenced by something you read/heard ten minutes ago, how likely you are to cooperate/compete in some special situation, etc. Are you claiming that these reactions are necessarily universal?


We don't have discrete life stages like butterflies, but my understanding of developmental psychology is that we very much do have mental-ability life stages. Part of that understanding is that we don't fully mentally mature until around 25. If that is true, then it does call into question using mostly 20-year-olds to conduct psychology experiments.


I was very surprised not to see Jason Mitchell of Harvard University's attack on the concept of replication: http://wjh.harvard.edu/~jmitchel/writing/failed_science.htm

The thing is amazing to read, in the "someone who thinks he's a scientist actually wrote this?" sense, like this gem:

> Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way.


Wow, this is quite stunning to read. He actually baldly states that the only reason for experimenting is to confirm what you already believe, and that you should therefore just do as many experiments as it takes to 'prove' yourself correct, hide evidence of the failures and then stop - forever.

This was linked from the article, BTW - and the responses they also linked to are dead on:

http://mchankins.wordpress.com/2014/07/12/do-not-replicate

http://vserver1.cscs.lsa.umich.edu/~crshalizi/weblog/1102.ht...


That statement reveals a stunningly large unconscious bias. Let us assume that Study A produces result X, and Study B failed to reproduce it. Let us take for the sake of argument an assumption that one of these studies is flawed.

Which one is it?

You can not tell.

There is an old saying: "A man with a watch knows what time it is. A man with two watches is never sure." - http://en.wikipedia.org/wiki/Segal%27s_law


I remember reading that and reeling, shocked that anyone could write and sign their name to that. Whenever I am pessimistic about my chances in science, I remember that that guy has a job at Harvard.


He is a scientist; unless you're also a domain expert in his field with at least as much expertise as him, or a newcomer with radically impressive work, you can't just not call him a scientist because he disagrees with you. If a domain expert disagrees with you, it is probably because you are wrong. That's just the base rate and it's probably true here as well.

Traditionally, replication hasn't been done in psychology because the experiments have to be set up in really clever complicated ways in order to tease out effects. Even the most dedicated replicator can't ever fully replicate a psychology experiment. You'd have to ship the actors used as confederates across the country to do that and of course nobody does that. It's foolish to think any psych study has ever been replicated.

The way psychologists deal with this is the concept of converging operations. This has to do more with being clever and considering implications of a thing you're trying to study with respect to other theories you have much more evidence for and in different situations. If you think stereotype threat is a thing, for example, it would make sense for stereotype boost to exist if the stereotype is positive. If a priming effect speeds up the recognition of "doctor" after a subject has been shown "nurse", maybe recognition for "warmth" will be slower after a subject has been shown "frozen". If people only encode based on schema variables, they'll be unable to remember information unrelated to a schema even if they're provided with congruent encoding cues.

Converging operations is just studying the same concept from different angles and seeing if you keep seeing consistent results. If the results aren't consistent the hypothesis you're investigating will, in fact, die. Psychology is an advancing science as of right now. If people were just publishing random noise there'd be no reason for concepts to fall out of favor.

I think people on hacker news tend to wildly underestimate the degree of skill involved in experimental psychology. So much of statistics, even, has come out of psychology because of the need to refine and clarify what effect you're seeing.

Alternatively, if you think that because of this existing academic psychology is lacking, you're in a fantastic place to be. Do literally anything to get some seed money (just working in a standard SDE job would be fine), and then run some small experiments (and replicate them, of course) on learning or addiction. Then write the next Farmville or build the next Facebook. If there's really that much low-hanging fruit, go out and pluck it. Put up or shut up.


> Traditionally, replication hasn't been done in psychology because the experiments have to be set up in really clever complicated ways in order to tease out effects. Even the most dedicated replicator can't ever fully replicate a psychology experiment. You'd have to ship the actors used as confederates across the country to do that and of course nobody does that.

Now wait a second. When a psychology study comes out, it claims "This experiment shows stereotype threat reduces the performance of Honduran men on a math test." Such studies rarely claim "This experiment shows stereotype threat reduces the performance of Honduran men on a math test when Jill the skinny experimenter repeats the code words."

If using different experimenters yields a different result, it means the effect being described is probably not as robust as the experimenters claim. The cause might be Jill's shifty eyes rather than stereotype threat.

That's a replication failure, in the sense that it shows the claimed effect is far weaker (or causally different) than the original study claimed.


I would say that it's typically assumed that there are confounding factors in any experiment. An experimenter has to try to minimize that, but you're still not going to be able to replicate a full experiment.

If you have more than one researcher administering a test in a stereotype threat experiment, that'll reduce the likelihood of that being the confound. If other studies (not replications, but their own experiments, with their own, different approaches to study the problem) agree with the first experiment, the effect is likely real.

I would also say that you're viewing the entire system far too antagonistically. Nobody goes into academic psychology to get rich off of it.


But if you're not minimizing the confounding factors enough that the result can be repeated, your experiment failed. Your hypothesis has neither been provably confirmed nor denied. At that point you are, at best, still gathering evidence, and publishing results would be an error.


>Then write the next Farmville or build the next Facebook. If there's really that much low-hanging fruit, go out and pluck it. Put up or shut up. //

Your reasoning here is as fallacious as the idea that unrepeatable experiments are good science.

Your psychological hypotheses on addictive mechanisms can be entirely false and yet your game can still be addictive. Maybe the Skinner Box-like environment you create has a particular applicable demographic outside which it is largely impotent; but failure to repeat the experiment isn't seen as contradictory to the hypothesis and so you fail to address different markets with the specific sort of addictive elements that suit them best. You'd be leaving a heap of money on the table [for shame /s].

The ancient Greek idea that gravity worked by the attraction of Earth-kind substances to the Earth itself was falsified and yet things still fell down.

Would you say that a single superstitious action (finger crossing, say) coinciding with the desired outcome proves that finger crossing works? Do you think that those who subscribe to such a progression of logic as demonstrating sound scientific thinking are being scientists?

IMO observations and experiments that can't be repeated are of the utmost importance in furthering knowledge. Truthfully reported anecdotes are valid data. But unrepeatable events have limited scientific value in and of themselves; they cannot show statistical significance, nor demonstrate that a hypothesis should be held to be true as an objectively valid scientific theory.


You're conflating a number of ideas, but the overall trend in your post is towards a debunked theory of science called positivism. You should research that.


Is your contention that [social] psychology need not rely on an empirical approach? How then are you to claim it's "scientific"? Perhaps the best question to ask here is: what do you consider, succinctly, to be the basis of the scientific method?

I did indicate that I'm happy to accept other methodologies as beneficial, and am certainly not wedded to positivistic approaches in all fields of human understanding. Are you perhaps pushing the meaning of "repeatable" in the current context beyond the bounds of "supporting the same breadth of conclusion with equal confidence"?

It is however not scientific to rely on unrepeatable experimental results. If an experiment is unrepeatable then the conclusions are proved false. Note that the question at hand is not whether the initial experiment is valid or useful. Science after all is axiomatic and the world, generally, non-deterministic so far as one can tell [though I know there are many dissenters on this point].


Well "pbhjpbhj," like I said, you're confusing logical positivism with science. Positivism, the idea that we can only believe propositions which are proved correct, lost favor by the mid-1900s. Modern science, including social psychology, is in the vein of scientific theory called operationism. Like I said before, you should research this, because you don't know what you're talking about.


We're not talking about multiple metrics: the exact same metric is being used in empirically identical circumstances, and the results are conflicting. The only way to square that position is to assume the conclusion is wrong or the data is wrong. Essentially, the epistemological underpinning isn't important; operationalising a system no more allows for unrepeatability in empirical data than does any other consistent scientific method. After all, if unrepeatability were allowed, the logical basis would be self-contradictory.

Rather than arguing from abstract implications about my assumed knowledge why don't you address the question at hand?


Don't conflate Experimental Psychology as a whole with the subfield of Social Psychology, or the particular social priming community. While it is true that the "skills" argument and "conceptual" replications are well-established methodological pillars within this subfield, they are far from standard outside of it.

This is why this string of multiple replication failures and fraud scandals has turned the spotlight onto these practices from within the discipline, with reasonable and high-profile people openly questioning their validity and pushing for a whole "replication movement" [1-4]. So it's not just HN commenters, you know.

[1] http://www.nature.com/news/replication-studies-bad-copy-1.10...

[2] http://chronicle.com/article/Power-of-Suggestion/136907/

[3] http://www.psycontent.com/content/311q281518161139/fulltext....

[4] http://www.nature.com/polopoly_fs/7.6716.1349271308!/suppinf...


I wouldn't say that converging operations is limited to social psychology. Virtually all memory research (cognitive, not neurological) is conducted in the same way. As various other areas get more neurological basis I think psychology will move away from it in favor of more precision measurement, but it's still valuable.


Absolutely. But outside of Social Psychology there are strong, genuine disagreements --to say the least-- on whether "conceptual" replications are a valid source of converging evidence at all. See for instance [1,2] for typical reactions.

[1] http://neurochambers.blogspot.com/2012/03/you-cant-replicate...

[2] http://psycnet.apa.org/journals/gpr/13/2/90/


Ioannidis' "Why most published research findings are false" (http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fj...) is particularly relevant here, I think.


Note, in particular, his "Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true. "
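That corollary is easy to demonstrate numerically: if a researcher can pick among several outcome measures after seeing the data, the effective false-positive rate climbs well above the nominal 5%. A toy simulation under the null hypothesis (the sample size, number of outcomes, and trial count are arbitrary illustrative choices):

```python
import math
import random

def z_test_p(sample):
    """Two-sided p-value for mean = 0, given unit-variance data (z-test)."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(0)
trials, n, k = 4000, 30, 5   # k = number of outcome measures the analyst may try
false_pos = 0
for _ in range(trials):
    # Null world: no real effect on any of the k outcomes.
    if any(z_test_p([random.gauss(0, 1) for _ in range(n)]) < 0.05
           for _ in range(k)):
        false_pos += 1
rate = false_pos / trials
print(f"false-positive rate with {k} flexible outcomes: {rate:.2f}")
# well above 0.05; with independent outcomes it approaches 1 - 0.95**5, about 0.23
```

Each individual test keeps its 5% error rate; it is the freedom to report whichever one "worked" that inflates the field-level error, which is exactly the flexibility Ioannidis is pointing at.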


This covers all issues surrounding reproducibility pretty well. It is astounding that scientists want to hide behind their expertise when their results and conclusions are being tested. Biologists who are under a similar cloud tried the same tactics too: they have some secret sauce that makes their lab and experiment special and unique, beyond the comprehension of mere mortals. Of course, as publicly funded scientists they have a moral if not legal obligation to disclose their "secret sauce" if it is that important. Why publish a paper but leave the public in the dark about the most important factor? At best it is an oversight; at worst it is dishonest.

Experiment and reason underpin modern science. Common sense says irreproducible experiments call their conclusions into question. Scientific expertise is no escape from reason. A fancy degree from a "top" university does not make anyone infallible. Well-controlled experiments are the final arbiter.


Am I being naive to think that scientific papers should be written in such a way that people outside the area of expertise can reproduce them? You shouldn't need to be a psychologist to run a psychology experiment. You shouldn't need to be a physicist to run a physics experiment.


As much as I hate that scientific papers don't give good introductions for the public, specialists do have their value. An educated, intelligent reader has limits when it comes to frontline research. It is like asking someone to fly a 787 jetliner without ever having touched an airplane before. Prior knowledge and experience are important.

What I absolutely reject is the false exclusivity that some scientists hide behind. A professional statistician can spot statistical errors even if he or she has never done a cancer study before. A rational person can discern the trustworthiness of a scientific report if enough details and background are well explained. I always find that the popular press lacks a degree of healthy contempt for specialists. It actually has improved somewhat recently. Still, sometimes newspaper reports read like press releases.


I accept that there is some skill and training required to run an experiment - particularly with hard sciences - but generally far less than is required to design it and analyze the results.

To extend your metaphor, it would be like saying that only people who design jetliners should be able to fly them.


Yes, you are being naive. You could reproduce the rote aspects of the experiment, and not even all of them: you don't know how to calibrate some expensive lab equipment, and it's not the paper's job to explain that. And lots of things, like conclusions and analysis, take expertise. It's not about "being" an expert; it's about having the expertise. Papers say "because foo, we conclude bar." If you have almost no idea what foo is, you just can't write that. It's not the paper's job to give you an introduction to physics or anything.


Controversies like these make me glad I work in the field of programming...while arguments over what algorithm or frameworks work best are less sexy than what the sciences cover, at least with narrow claims, it's easier to concretely argue for one or the other...and then of course, if we're talking about open-source, then the errors of ambiguity are even scarcer.

It seems like a large part of the problem, at least in these psychological experiments, is that there isn't a structured/uniform way to describe preconditions, methodology, assumptions, and measurement practices...OK, so you used 40 undergraduate students for your original experiment...how did you pick them out? How were their characteristics accounted for (e.g. age/gender/major/health/etc)? Did any of them ever see the Trainspotting scene before? Have any of them ever seen a movie of equal grossness? How many minutes did you wait after showing them the Trainspotting scene? In what order did you hand out the questionnaire? How did you seat the students? etc. etc. etc. etc.

A totally honest researcher might have trouble enumerating and enforcing all of the different controls, never mind communicating them to other researchers. From the research papers I've read, the ability to communicate these facts doesn't seem to be of uniform quality.
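For what it's worth, even a crude machine-readable methods record would make those questions answerable. A hypothetical sketch (the field names are invented for illustration, not any real preregistration standard):

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class ExperimentRecord:
    """Invented schema: one structured record of how a study was actually run."""
    hypothesis: str
    n_subjects: int
    recruitment: str                    # how subjects were selected
    controls: list = field(default_factory=list)
    stimulus: str = ""
    delay_minutes: float = 0.0          # time between stimulus and measurement
    measures: list = field(default_factory=list)

rec = ExperimentRecord(
    hypothesis="disgust priming lowers moral-judgment scores",
    n_subjects=40,
    recruitment="undergraduate subject pool, course credit",
    controls=["age", "gender", "prior exposure to the film clip"],
    stimulus="Trainspotting scene",
    delay_minutes=5.0,
    measures=["moral-judgment questionnaire"],
)
print(json.dumps(asdict(rec), indent=2))  # shareable alongside the paper
```

The point isn't this particular schema; it's that a replicator reading a filled-in record like this no longer has to guess at the delay, the recruitment, or the controls.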


"...while arguments over what algorithm or frameworks work best are less sexy than what the sciences cover, at least with narrow claims, it's easier to concretely argue for one or the other..."

Very few arguments related to programming are over algorithm choice; they are almost always over engineering concerns: readability, platform/language choice, frameworks, etc. Those, in turn, are often dependent on a mix of personal taste and predictions about the future (e.g. directions for the product architecture).

So I don't think programmers are quite immune from non-scientific analysis. That's apparent from the longstanding programmer debates about basic things like compile-time type checking. Even narrowing the claims rarely seems to reach much consensus.


> Controversies like these make me glad I work in the field of programming...while arguments over what algorithm or frameworks work best are less sexy than what the sciences cover, at least with narrow claims, it's easier to concretely argue for one or the other

Yeah, it's pretty ironic when the field with (Comp) Science in the name isn't really a science but math.


It is a science. A formal science. Same as math and logic, for example.


Hal Abelson disagrees with you. "Computer Science is a terrible name for this business. First of all, it's not a science. It might be engineering, or it might be art. But we'll actually see that computer -- so-called science -- actually has a lot in common with magic."[1]

1. https://www.youtube.com/watch?v=zQLUPjefuWA


Add to that list Fred Brooks (IBM System/360, Brooks' Law): http://www.cs.unc.edu/~brooks/Toolsmith-CACM.pdf

Relevant quote: "... the scientist builds in order to study; the engineer studies in order to build. ... I submit that by any reasonable criterion the discipline we call 'computer science' is in fact not a science but a synthetic, an engineering, discipline. We are concerned with making things, be they computers, algorithms, or software systems."


Robert Sedgewick would disagree with Hal Abelson.

Just because some people tend to use pure maths to analyze the run-time of algorithms, doesn't mean we can't apply the scientific method to the problem. Here's a nice presentation by Sedgewick where he shows how we can apply the scientific method to programming: http://www.cs.princeton.edu/~rs/talks/ScienceCS10.pdf

And quote:

> Algorithm designer who does not experiment gets lost in abstraction

> Software developer who ignores cost risks catastrophic consequences
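Sedgewick's doubling test is easy to try yourself. A minimal sketch (the quadratic `workload` below is a hypothetical stand-in for whatever algorithm is under study, not anything from his slides): time the program at n and 2n, and the ratio of running times empirically estimates the exponent, no pencil-and-paper analysis required.

```python
import time

def workload(n):
    """A deliberately quadratic toy workload (hypothetical stand-in
    for the algorithm under study)."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += i ^ j
    return total

def timed(n):
    start = time.perf_counter()
    workload(n)
    return time.perf_counter() - start

# Doubling hypothesis: if T(n) ~ c * n^b, then T(2n)/T(n) approaches 2^b,
# so log2 of the observed ratio estimates b.
t1, t2 = timed(1_000), timed(2_000)
ratio = t2 / t1
print(round(ratio, 1))  # roughly 4 for a quadratic loop
```

Form a hypothesis about the growth rate, run the experiment, check whether the data match: that's the scientific method applied to a program.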


Certain parts of software engineering research are moving in this direction, with psychology as the primary toolbox for research.

Not sure how I feel about that. As far as science goes, Psychology isn't exactly the epitome of rigor. On the other hand, the problem really boils down to asking hard-to-investigate-scientifically questions, mostly about humans.


If the standard in Psychology research is the so-called p-value being less than 0.05, it means that they publish results whenever the chance of the findings being a coincidence is less than 5%. It stands to reason that 5% of the published results in Psychology will be coincidences and not real.


It is actually much much worse. Let's do the math.

1,000 non-tenured professors do 20 experiments each. On average, each of them has one p<0.05 result that is actually a coincidence. They all rush through publication in an attempt to get tenure.

Unless you have reason to believe some results are actually plausible, it's possible ALL published results are wrong.

The 5% is the ratio between "experiments done" and "results that are not meaningful yet randomly deemed worthy of publication". It has no relation to the number of "results that are actually meaningful", which may outnumber the former 100:1 ... or, more likely, be outnumbered by them 100:1.

I wish Neyman/Pearson theory were taught more widely, or at least its implications for the false-positive/false-negative tradeoff. Alas, that would mean a lot fewer published results and a lot less bragging/tenure rights.
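The back-of-the-envelope math above is easy to check by simulation. A minimal sketch (illustrative numbers only), assuming every hypothesis tested is false, in which case the p-value is uniform on [0, 1] and each experiment crosses the 0.05 threshold purely by chance:

```python
import random

random.seed(42)

N_RESEARCHERS = 1_000
N_EXPERIMENTS = 20
ALPHA = 0.05

# Under a true null hypothesis the p-value is uniformly distributed,
# so each experiment "succeeds" (p < ALPHA) with probability ALPHA
# even though no real effect exists anywhere.
false_positives = sum(
    1
    for _ in range(N_RESEARCHERS * N_EXPERIMENTS)
    if random.random() < ALPHA
)

print(false_positives)                  # close to 1,000
print(false_positives / N_RESEARCHERS)  # about one spurious "finding" per researcher
```

Twenty thousand null experiments yield on the order of a thousand publishable-looking results: one per professor, exactly as the parent comment says.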


It is even worse than that when negative results are not published. This is because a study can keep getting repeated by different parties until they get that p-value. In other words, if you never publish a negative or inconclusive result (or stop studying once a conclusive result is published) then given enough time you /will/ get a positive result for every single plausible hypothesis.
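The "keep repeating until it works" effect is also simple to quantify. A minimal sketch, again assuming the effect being studied does not exist (so p is uniform under the null): the number of attempts before someone gets p < 0.05 follows a geometric distribution with mean 1/0.05 = 20.

```python
import random

random.seed(0)

ALPHA = 0.05

def attempts_until_significant():
    """Repeat a null experiment until p < ALPHA; return the attempt count."""
    n = 1
    while random.random() >= ALPHA:  # p-value is uniform(0,1) under the null
        n += 1
    return n

runs = [attempts_until_significant() for _ in range(10_000)]
mean_attempts = sum(runs) / len(runs)
print(round(mean_attempts, 1))  # near 20: a nonexistent effect gets "confirmed"
                                # after ~20 unpublished negative attempts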


That's exactly the opposite of what p values mean. Other replies addressed the publication bias problem, but even absent that, p < 0.05 usually implies a probability much higher than 5% that any single "significant" result is false:

http://www.statisticsdonewrong.com/p-value.html

The p value describes the probability that a nonexistent effect will falsely be called significant. You can't turn that around (without Bayes' theorem and a prior) to calculate the probability that the effect is nonexistent.

To compound this, most studies are underpowered -- they don't collect enough data to detect the effect they're looking for. So a statistically insignificant result usually does not mean the effect does not exist.
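The linked page's point can be made concrete with Bayes' theorem. A worked sketch with hypothetical numbers (not from the article): suppose 10% of tested hypotheses are true, studies have 80% power, and the significance threshold is 0.05.

```python
# Hypothetical inputs, chosen for illustration:
prior_true = 0.10   # P(effect exists)
power      = 0.80   # P(significant | effect exists)
alpha      = 0.05   # P(significant | no effect)

# Bayes' theorem: P(effect | significant)
p_significant    = power * prior_true + alpha * (1 - prior_true)
p_true_given_sig = (power * prior_true) / p_significant

print(round(p_true_given_sig, 2))      # 0.64
print(round(1 - p_true_given_sig, 2))  # 0.36
```

So even with a "5%" threshold, over a third of the significant results are false under these assumptions, and lower power or a more skeptical prior makes it worse.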


There is a great (cheesy) explanation of the statistical 'significance' of p-values: http://theconversation.com/the-problem-with-p-values-how-sig...


My limited understanding is that the simplistic formula for computing p-value assumes certain things about the lack of hidden correlations and effects. If those things are present and correctly taken into account, then the correct p-value would be much higher. Arbitrarily multiplying reported p-values by 10 would not seem to be too far from the mark.


Maybe the answer is to perform distributed research by default. Design experiments and have them conducted by unrelated teams in different locations.


That's a great idea but you really only need half of it. Design the experiments in enough detail that unrelated teams could conduct them, then publish the results no matter the outcome. That's probably enough to catch most of the loose research out there.


I was astounded to read:

> [Psychology] is actually leading the way in tackling a problem that is endemic throughout science [replication]

"Throughout science" presumably is meant to apply to everything including physics, biology, etc. In fact, the article explicitly recognises "Failures to replicate" in several other fields (medicine, biology, observational astronomy) which seems to contradict itself - it's a big special thing when a psychology journal publishes replications, and they are leading the way! Except here is a load of uncontroversial non-replications from other fields.

Perhaps I'm just being spoilt - being in High Energy Physics, almost everyone I work with has an understanding of the statistics involved, and extremely skeptical outlooks on everything that is anything less than perfectly rigorous - along with (usually) multiple separate experiments all measuring (or capable of measuring) the same thing.

One big example I can think of was DAMA: http://en.wikipedia.org/wiki/DAMA/NaI - found evidence for dark matter, and a successive experiment by the same team http://en.wikipedia.org/wiki/DAMA/LIBRA has found the same result - but nobody "believes" the result, until it is measured by somebody else, even though nobody can find a cause for the false positive (assuming it is false).

Finally, I link (again) the Feynman 1974 Caltech commencement address, where he actually talks about this reproducibility issue: http://neurotheory.columbia.edu/~ken/cargo_cult.html


As long as an academic researcher's personal career progress (and pocketbook) depend on a seemingly never-ending flow of "novel" and "significant" findings this will continue.



