Expert analysis informs the decisions we make as leaders and managers — and in our everyday lives. We can’t see red blood cells, but we trust scientists who say we have them and doctors who order blood tests to count them. We suspect that cognitive biases affect our choices, not because we have done the analysis ourselves, but because we believe social scientists who conduct experimental research. Much of our knowledge is ultimately garnered from the testimony of teachers, mentors, colleagues, and authors who write for publications like this one.
But we also live in a world where, almost daily, some expert’s previous certainty is discredited by new analysis. Diets once thought to be foolproof are ridiculed; management practices once decried are suddenly praised. So how should we treat the next piece of advice we get from a scholar or a consultant?
Philosophers of science, who study this issue, generally recommend that we simply trust what we hear from well-credentialed people who seem competent and sincere. But I think we can do better. We should always think critically about what we hear or read.
In my experience, “fresh eyes” often find errors that have eluded expert minds. We owe it to ourselves to handle each item of expertise the way we would a piece of fruit we’re about to buy — gauging how wholesome and ripe it is. Here are my thoughts on how to do that.
Dare to Doubt
In the second most popular TED talk of all time,1 social psychologist Amy Cuddy tells us that holding certain physical postures boosts our power hormones and makes us more courageous; however, attempts to replicate that result have failed.2 European governments chose to adopt austerity policies in part because esteemed Harvard economists Carmen Reinhart and Kenneth Rogoff told them that high debt levels cause a sudden drop in economic growth.3 Then a graduate student, Thomas Herndon, discovered that their claim was influenced by an Excel spreadsheet error.4
Experts fool themselves all the time, especially when a problem is messy and the analysis is difficult. The replication crisis — whereby scientific findings are increasingly being revealed as tough to reproduce — is plaguing psychology, economics, and medical research.5
No one really knows the extent to which empirical findings can be trusted, but some people have tried to guess. Stanford professor John Ioannidis argues that most medical research results are false.6 Economists J. Bradford DeLong and Kevin Lang make a similar claim about the field of economics.7 In a Strategic Management Journal article, my coauthor Brent Goldfarb and I estimate, very roughly, that about 20% of the research findings in business management are based on little more than random noise.8 Would you trust the word of someone who gave you bad advice one time in five?
Lesson: Don’t hesitate to challenge expert analysis.
Distinguish Stories From Predictions
Much of what we read from scholars, scientists, and other experts consists of stories that emerge from the analysis of patterns in data. Experts ask, “Which companies succeed?” or “Which people make good leaders?” and then weave narratives that describe the patterns: “Companies that ‘stick to their knitting’ succeed” or “Authentic people are better leaders.” These stories are conjectures, in the style of Sherlock Holmes.
All of us, including experts, fall in love with our guesses and the stories we tell about them. I once asked a world-famous business scientist if he had ever tested his theory by trying to predict future events. He said he didn’t need to, because his theory predicted the past so well. He had forgotten that a story explains and a theory predicts.
To make sure our theories are predictive, we must test them against new information. Big-data analysts have learned this the hard way — by seeing exciting discoveries later debunked as products of chance. So, nowadays, the best analysts split their data in two, developing the story or model on one part (the “training set”) and then evaluating it on the other (the “validation set”). If the result does not hold in both parts, they conclude that it is not predictive.9
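The split-and-check routine can be sketched in a few lines of Python. Everything here is hypothetical and deliberately rigged: the "traits" and "success" scores are pure noise, so the trait that looks most predictive in the training half should lose its power in the validation half.

```python
import random
import statistics

random.seed(0)

# Toy data (all names hypothetical): 200 "companies," each with 20 random
# traits and a random "success" score. Everything is pure noise, so any
# pattern we "discover" should fade on fresh data.
n, n_traits = 200, 20
traits = [[random.gauss(0, 1) for _ in range(n_traits)] for _ in range(n)]
success = [random.gauss(0, 1) for _ in range(n)]

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / ((len(xs) - 1) * sx * sy)

# Split the data: develop the "story" on the training half,
# then check it on the validation half.
half = n // 2
train_traits, valid_traits = traits[:half], traits[half:]
train_success, valid_success = success[:half], success[half:]

# "Discover" the trait most correlated with success in the training half...
best = max(range(n_traits),
           key=lambda j: abs(corr([row[j] for row in train_traits], train_success)))

# ...then see whether the pattern survives on the held-out half.
r_train = corr([row[best] for row in train_traits], train_success)
r_valid = corr([row[best] for row in valid_traits], valid_success)
print(f"trait {best}: training r = {r_train:.2f}, validation r = {r_valid:.2f}")
```

Because the winning trait was cherry-picked from 20 candidates, its training correlation is inflated by selection; the validation correlation, computed on untouched data, is an honest estimate — and for noise, it should hover near zero.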
Lesson: When an expert links a cause to a supposed effect, ask whether it’s a story to make sense of the past or a theory to forecast the future.
Unearth the Assumptions
Analyzing empirical evidence always requires assumptions, sometimes so many that the process has been described as a garden of forking paths.10 At each fork, the analyst must make an assumption that may influence the final outcome. One problematic and common type of assumption involves how to assign values to variables that cannot be measured directly.
When analysts cannot conduct randomized experiments and must instead rely on observational data, guessing is especially common. Such is the case for researchers who study Alzheimer’s disease: its slow progression and delayed onset make it difficult to implement interventions and study their effects. Instead, analysts must sift through patient histories in search of possible causes, and such sleuthing requires many assumptions about missing information. For example, people who play bridge are less likely to develop Alzheimer’s, but the interpretation of this relationship depends on your guess about the hidden attributes of those who play bridge.11 If players and nonplayers are otherwise alike, then bridge playing may indeed help prevent Alzheimer’s; if the two groups differ in unmeasured ways, those hidden factors may be the real explanation.12
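A small, entirely hypothetical simulation makes the confounding problem concrete. Here a hidden attribute drives both bridge playing and lower disease risk; bridge itself has no causal effect in this toy world, yet a naive group comparison still finds one.

```python
import random

random.seed(1)

# Simulated observational data (entirely hypothetical): one unmeasured
# attribute raises the chance of BOTH playing bridge and staying
# disease-free. Bridge has no causal effect here by construction.
people = []
for _ in range(100_000):
    hidden = random.random()                         # unmeasured attribute
    plays_bridge = random.random() < hidden          # higher hidden -> more bridge
    disease = random.random() < 0.3 * (1 - hidden)   # higher hidden -> less disease
    people.append((plays_bridge, disease))

def disease_rate(group):
    return sum(has_disease for _, has_disease in group) / len(group)

players = [p for p in people if p[0]]
nonplayers = [p for p in people if not p[0]]
print(f"disease rate: players {disease_rate(players):.3f}, "
      f"nonplayers {disease_rate(nonplayers):.3f}")
```

The naive comparison shows players with roughly half the disease rate of nonplayers, even though removing bridge from this simulated world would change nothing. Only an assumption about the hidden attribute lets an analyst choose between the two interpretations.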
Of course, the need to make assumptions clouds the work of management researchers too. Many variables needed for robust analysis are hard to measure, so analysts often make guesses to fill in the gaps.
Nonexperts, as unbiased outsiders, sometimes are better at vetting the logic of such guesses than the experts who made them. Researchers want to believe that they are on the right path — a delusion I myself have suffered — and may convince themselves that some guess is reasonable when it is not. Nobel laureate Richard Feynman articulated this aphorism for empirical scientists: “The first principle is that you must not fool yourself, and you are the easiest person to fool.”13
Lesson: Unearth assumptions that experts have used to get from the raw data to a set of conclusions.
Seek Alternative Explanations
To link effects with their true causes, researchers must rigorously try to rule out rival hypotheses. But most scientists are no more creative than you or I, and they tend to fall in love with a particular explanation and not look hard enough for an alternative. For example, a group of scholars who study gender stereotypes reported that hurricanes kill more people when they have female names, rather than male ones, because the female names made them seem less dangerous.14 The idea evoked such deep gender stereotypes that many people accepted the explanation. Then a host of scholars showed that the data record did not support the claim.15
I know from my own experience that considering alternative explanations is hard work, and it is easy to become complacent. During a project in which I was estimating the determinants of demand for certain types of entertainment, I clung to my favorite predictors and stopped considering other causes. The result: a model that failed to accurately predict future demand.
To avoid this pitfall, don’t assume the analyst exhaustively considered rival explanations. Make a list of your own conjectures, and ask whether they were ruled out. This querying is easy if the analyst is a consultant or an employee, but you can also do it with published work. Research authors usually write a few paragraphs on alternative explanations; if they haven’t done so, or if their list seems incomplete, email them. If they have an answer, they’ll write back.
Lesson: Identify alternative explanations for a particular conclusion, and ask why each one is not a better answer.
Know the Limits of Inference
Trying to prove that a suspected cause is the true one is fraught with difficulty, as the philosopher David Hume made clear. As a result, researchers often use logical judo in conducting their analyses. Rather than directly seeking evidence to support the cause, they flip the analysis and measure the probability that the supporting evidence is just a mistake.
Consider the notion of “statistical significance,” which many people think measures confidence in a proposed cause for some observed effect. In fact, it measures something closer to the opposite: roughly, how likely it would be to see a pattern this strong if chance alone were at work. Thus, an estimate that is statistically significant isn’t necessarily true, and one that is “not significant” isn’t necessarily false. Significance, found or not, is just a way to flag, “Hey, there might be a real pattern here.”
Significance also gets confused with importance. With a large enough sample, almost any difference becomes statistically significant, but that does not mean the difference matters. For example, per mile traveled, the safety record of U.S.-based air travel is statistically significantly better than that of U.S. train travel. Should we then worry about traveling by rail? No: The difference is unimportant because both forms of travel have an extremely low death rate (0.07 deaths per billion miles for air, 0.43 deaths per billion miles for rail). U.S. motorcycle travel, in contrast, is both significantly and substantively more deadly (213 deaths per billion miles).16
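The large-sample effect is easy to demonstrate. In this hypothetical comparison, the true difference between two groups is a trivial 0.01 on a scale whose standard deviation is 1 — practically meaningless — yet with a million observations per group, a standard two-sample test flags it as highly significant.

```python
import math
import random

random.seed(2)

# Hypothetical A/B comparison: a tiny true difference (0.01 on a scale
# with standard deviation 1) becomes "significant" at huge sample sizes.
n = 1_000_000
group_a = [random.gauss(0.00, 1) for _ in range(n)]
group_b = [random.gauss(0.01, 1) for _ in range(n)]

diff = sum(group_b) / n - sum(group_a) / n
se = math.sqrt(2 / n)                   # standard error of the difference (sd = 1)
z = diff / se
p = math.erfc(abs(z) / math.sqrt(2))    # two-sided p-value, normal approximation
print(f"difference = {diff:.4f}, z = {z:.1f}, p = {p:.1e}")
```

The p-value is vanishingly small, but the estimated difference is still about one-hundredth of a standard deviation: significant, and unimportant. Asking whether an effect is large enough to matter is a separate question from asking whether it is distinguishable from chance.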
Lesson: Always ask, “Does this finding make a material difference in the real world?”
Demand a Robustness Analysis
Even a well-crafted study provides just one estimate of the many that are possible. When estimates are not consistent across an array of reasonable assumptions, a study’s findings are said to lack robustness, a term of art in research statistics.
A classic example involves the analysis of the notion that gun possession deters crime. An early study suggested that crime rates fell in areas where laws allowed people to carry a concealed firearm.17 But later studies, using the same data and slightly different assumptions, yielded different conclusions.18 Each side accused the other of being political stooges. The National Research Council tried to adjudicate the debate, but even its members could not agree.19
Finally, a group of scholars showed that slightly different assumptions resulted in widely varying conclusions: Gun possession caused less, more, or the same amount of crime — depending on an array of factors. The researchers even used a fancy statistical method, Bayesian analysis, to evaluate whether a best answer existed given the assumptions in each of the conducted studies. They concluded that, given the available data, we just can’t tell the effect of concealed weapons on crime.20 In short, the finding was not robust.
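A basic robustness check is something any analyst can run: estimate the same quantity under several defensible analytic choices and see whether the answer holds. This sketch is hypothetical — a known slope of 0.5 plus one contaminated observation — but it shows the mechanics.

```python
import random
import statistics

random.seed(3)

# Hypothetical robustness check: estimate the same quantity (a slope)
# under several defensible analytic choices and compare the answers.
n = 500
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]
y[0] = 40.0  # one contaminated observation

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den

estimates = {
    "all data": slope(x, y),
    "outlier dropped": slope(x[1:], y[1:]),
    "first half only": slope(x[: n // 2], y[: n // 2]),
    "second half only": slope(x[n // 2 :], y[n // 2 :]),
}
for label, b in estimates.items():
    print(f"{label:>16}: slope = {b:.2f}")
```

If the estimates roughly agree across these choices, the finding is robust to them; if they swing around, as the gun studies' estimates did, the honest conclusion is that the data cannot settle the question.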
In contrast, the work of the late economist Steven Klepper on geographic industry clusters has stood up to repeated tests of robustness. Klepper and his colleagues showed, for example, that as companies spin off near their parent companies, they drop like fruit from a founding tree and often grow into strong organizations in their own right.21
Lesson: Confirm that findings persist across a variety of assumptions.
Consider the Population and the Sample
Even if study findings are robust in a particular sample (such as a specific population), they may not apply to other settings or groups. For instance, concealed-carry laws may have different effects in the U.S., Canada, or Bermuda. Educational tools that work in one culture may not work in another. Indeed, most studies provide information only about a given sample drawn from a particular population.
If you run a marketing test on, say, U.S. college freshmen, you may have good data on the appeal of a product for that group, but not necessarily for a broader demographic. Draw your conclusions about other groups only after they too have been studied.
Lesson: The population and sample matter.
Be Skeptical of Hearsay
We all want to understand our world, so we tend to see patterns where none exist — canals on Mars, faces on the moon, old men on mountainsides. Mark Twain famously lampooned this tendency in Life on the Mississippi: “There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.” Today’s experts on empirical analysis agree with him: We humans want to believe we know things and to be perceived as knowledgeable, which often leads us to make stronger claims than we should.
For example, I often hear experts say they “know” something on the basis of evidence — when they actually don’t know and, given the limitations of statistical analysis, could not possibly know. They may have good reason to suspect something, but they take the inference too far.
Worse still, other people repeat the inflated claims because they came from an expert. Then with each retelling, the exaggeration spreads. I’ve seen this happen with my own work — to the point where I couldn’t recognize my original thoughts. We all need to be more careful. A simple solution is to banish the word know — and use suspect or suggest instead.
Lesson: Avoid the language of certainty.
When Richard Feynman said, “Science is the belief in the ignorance of experts,”22 he was not disparaging scientists but reminding us that we all can help in advancing knowledge. As we use data to learn about the world and make the best possible business and management decisions, we all engage in scientific inquiry. As we consume others’ learning, we can be useful critics. In whatever role we happen to occupy, we should always question our inferences, think critically about the evidence and the arguments we hear, and admit our own fallibility when we proffer our own conclusions.
At least that is my advice to you, subject to your evaluation and careful critical analysis. If you engage seriously in that effort, you will realize I have told you a story, assumed many things, left alternatives unconsidered, failed to show that my analysis is robust, and made my own unsupported claims.
I hope the few ideas I have shared, based on my experience, will be useful to you. But you must decide that for yourself.