Enterprises Should Learn from Academia’s Reproducibility Crisis
The term “ivory tower” is supposed to capture academia’s aloofness from the real-world needs of business and industry. But thanks to the data science revolution, businesses are incorporating ivory tower-ish thinking into everything they do. A “data-driven” business uses scientific methods from academia to distinguish fact from fiction — or at least that’s what their leaders want to believe.
Before business leaders trust all their data and analytics, however, I have some words of caution. Learn from academia’s struggle to incentivize and champion reproducibility. Otherwise, the data revolution could not only mislead businesses but blind them to their own ignorance.
A Study’s Conclusion Isn’t a Certainty
Reproducibility is the principle that a scientist shares the step-by-step process that led them to a conclusion. That way, other scientists can replicate their study. If those other scientists get the same or similar results, the academic community considers the conclusion more reliable. The more other scientists can reproduce a result, the more reliable it is. But if the same steps lead to significantly different results, the conclusion becomes suspect.
Unfortunately, reproducibility seems to be the exception rather than the norm in the scientific disciplines that have studied it. In a 2015 study, 270 psychology researchers teamed up to replicate 98 papers published in peer-reviewed journals, and only 39 percent of the replication attempts were successful. When the science journal Nature surveyed biology researchers about replication attempts, 70 percent said they couldn’t reproduce the findings of their peers, and 60 percent couldn’t replicate their own results.
The reproducibility crisis has even captured Congress’s attention. Times Higher Education reports that Congress’s latest budget for the National Institutes of Health (NIH) orders the agency to set aside funds for replication experiments. A Senate report on that budget notes “that many biomedical research studies have turned out to be irreproducible or even outright fraudulent.”
Incentives are central to the problem. New discoveries can win tenure and funding packages for academic researchers, but replication experiments generally do not. Plus, almost no one wants to spend their career checking whether the work of their peers is reproducible. Doing so makes you few friends and a lot of enemies.
That all said, the incentives and culture in business are even more hostile to reproducibility.
Unverified Results Mean Bad Decisions, Faster
In “data-driven” businesses, much like academia, data scientists and analysts are encouraged to produce new models and discover new things quickly — not to review and replicate old analyses. Nor are they incentivized to challenge each other’s work, in part because speed is so overvalued.
Think of all the B2B tech companies that idolize speed, acceleration, efficiency and time-savings in their marketing copy. American culture prizes certainty and decisive action while deriding leaders who waffle on decisions. We seem to believe any advantage, especially those found through data science, will vanish if we don’t act quickly enough.
Thus, in a typical corporation, second-guessing data and the decisions it drives is a great way to alienate yourself. No one gets promoted for arguing that their team should hold off on a decision until more information is available. Often, companies measure performance by the volume of decisions made and things done, not their impact or quality.
Without reproducibility in their data science operations, though, companies are liable to operate on false and misleading data. I’ve watched firsthand as an executive realized that the unverified dashboards they’d relied on daily for years were wrong. No one had documented the steps taken to produce those dashboards, and no one took the time to review and replicate them.
The dashboards may have been developed and deployed quickly. Ultimately, however, without a reproducible verification process, they slowed down the company.
How to Make Reproducibility Respected
Confidence in false or misleading data is arguably worse than having no data at all; when data is missing, you at least know what you don’t know. The bigger risk here is that when irreproducible data starts to lead to irreversible mistakes, leaders will stop trusting data altogether. That leaves an opening for assumptions, biases and instincts to take over again.
3 Solutions to the Reproducibility Crisis
- Document analyses like science experiments.
- Incentivize verification and replication.
- Workshop analyses live.
As I’ve argued, attention to reproducibility is about processes, incentives and culture. We know data science should be treated with skepticism until it is reproduced or verified, but that won’t happen just because one leader says so. With that in mind, a few practices can help bring reproducibility into enterprise data science.
Enterprises must treat each new data analysis like a science experiment. Document every step: how the data was sourced, how it was prepared and cleaned, and the operations taken to create insights and visuals. Some analytics tools can do this automatically, but most analyses will require painstaking, manual documentation that may include SQL and Python code or just plain-English directions, depending on which tool the data scientist used.
The final documentation, stored where data teams can access it, should enable a data scientist to reproduce the analysis and compare the results to earlier attempts. Reproducing the analysis is the only way to verify it.
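To make this concrete, here is a minimal sketch of what that documentation can look like as code. Everything in it is hypothetical: the file path, column names and cleaning rules are stand-ins. The structure is the point: each step is written down as a named function, and the script records a fingerprint of its output so a later rerun can be compared against the original.

```python
# reproducible_analysis.py: a minimal, self-documenting analysis script.
# The source file, column names and cleaning rules below are hypothetical
# placeholders; the point is that every step is written down and rerunnable.

import hashlib
import json

import pandas as pd

SOURCE_FILE = "data/q3_orders.csv"  # Step 1: where the data came from


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Step 2: cleaning rules, recorded instead of applied ad hoc."""
    df = df.dropna(subset=["revenue"])  # drop rows with missing revenue
    df = df[df["revenue"] > 0]          # drop refunds and negative entries
    return df


def analyze(df: pd.DataFrame) -> pd.DataFrame:
    """Step 3: the aggregation behind the chart or dashboard number."""
    return df.groupby("region", as_index=False)["revenue"].sum()


if __name__ == "__main__":
    raw = pd.read_csv(SOURCE_FILE)
    result = analyze(prepare(raw))

    # Step 4: record a fingerprint of the output so a later rerun
    # of the same steps can be checked against this one.
    fingerprint = hashlib.sha256(result.to_csv(index=False).encode()).hexdigest()
    with open("analysis_manifest.json", "w") as f:
        json.dump({"source": SOURCE_FILE, "result_sha256": fingerprint}, f, indent=2)

    print(result)
```

A second data scientist can rerun the script against the same source data and check whether the fingerprint matches. A mismatch is the cue to investigate before anyone acts on the numbers.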
For your data scientists and analysts, create a bonus structure that rewards verifying and replicating analyses. Again, almost no one wants to be the naysayer who shows that an analysis that took weeks or months is wrong. If performance and compensation are tied to replication, however, the culture will grow to accept and value this behavior. You have to create an upside for people who do this important but uncelebrated work.
Once a data science team publishes an analysis in a PDF with a slick design and authoritative tone, few people will doubt it. Graphs and charts camouflage weaknesses in data from critical minds. But when a team of data scientists, analysts, and domain experts gather to iterate on an analysis, critical thinking happens. The group can discuss, modify, and rerun the analysis live, over and over, leading to more useful and trustworthy insights.
Trust, but Verify
In every domain of business, we document processes so that others can complete them consistently and in alignment with some standard. The difference with data is that we’re not dealing with a known procedure that has been completed successfully and just needs to be broken down into steps. We’re dealing with an experiment to find new facts, create forecasts and build predictive machine learning models. The steps can lead to convincing, logical results that are nonetheless wrong. So it’s a mistake to assume that the data expert knows what they’re doing and needn’t bother documenting their workflow.
The ivory tower may be aloof, but its institutions have honed powerful methods for separating fact and fiction. If only academic researchers used those methods more often.
Don’t make academia’s mistake. Start taking reproducibility seriously.