The Replication Crisis in Artificial Intelligence

The Hidden Weakness Undermining Trust in AI Research

Hello everyone and welcome to my newsletter where I discuss real-world skills needed for the top data jobs and specifically the AI Agent Role. 👏

This week we discuss why reproducibility is becoming an AI bottleneck.

Not a subscriber? Join the informed. Over 200K people read my content monthly.

Thank you. 🎉

Last month Nature published a damning response written by 31 scientists to a study from Google Health that had appeared in the journal earlier this year. Google was describing successful trials of an AI that looked for signs of breast cancer in medical images. But according to its critics, the Google team provided so little information about its code and how it was tested that the study amounted to nothing more than a promotion of proprietary tech.

"We couldn’t take it anymore," says Benjamin Haibe-Kains, the lead author of the response, who studies computational genomics at the University of Toronto. "It's not about this study in particular—it’s a trend we've been witnessing for multiple years now that has started to really bother us."

Haibe-Kains and his colleagues are among a growing number of scientists pushing back against the lack of transparency in AI research. "When we saw that paper from Google, we realized that it was yet another example of a very high-profile journal publishing a very exciting study that has nothing to do with science," he says. "It's more an advertisement for cool technology. We can’t really do anything with it."

Science is built on a bedrock of trust, which typically involves sharing enough details about how research is carried out to enable others to replicate it, verifying results for themselves. This is how science self-corrects and weeds out results that don’t stand up. Replication also allows others to build on those results, helping to advance the field. Science that can’t be replicated falls by the wayside.

At least, that’s the idea. In practice, few studies are fully replicated because most researchers are more interested in producing new results than reproducing old ones. But in fields like biology and physics—and computer science overall—researchers are typically expected to provide the information needed to rerun experiments, even if those reruns are rare.
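
In machine-learning terms, "the information needed to rerun experiments" usually means the code, the data or a pointer to it, and the exact configuration and environment. As a rough illustration only, here is a minimal sketch in Python; the toy experiment, hyperparameters, and file name are invented for the example and have nothing to do with any study mentioned here.

```python
# reproduce.py -- a minimal sketch of an experiment someone else could rerun and check.
# Everything here (model, data, hyperparameters) is illustrative, not from any cited study.
import json
import platform

import numpy as np

CONFIG = {
    "seed": 42,            # fixed seed so reruns are directly comparable
    "n_samples": 1000,
    "learning_rate": 0.1,
    "n_steps": 200,
}

def run_experiment(cfg):
    rng = np.random.default_rng(cfg["seed"])

    # Synthetic stand-in for a real dataset: y = 3x + noise.
    x = rng.normal(size=cfg["n_samples"])
    y = 3.0 * x + rng.normal(scale=0.1, size=cfg["n_samples"])

    # Fit a one-parameter model with plain gradient descent.
    w = 0.0
    for _ in range(cfg["n_steps"]):
        grad = -2.0 * np.mean((y - w * x) * x)
        w -= cfg["learning_rate"] * grad

    return {"weight": float(w), "mse": float(np.mean((y - w * x) ** 2))}

if __name__ == "__main__":
    # Publish the exact config, environment, and metrics alongside the paper,
    # so anyone rerunning the script can compare against the reported numbers.
    report = {
        "config": CONFIG,
        "environment": {"python": platform.python_version(), "numpy": np.__version__},
        "results": run_experiment(CONFIG),
    }
    print(json.dumps(report, indent=2))
```

The model is deliberately trivial; the point is that everything needed to reproduce the number it prints travels with the result.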

AI is feeling the heat for several reasons. For a start, it is a newcomer. It has only really become an experimental science in the past decade, says Joelle Pineau, a computer scientist at Facebook AI Research and McGill University, who coauthored the complaint. "It used to be theoretical, but more and more we are running experiments," she says. "And our dedication to sound methodology is lagging behind the ambition of our experiments."

The problem is not simply academic. A lack of transparency prevents new AI models and techniques from being properly assessed for robustness, bias, and safety. AI moves quickly from research labs to real-world applications, with direct impact on people’s lives. But machine-learning models that work well in the lab can fail in the wild—with potentially dangerous consequences. Replication by different researchers in different settings would expose problems sooner, making AI stronger for everyone.
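
The "lab versus wild" gap is, at bottom, a distribution-shift problem: a model tuned on one slice of data meets inputs that no longer look like its training set. A toy sketch of the effect, on synthetic data invented purely for illustration:

```python
# Toy illustration of distribution shift: a classifier that looks fine on held-out
# "lab" data degrades once the deployed inputs drift. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, shift=0.0):
    # Two classes separated along one feature; `shift` moves the inputs at deployment time.
    x0 = rng.normal(loc=-1.0 + shift, scale=1.0, size=n)
    x1 = rng.normal(loc=+1.0 + shift, scale=1.0, size=n)
    return np.concatenate([x0, x1]), np.concatenate([np.zeros(n), np.ones(n)])

# "Lab": training and test data come from the same distribution.
x_train, y_train = make_data(5000)
threshold = np.median(x_train)            # a deliberately simple decision rule
x_test, y_test = make_data(5000)
lab_acc = np.mean((x_test > threshold) == y_test)

# "Wild": the same model, but the inputs have drifted.
x_wild, y_wild = make_data(5000, shift=1.5)
wild_acc = np.mean((x_wild > threshold) == y_wild)

print(f"lab accuracy:  {lab_acc:.2f}")    # around 0.84
print(f"wild accuracy: {wild_acc:.2f}")   # around 0.65
```

Replication by different groups, on different hospitals or populations, is in effect a test for exactly this kind of drift.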

AI already suffers from the black-box problem: it can be impossible to say exactly how or why a machine-learning model produces the results it does. A lack of transparency in research makes things worse. Large models need as many eyes on them as possible, more people testing them and figuring out what makes them tick. This is how we make AI in health care safer, AI in policing more fair, and chatbots less hateful.

Then there’s the growing gap between the haves and have-nots when it comes to the two pillars of AI, data and hardware. Data is often proprietary, such as the information Facebook collects on its users, or sensitive, as in the case of personal medical records. Tech giants carry out more and more research on enormous, expensive clusters of computers that few universities or smaller companies have the resources to access.

To take one example, training the language generator GPT-3 is estimated to have cost OpenAI $10 to $12 million—and that’s just the final model, not including the cost of developing and training its prototypes. "You could probably multiply that figure by at least one or two orders of magnitude," says Nathan Benaich, founder of Air Street Capital, a VC firm that invests in AI startups. Only a tiny handful of big tech firms can afford to do that kind of work, he says: "Nobody else can just throw vast budgets at these experiments."

The rate of progress is dizzying, with thousands of papers published every year. But unless researchers know which ones to trust, it is hard for the field to move forward. Replication lets other researchers check that results have not been cherry-picked and that new AI techniques really do work as described. "It's getting harder and harder to tell which are reliable results and which are not," says Pineau.
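
One common form of cherry-picking is quietly reporting the best of many training runs. A minimal sketch of the replication-friendly alternative is to run the same experiment over several random seeds and report the spread; the "experiment" below is a stand-in function invented for the example.

```python
# Sketch: report results across multiple seeds instead of the single luckiest run.
# The experiment below is a placeholder; only the reporting pattern matters.
import numpy as np

def run_experiment(seed: int) -> float:
    """Stand-in for a real training run; its score varies with the random seed."""
    rng = np.random.default_rng(seed)
    return 0.80 + 0.05 * rng.standard_normal()

scores = np.array([run_experiment(seed) for seed in range(10)])

# Cherry-picked claim: "our method reaches X", quoting only the best seed.
print(f"best seed only : {scores.max():.3f}")

# Replicable claim: central tendency plus spread, which a rerun can be checked against.
print(f"across 10 seeds: {scores.mean():.3f} +/- {scores.std(ddof=1):.3f}")
```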

Pineau thinks AI companies are demonstrating a third way to do research, somewhere between fully open academic science and fully closed product development. She contrasts the intellectual output of private AI labs with that of pharmaceutical companies, for example, which invest billions in drugs and keep much of the work behind closed doors.

The long-term impact of the practices introduced by Pineau and others remains to be seen. Will habits be changed for good? What difference will it make to AI’s uptake outside research? A lot hangs on the direction AI takes. The trend for ever larger models and data sets—favored by OpenAI, for example—will continue to make the cutting edge of AI inaccessible to most researchers. On the other hand, new techniques, such as model compression and few-shot learning, could reverse this trend and allow more researchers to work with smaller, more efficient AI.
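
Model compression covers techniques such as pruning, distillation, and quantization that shrink a trained network until it runs on modest hardware. As a rough sketch of the idea only, here is naive 8-bit post-training weight quantization in NumPy; real systems use far more careful schemes, and the weight matrix below is made up.

```python
# Naive post-training quantization: store float32 weights as int8 plus one scale
# factor, cutting memory roughly 4x at some cost in precision. Illustrative only.
import numpy as np

def quantize(weights: np.ndarray):
    """Map float32 weights to int8 with a single per-tensor scale."""
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(512, 512)).astype(np.float32)   # made-up weight matrix

q, scale = quantize(w)
w_hat = dequantize(q, scale)

print(f"original bytes : {w.nbytes}")     # 512 * 512 * 4
print(f"quantized bytes: {q.nbytes}")     # 512 * 512 * 1 (plus one float for the scale)
print(f"max abs error  : {np.max(np.abs(w - w_hat)):.6f}")
```

Storing each weight in one byte instead of four is the kind of saving that helps put serious experimentation back within reach of smaller labs.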

Either way, AI research will still be dominated by large companies. If it’s done right, that doesn’t have to be a bad thing, says Pineau: "AI is changing the conversation about how industry research labs operate." The key will be making sure the wider field gets the chance to participate. Because the trustworthiness of AI, on which so much depends, begins at the cutting edge.

Thanks for reading and have a great day. 👏