Some A.I. Companies Face a New Accusation: ‘Open Washing’
SHOP TALK
/ō-pən-wä-shĭng/
An accusation against some A.I. companies that they are using the “open source” label too loosely.
Shop Talk explores the idioms of the business world: the insider jargon, the newly coined terms, the unfortunate or overused phrases.
There’s a big debate in the tech world over whether artificial intelligence models should be “open source.” Elon Musk, who helped found OpenAI in 2015, sued the startup and its chief executive, Sam Altman, on claims that the company had diverged from its mission of openness. The Biden administration is investigating the risks and benefits of open source models.
Proponents of open source A.I. models say they’re more equitable and safer for society, while detractors say they are more likely to be misused for malicious purposes. One big hiccup in the debate? There’s no agreed-upon definition of what open source A.I. actually means. And some are accusing A.I. companies of “openwashing”: using the “open source” term disingenuously to make themselves look good. (Accusations of openwashing have previously been aimed at coding projects that used the open source label too loosely.)
In a blog post on Open Future, a European think tank supporting open sourcing, Alek Tarkowski wrote, “As the rules get written, one challenge is building sufficient guardrails against corporations’ attempts at ‘openwashing.’” Last month the Linux Foundation, a nonprofit that supports open-source software projects, cautioned that “this ‘openwashing’ trend threatens to undermine the very premise of openness — the free sharing of knowledge to enable inspection, replication and collective advancement.”
Organizations that apply the label to their models may be taking very different approaches to openness. For example, OpenAI, the startup that launched the ChatGPT chatbot in 2022, discloses little about its models (despite the company’s name). Meta labels its LLaMA 2 and LLaMA 3 models as open source but puts restrictions on their use. The most open models, run mainly by nonprofits, disclose the source code and underlying training data, and use an open source license that allows for wide reuse. But even with these models, others face obstacles in replicating them.
The main reason is that while open source software allows anyone to replicate or modify it, building an A.I. model requires much more than code. Only a handful of companies can fund the computing power and data curation required. That’s why some experts say labeling any A.I. as “open source” is at best misleading and at worst a marketing tool.
“Even maximally open A.I. systems do not allow open access to the resources necessary to ‘democratize’ access to A.I., or enable full scrutiny,” said David Gray Widder, a postdoctoral fellow at Cornell Tech who has studied use of the “open source” label by A.I. companies.
Efforts to create a clearer definition for open source A.I. are underway. Researchers at the Linux Foundation in March published a framework that places open source A.I. models into various categories. And the Open Source Initiative, another nonprofit, is trying to draft a definition.
But Mr. Widder and others doubt that truly open source A.I. is possible. The prohibitive resource requirements for building A.I. models, he said, “are simply not going away.”