10 things to watch out for with open source gen AI
Instead, he says, people are more likely to look at the APIs and interfaces of major vendors, such as OpenAI, as nascent de-facto standards. “That’s what I’m seeing folks do,” he says.
8. Lack of transparency
You might think open source models are, by definition, more transparent. But that might not always be the case. Large commercial projects may have more resources to spend on creating documentation, says Eric Sydell, CEO at BI software vendor Vero AI, which recently released a report scoring major gen AI models based on areas such as visibility, integrity, legislative preparedness, and transparency. Google’s Gemini and OpenAI’s GPT-4 ranked the highest.
“Just because they’re open source doesn’t necessarily mean they provide the same information about the background of the model and how it was developed,” says Sydell. “The bigger commercial models have done a better job around that at this point.”
Take bias, for example.
“We found that the top two closed models in our ranking had quite a bit of documentation and invested time exploring the issue,” he says.
9. Lineage issues
It’s common for open source projects to be forked, but when this happens with gen AI, you get risks you don’t get with traditional software. Say, for example, a foundation model uses a problematic training data set, and someone creates a new model from it. The new model will inherit those problems, says Tyler Warden, SVP of product at Sonatype, a cybersecurity vendor.
“There’s a lot of black box aspects to it with the weights and tuning,” he says.
In fact, those problems may go several levels back and won’t be visible in the code of the final model. When a company downloads a model for its own use, the model gets even further removed from the original sources. The original base model might have fixed the issues, but, depending on the amount of transparency and communication up and down the chain, the developers working on the last model might not even be aware of the fixes.
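One way to picture the problem is as a chain of parent links that has to be walked back to the original base model. The following is a minimal sketch, not any vendor’s tooling: the `ModelRecord` inventory entry, the model names, and the issue label are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ModelRecord:
    """One entry in a hypothetical internal model inventory."""
    name: str
    parent: Optional["ModelRecord"] = None  # model this one was derived from
    known_issues: Set[str] = field(default_factory=set)  # issues recorded at this level


def lineage(model: Optional[ModelRecord]) -> List[ModelRecord]:
    """Return the chain from the given model back to its original base model."""
    chain = []
    while model is not None:
        chain.append(model)
        model = model.parent
    return chain


def inherited_issues(model: ModelRecord) -> Dict[str, str]:
    """Map each inherited issue to the name of the ancestor it was recorded on."""
    issues = {}
    # Skip the model itself; only ancestors contribute inherited issues.
    for ancestor in lineage(model)[1:]:
        for issue in ancestor.known_issues:
            issues[issue] = ancestor.name
    return issues


# Hypothetical three-level chain: base model -> community fine-tune -> internal copy.
base = ModelRecord("base-model", known_issues={"problematic training data"})
fine_tune = ModelRecord("community-fine-tune", parent=base)
internal = ModelRecord("internal-copy", parent=fine_tune)

print(inherited_issues(internal))
```

Nothing recorded on the internal copy itself mentions the training data problem. It only surfaces when the chain is followed two levels up, which assumes every level actually records its parent and its known issues.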
10. The new shadow IT
Companies that use open source components as part of their software development process have processes in place to vet libraries and ensure components are up to date. They make sure projects are well supported, security issues are dealt with, and the software has appropriate license terms.
With gen AI, however, the people supposed to do the vetting might not know what to look for. On top of that, gen AI projects sometimes fall outside the standard software development processes. They might come out of data science teams, or skunkworks. Developers might download models to play with, and those models can end up being more widely used. Or business users themselves might follow online tutorials and set up their own gen AI, bypassing IT altogether.
The latest evolution of gen AI, autonomous agents, has the potential to put enormous power in the hands of these systems, raising the risk of this type of shadow IT to new heights.
“If you’re going to experiment with it, create a container to do it in a way that’s safe for your organization,” says Kelley Misata, senior director of open source at Corelight. This should fall under the responsibility of a company’s risk management team, she says, and the person who makes sure that developers, and the business as a whole, understand there’s a process is the CIO.
“They’re the ones best positioned to set the culture,” she says. “Let’s tap into the innovation and all the greatness that open source offers, but go into it with eyes open.”
The best of both worlds?
Some companies are looking for the low cost, transparency, privacy, and control of open source, but would like to have a vendor around to provide governance, long-term sustainability, and support. In the traditional open source world, there are many vendors that do that, like Red Hat, MariaDB, Docker, Automattic, and others.
“They provide a level of safety and security for large enterprises,” says Priya Iragavarapu, VP of data science and analytics at AArete. “It’s almost a way to mitigate risk.”
There aren’t too many of these vendors in the gen AI space, but things are starting to change, she says.