Bug in EmbedAI can allow poisoned data to sneak into your LLMs
EmbedAI, an application that lets users interact with documents by leveraging the capabilities of large language models (LLMs), contains a vulnerability that could expose it to data poisoning attacks, according to cybersecurity firm Synopsys.
“This vulnerability could result in an application becoming compromised, leading to unauthorized entries or data poisoning attacks,” Synopsys said in a security blog. “Exploitation of this vulnerability could affect the immediate functioning of the model and can have long-lasting effects on its credibility and the security of the systems that rely on it.”
The vulnerability, which has a CVSS score of 7.5/10, affects the EmbedAI “main” branch and hasn’t yet been assigned a CVE ID.
Cross-site request forgery
According to Synopsys, EmbedAI is affected by a cross-site request forgery (CSRF) flaw, a class of web vulnerability that allows threat actors to trick end users into executing unwanted actions on a web application in which they are currently authenticated.
“These attacks are enabled by a cross-site request forgery (CSRF) vulnerability created by the absence of a secure session management implementation and weak cross-origin resource-sharing policies,” Synopsys added.
In the context of LLMs, the vulnerability lets attackers trick victims into uploading poisoned data into their language model, which can expose applications built on the EmbedAI component to potential data leakage.
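The controls Synopsys describes as missing, secure session management and a strict cross-origin policy, are standard web-application defenses. The sketch below illustrates what they might look like on a document-upload endpoint; it is a minimal, hypothetical example assuming a Flask-style Python backend, and the endpoint, header, and field names are illustrative, not EmbedAI's actual code.

```python
# Minimal sketch (hypothetical, not EmbedAI's code): a Flask document-upload
# endpoint protected by a session-bound anti-CSRF token and a restrictive
# cross-origin resource-sharing (CORS) policy.
import secrets

from flask import Flask, abort, request, session
from flask_cors import CORS

app = Flask(__name__)
app.secret_key = secrets.token_hex(32)  # required for signed session cookies

# Allow only a known front-end origin instead of a wildcard "*"
CORS(app, origins=["https://app.example.com"], supports_credentials=True)


@app.before_request
def issue_csrf_token():
    # Give each authenticated session its own anti-CSRF token
    session.setdefault("csrf_token", secrets.token_hex(16))


@app.route("/upload", methods=["POST"])
def upload_document():
    # Reject requests that do not echo the session's token back; a forged
    # cross-site form submission cannot read or supply this value
    if request.headers.get("X-CSRF-Token") != session.get("csrf_token"):
        abort(403)
    document = request.files.get("document")
    if document is None:
        abort(400)
    # ... ingest the document into the model's knowledge base ...
    return {"status": "accepted"}, 202
```

Without a check of this kind, a malicious page visited by an authenticated user could silently submit poisoned documents to the upload endpoint on the user's behalf, which is the attack pattern Synopsys warns about.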
Additionally, data poisoning can harm the user's applications in many other ways, including spreading misinformation, introducing bias, degrading performance, and opening the door to denial-of-service attacks.
Isolating applications may help
Synopsys has emphasized that the only available remediation for this issue is isolating the potentially affected applications from integrated networks. The Synopsys Cybersecurity Research Center (CyRC) said in the blog that it "recommends removing the applications from networks immediately."
“The CyRC reached out to the developers but has not received a response within the 90-day timeline dictated by our responsible disclosure policy,” the blog added.
The vulnerability was discovered by Mohammed Alshehri, a security researcher at Synopsys. "There're products where they take an existing AI implementation and merge them together to create something new," Alshehri told Dark Reading in an interview. "What we want to highlight here is that even after the integration, companies should test to ensure that the same controls we have for Web applications are also implemented on the APIs for their AI applications."
The research highlights that the rapid integration of AI into business operations carries risks, particularly for companies that allow LLMs and other generative AI (GenAI) applications to access extensive data repositories. Although the area is still nascent, security vendors such as Dig Security, Securiti, Protect AI, and eSentire are already scrambling to build defenses against evolving GenAI threats.