AI guardrails can be easily beaten, even if you don’t mean to
Guardrails designed to prevent AI chatbots from generating illegal, explicit or otherwise wrong responses can be easily bypassed, according to research from the UK’s AI Safety Institute (AISI).
The AISI found that five undisclosed large language models were “highly vulnerable” to jailbreaks –inputs and prompts crafted to elicit responses that are not intended by their makers.
In a recent report, AISI researchers revealed that the models could be circumvented with minimal effort, highlighting the ongoing safety and security concerns associated with generative AI.
AI chatbots can be jailbroken too easily
The report, which arrived in anticipation of the upcoming AI Safety Summit in Seoul, jointly hosted by South Korea and the UK, noted:
“All tested models remain highly vulnerable to basic “jailbreaks”, and some will produce harmful outputs even without dedicated attempts to circumvent safeguards.”
Despite claims from all of the leading AI developers, such as OpenAI, Meta and Google, about their in-house safety measures, AISI’s findings suggest that significant gaps that could lead to potentially lead to major safety concerns remain.
Although the UK government has withheld the names of the five models it tested, it confirmed that they are publicly available.
The interim report, which precedes a full report expected to be published later this year with research from more than 30 countries, arrived just days before the Seoul-based AI Safety Summit, which is seen as the successor to Britain’s Bletchley Park summit late last year.
At the upcoming Seoul Summit, jointly hosted by South Korean President Yoon Suk Yeol and British Prime Minister Rishi Sunak, global leaders and industry experts are expected to come together to discuss AI safety within the realms of innovation and inclusivity.