Haize Labs wants to automate AI safety

June 12, 2024

An artificial intelligence start-up says it has found thousands of vulnerabilities in popular generative AI programs and released a list of its discoveries.

After testing popular generative AI programs including video creator Pika, text-focused ChatGPT, image generator Dall-E and an AI system that generates computer code, Haize Labs discovered that many of the well-known tools produced violent or sexualized content, instructed users on the production of chemical and biological weapons and allowed for the automation of cyberattacks.

Haize is a small, five-month-old start-up founded by Leonard Tang, Steve Li and Richard Liu, three recent graduates who all met in college. Collectively, they published 15 papers on machine learning while they were in school.

Tang described Haize as an “independent third-party stress tester” and said his company’s goal is to help root out AI problems and vulnerabilities at scale. Pointing to one of the largest bond-rating firms as a comparison, Tang said Haize hopes to become a “Moody’s for AI” that establishes public-safety ratings for popular models.

AI safety is a growing concern as more companies integrate generative AI into their offerings and use large language models in consumer products. Last month, Google faced sharp criticism after its experimental “AI Overviews” tool, which purports to answer users’ questions, suggested dangerous activities such as eating one small rock per day or adding glue to pizza. In February, Air Canada came under fire when its AI-enabled chatbot promised a fake discount to a traveler.

Industry observers have called for better ways to evaluate the risks of AI tools.

“As AI systems get deployed broadly, we are going to need a greater set of organizations to test out their capabilities and potential misuses or safety issues,” Jack Clark, co-founder of AI research and safety company Anthropic, recently posted to X.

“What we’ve learned is that despite all the safety efforts that these big companies and industry labs have put in, it’s still super easy to coax these models into doing things they’re not supposed to; they’re not that safe,” Tang said.

Haize’s testing automates “red teaming,” the practice of simulating adversarial actions to identify vulnerabilities in an AI system. “Think of us as automating and crystallizing the fuzziness around making sure models adhere to safety standards and AI compliance,” Tang said.

The AI industry needs an independent safety entity, said Graham Neubig, associate professor of computer science at Carnegie Mellon University.

“Third-party AI safety tools are important,” Neubig said. “They’re both fair and impartial because they aren’t built by the companies building the models themselves. Also, a third-party safety tool can have higher performance with respect to auditing because it’s built by an organization that specializes in that, as opposed to each company building their tools ad hoc.”

Haize is publishing the attacks uncovered in its review as open source on the developer platform GitHub to raise awareness about the need for AI safety. Haize said it proactively flagged the vulnerabilities to the makers of the AI tools tested, and the start-up has partnered with Anthropic to stress test an unreleased algorithmic product.

Tang said rooting out vulnerabilities in AI platforms through automated systems is crucial because manually discovering problems takes a long time and exposes those who work in content moderation to violent and disturbing content. Some of the content discovered through Haize Labs’ review of popular generative AI tools included gruesome and graphic imagery and text.

“There’s been too much discourse about AI-taking-over-the-world type of safety problems,” Tang said. “I think they’re important, but the much larger problem is the short-term misuse of AI.”
