Open Toolkit to Address Generative AI Risk| ARC Advisory Group

June 4, 2024

45 1 minute read

A testing toolkit for addressing security and safety challenges associated with the use of large language models (LLMs), AI Verify Project Moonshot brings red-teaming, benchmarking, and baseline testing together in an easy-to-use platform. The initiative, which was announced during the Asia Tech x Singapore event on 31 May, is an example of the city-state’s commitment to harnessing the power of the global open-source community in addressing AI risks.

An open beta, Project Moonshot aims to provide intuitive results of the quality and safety of a model or application in an easily understood manner, even for a non-technical user. It was developed through working with partners including DataRobot, IBM, Singtel, and Temasek to ensure that the tool is useful and aligned with industry needs.

“IBM is pleased to be a design partner and contributor to AI Verify Moonshot. The provision of this new tool is significant as it aims to help developers and data scientists test their LLM applications against a baseline of risks, thereby accelerating the adoption of AI,” said Anup Kumar, CTO Data & AI, Head Client Engineering Asia Pacific, IBM.

According to DataRobot’s Chief Customer Officer, Jay Schuren, the integration of Project Moonshot into the DataRobot AI Platform will allow AI builders to confidently and responsibly scale generative AI within their organizations

Project Moonshot is also part of an important move towards global testing standards. Two of the leading AI testing organisations – AI Verify Foundation and MLCommons – have come together to build a common safety benchmark suite. To further their collaboration, they signed a memorandum of intent (MOI) on 29 May, intending to positively impact AI safety by providing model and application developers a globally accepted approach of safety testing for generative AI.

Source

June 4, 2024

45 1 minute read