How AI-driven patching could transform cybersecurity
Unpatched software vulnerabilities have long been a chronic cybersecurity pain point, leading to costly data breaches every year. On average, a data breach resulting from the exploitation of a known vulnerability costs $4.17 million, according to IBM’s “Cost of a Data Breach Report 2023.”
The problem: Organizations don’t patch software flaws as quickly as threat actors find and exploit them. Once a critical vulnerability is published, malicious scanning activity begins in a median time of five days, according to Verizon’s “2024 Data Breach Investigations Report.” On the other hand, two months after fixes for critical vulnerabilities become available, nearly half of them remain unremediated.
A potential solution: Generative AI. Some cybersecurity experts believe GenAI can help close that gap by not just finding bugs, but also fixing them. In internal experiments, Google’s large language model (LLM) has already achieved modest but significant success, remediating 15% of simple software bugs it targeted.
In a presentation at RSA Conference (RSAC) 2024, Elie Bursztein, cybersecurity technical and research lead at Google DeepMind, said his team is actively testing various AI security use cases, ranging from phishing prevention to incident response. But the ability to use Google’s LLM to secure its codebase by finding and patching vulnerabilities — and, ultimately, reducing or eliminating the number of vulnerabilities that require patching — tops their AI security wish list.
Google’s AI-driven patching experiment
In a recent experiment, Bursztein’s team compiled 1,000 simple vulnerabilities from within the Google codebase, discovered by sanitizers in C/C++.
They then asked a Gemini-based AI model — similar to Google’s publicly available Gemini Pro — to generate and test patches and identify the best ones for human review. In a technical report, researchers Jan Nowakowski and Jan Keller said the experiment’s prompts followed this general structure:
You are a Senior Software Engineer tasked with fixing sanitizer errors. Please fix them.
… code
// Please fix the <error_type> error originating here.
… LOC pointed to by the stack trace
… code
Engineers reviewed the AI-generated patches — an effort Bursztein described as significant and time-consuming — ultimately approving 15% and adding them to Google’s codebase.
“Instead of a software engineer spending an average of two hours to create each of these commits, the necessary patches are now automatically created in seconds,” Nowakowski and Keller wrote.
And, given the thousands of bugs discovered each year, they noted, automatically finding fixes for even a small percentage could add up to months of engineering time and effort saved.
Elie BurszteinCybersecurity technical and research lead, Google DeepMind
AI-driven patching wins
In his RSAC presentation, Bursztein said the results of the AI patching experiment suggest Google researchers are on the right track. “The model shows an understanding of code and coding principles that is quite impressive,” he said.
In one instance, for example, the LLM correctly identified and fixed a race condition by adding a mutex.
“Understanding the concept that you have a race condition is not trivial,” Bursztein said, adding that the model was also able to fix some data leaks by removing pointer use. “So, in a way, it is almost doing the writing.”
AI-driven patching challenges
Although the results of the AI patching experiment were promising, Bursztein cautioned that the technology is far from where Google hopes to one day see it — reliably and autonomously fixing 90%-95% of bugs. “We have a very long way to go,” he said.
The experiment underscored the following significant challenges:
- Complexity. The AI seemed better at fixing some types of bugs than others — often those with fewer lines, researchers found.
- Validation. The validation process for AI-suggested fixes — in which human operators make sure patches address the vulnerabilities in question without breaking anything in production — remains complex and requires manual intervention.
- Data set creation and model training. In one instance of problematic behavior, according to Bursztein, the AI commented out to get rid of a bug — but also got rid of the code in the process. “Problem solved!” Bursztein said. “Besides being funny, this shows you how hard it’s going to be.”
To train the AI out of this behavior requires data sets with thousands of benchmarks, he added, each assessing both whether a vulnerability is fixed and whether program features are kept intact. Creating these, Bursztein predicted, will be a challenge for the cybersecurity community at large.
These difficulties notwithstanding, he remains optimistic that AI might one day autonomously drive bug discovery and patch management, shrinking vulnerability windows until they all but disappear.
“How we get there is going to be interesting,” Bursztein said. “But the upsides are massive, so I hope we do get there.”
Alissa Irei is senior site editor of TechTarget Security.