Fortify AI Training Datasets From Malicious Poisoning

April 24, 2024

COMMENTARY

Picture this: It’s a Saturday morning, and you’ve made breakfast for your family. The pancakes were golden brown and seemed to taste fine, but everyone, including you, got sick shortly after eating them. Unbeknown to you, the milk you used in the batter had expired several weeks earlier. Everything looked fine on the outside, but the quality of the ingredients spoiled the meal.

The same principle applies to artificial intelligence (AI). Regardless of its purpose, AI’s output is directly related to the quality of its input. As the popularity of AI continues to rise, the security of the data being fed into it is coming into question.

A majority of today’s organizations are integrating AI into business operations in some capacity, and threat actors are taking note. Over the past few years, a tactic known as AI poisoning has become increasingly prevalent. This malicious practice involves injecting deceptive or harmful data into AI training sets. The tricky part about AI poisoning is that, despite the input being compromised, the output can initially continue as normal. It isn’t until a threat actor gets a firm grip on the data and begins a full-fledged attack that deviations from the norm become obvious. The consequences range from minor inconvenience to damage to a brand’s reputation.

It’s a risk affecting organizations of all sizes, even today’s most prominent tech vendors. For example, over the past few years, adversaries launched several large-scale attacks to poison Google’s Gmail spam filters and even turned Microsoft’s Twitter chatbot hostile.

Defending Against AI Data Poisoning

Fortunately, organizations can take the following steps to shield AI technologies from potential poisoning.

Build a comprehensive data catalog. First, organizations should create a live data catalog that serves as a centralized repository of the data being fed to their AI systems. Any time new data is added to AI systems, it should be tracked in this index. In addition, the catalog should categorize the data flowing into AI systems by the who, what, when, where, why, and how to ensure transparency and accountability.
Develop a baseline of normal behavior for users and devices interacting with AI data. Once the security and IT teams have a solid understanding of all of the data in AI systems and who has access to it, it’s important to develop a baseline of normal user and device behavior.
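To make the first step concrete, here is a minimal sketch of what a catalog entry could look like. The `CatalogEntry` and `DataCatalog` names, the field layout, and the in-memory list are all assumptions made for illustration; a real catalog would sit on a database or a dedicated catalog product.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class CatalogEntry:
    """One record per batch of data added to an AI system (hypothetical schema)."""
    who: str    # account that added the data
    what: str   # dataset or file identifier
    where: str  # origin of the data, such as a source system
    why: str    # stated purpose of the addition
    how: str    # ingestion method, such as "manual upload" or "pipeline"
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class DataCatalog:
    """In-memory stand-in for a centralized catalog (assumption, not a product API)."""

    def __init__(self) -> None:
        self._entries: List[CatalogEntry] = []

    def record(self, entry: CatalogEntry) -> None:
        # Reject blank text fields so every addition stays attributable.
        for name, value in asdict(entry).items():
            if isinstance(value, str) and not value.strip():
                raise ValueError(f"catalog field {name!r} must not be empty")
        self._entries.append(entry)

    def added_by(self, who: str) -> List[CatalogEntry]:
        # Answer the "who" question: everything a given account has added.
        return [e for e in self._entries if e.who == who]
```

Recording an entry for every addition, and refusing entries with blank fields, is one simple way to keep the who, what, when, where, why, and how answerable later.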

Compromised credentials are one of the easiest ways for cybercriminals to break into networks. All a threat actor has to do is either play a guessing game or buy one of the 24 billion username and password combinations available on the cybercriminal marketplace. Once they have access, a threat actor can easily work their way into AI training datasets.

By establishing baseline user and device behavior, security teams can easily detect abnormalities that might be indicative of an attack. Often, this helps stop a threat actor before an incident escalates into a full-blown data breach. For example, say you have an IT executive who typically works from the New York office and who oversees the AI training datasets. One day, activity logs show that he is active in another country and is adding large amounts of data to the AI. If your security team already has a baseline of user behavior, they can quickly tell that this is abnormal. Security could then talk to the executive to verify that he was performing the action. If he wasn’t, they could temporarily disable his account to prevent any further damage until the alert is thoroughly investigated.
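The New York example can be sketched as a simple check against a stored baseline. This is only an illustration under stated assumptions: the `Baseline` class, the `find_anomalies` function, and the five-times volume threshold are hypothetical, and real behavior analytics tools learn baselines from history rather than from hand-set values.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Baseline:
    """Normal behavior for one user (hypothetical, hand-set values)."""
    usual_locations: Set[str]
    typical_records_per_day: int


def find_anomalies(
    baseline: Baseline,
    location: str,
    records_added: int,
    volume_factor: float = 5.0,  # assumed threshold, tune per environment
) -> List[str]:
    """Return the reasons an activity deviates from the baseline, if any."""
    reasons = []
    if location not in baseline.usual_locations:
        reasons.append(f"unusual location: {location}")
    if records_added > volume_factor * baseline.typical_records_per_day:
        reasons.append(f"unusual volume: {records_added} records")
    return reasons


# The executive normally works from New York and adds modest amounts of data.
executive = Baseline(usual_locations={"New York"}, typical_records_per_day=1_000)

# Activity from an unfamiliar location with a large upload trips both checks.
alerts = find_anomalies(executive, location="Unknown country", records_added=50_000)
if alerts:
    # In practice: verify with the user, or suspend the account pending review.
    print("Investigate:", "; ".join(alerts))
```

Returning the reasons rather than a bare true or false gives the security team something specific to ask the executive about before deciding whether to disable the account.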

Taking Responsibility for AI Training Sets

Just as you should check the quality of the ingredients before you make a meal, it’s critical to ensure the integrity of AI training data. An AI system’s intelligence is closely tied to the quality of the data it processes. Implementing guidelines, policies, monitoring systems, and improved algorithms plays a pivotal role in ensuring the safety and effectiveness of AI. These measures safeguard against potential threats and empower organizations to harness the transformative potential of AI. It is a delicate balance: organizations must learn to leverage AI’s capabilities while remaining vigilant in the face of the ever-evolving threat landscape.
