
AI Weights: Securing the Heart and Soft Underbelly of Artificial Intelligence

AI model weights govern the outputs from the system, but if altered or ‘poisoned’ they can make the output erroneous and, in extremis, useless and dangerous.


AI model weights are simultaneously the heart and soft underbelly of AI systems. They govern the outputs from the system. But if altered or ‘poisoned’, they can make the output erroneous and, in extremis, useless and even dangerous.

It is essential that these weights are protected from bad actors. As we move towards greater AI integration in business, model weights become the new corporate crown jewels; but they are a new and not fully understood threat surface that must be secured. Protecting them becomes even more important with future generations of complex gen-AI systems – so important that RAND, a non-profit research organization, has developed a security framework (PDF) for what it terms Frontier AI.

What are AI weights? SecurityWeek discussed the concept of AI weights, the new attack surface they create, and the required approaches to security with two of the report’s authors: Sella Nevo (director of the Meselson Center at the RAND Corporation) and Dan Lahav (co-founder and CEO at Pattern Labs). It is worth noting that the RAND report applies only to privately held models, not openly released ones.

Effectively, the weights encode what the AI system has learned about how to respond to an input. “Model weights,” explain the authors, “is a casual term for all the trainable parameters of an AI model. First the model consumes vast amounts of data, and then it is trained on how to respond, how to make decisions, how to do inferences. Everything it learns in this process is called model weights; but it’s really just a very, very long series of numbers, often tens of millions or billions, or even trillions of numbers. The weights represent everything that the model knows and has learned from the data.”
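
As a concrete, deliberately tiny illustration of that idea, the Python sketch below builds a toy two-layer network and flattens its trainable parameters into one long vector of numbers. The layer shapes and names are invented for illustration; they are not taken from the report.

```python
# Illustrative toy only: a model's "weights" are just its trainable
# parameters, which can be flattened into one long series of numbers.
import numpy as np

rng = np.random.default_rng(seed=0)

# A tiny two-layer network: 4 inputs -> 8 hidden units -> 1 output.
weights = {
    "layer1": rng.normal(size=(4, 8)),   # 32 parameters
    "bias1":  np.zeros(8),               #  8 parameters
    "layer2": rng.normal(size=(8, 1)),   #  8 parameters
    "bias2":  np.zeros(1),               #  1 parameter
}

total = sum(w.size for w in weights.values())
print(f"Total trainable parameters: {total}")   # 49 here; billions or trillions in frontier models

# Flattened, everything the model "knows" is just this vector of numbers.
flat = np.concatenate([w.ravel() for w in weights.values()])
print(flat[:5])
```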


This immediately presents two new attack surfaces to protect: the training data (garbage in, garbage out) and the weight values themselves (manipulating the values could introduce intended or unintended bias into the system’s responses). The Eliza Effect, fittingly named after a very early chatbot developed in the 1960s, demonstrates a human predisposition to believe and accept computer output, regardless of its actual accuracy. The combination of weight compromise and the Eliza Effect means that humans, as well as dependent system processes, may simply do as the AI system tells them, without question.
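
To see why tampered weight values are so dangerous, consider the toy sketch below (not from the RAND report): a single altered number in a simple linear classifier silently flips its decision while the model otherwise appears intact. The feature values and threshold are made up for illustration.

```python
# Toy illustration: one poisoned weight flips a classifier's decision.
import numpy as np

def predict(weights, bias, x):
    """Simple linear classifier: returns 1 if w.x + b > 0, else 0."""
    return int(np.dot(weights, x) + bias > 0)

x = np.array([2.0, -1.0, 0.5])          # an input, e.g. transaction features
weights = np.array([1.0, 0.5, 2.0])     # legitimate, trained weights
bias = -1.0

print("original prediction:", predict(weights, bias, x))   # 1, e.g. "flag as fraud"

tampered = weights.copy()
tampered[0] = -1.0                      # a single poisoned value
print("tampered prediction:", predict(tampered, bias, x))  # 0, fraud now slips through
```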

The weights are surprisingly valuable to bad actors. “If you’re a malicious actor able to steal AI weights, you remove a huge barrier from being able to recreate the model and use it in any way you wish,” explained the authors. If that allows bad actors to damage critical infrastructure or national security, it becomes a huge problem.

In some cases, the intention may be to manipulate the weights without stealing them. In a manufacturing environment, manipulating the weights could disrupt the manufacturing process. But weight theft is equally problematic, given the ability of advanced actors to enter, exfiltrate, and depart without altering the original file. Weights could be stolen and misused without anyone being aware.
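
One basic control that follows from this, sketched below on the assumption that the weights are stored as a single file, is to record a cryptographic hash of the weights at release time and verify it before the model is loaded. This detects silent alteration, though not theft, since a copied file hashes identically. The file name and expected digest here are hypothetical.

```python
# Hedged sketch: verify a recorded SHA-256 digest of the weights file
# before loading, to detect silent alteration of the weights.
import hashlib
from pathlib import Path

def weights_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a weights file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical file name and known-good digest, recorded at release time.
WEIGHTS_FILE = "model_weights.bin"
EXPECTED = "replace-with-digest-recorded-at-release-time"

if Path(WEIGHTS_FILE).exists():
    if weights_digest(WEIGHTS_FILE) != EXPECTED:
        raise RuntimeError("Weights do not match the recorded digest; refusing to load.")
```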


If it’s a financial fraud AI model, a bad actor could use stolen weights to understand how fraud is being detected and gain insights into how to bypass the detection. The same principle would apply to AI-based spam and phishing detection systems.

The purpose of RAND’s AI security framework is to ensure that companies using AI understand how to protect their own AI systems, and especially the weights, against bad actors with different levels of sophistication – from criminal gangs to nation state elites. “We’re trying to show how these models can be protected so they do not fall into the hands of malicious actors.”

There are two fundamental stages in preparing an AI system: training and inference. Training involves consuming the training data and learning (encoding in the weights) how to respond. The inference stage is when the model is ready to apply its learning (the weights) to new, unseen data. By this time, the weights are part of the AI model, while the training data remains separate. If the training data is proprietary to the organization using the model (a practice that will increase in the future), the result is an AI system tailored to that company, organization, or agency, and one that becomes especially valuable to bad actors. The weights are the encoded knowledge of the AI model.
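
A minimal sketch of those two stages, using a one-variable linear model purely for illustration: after training, the learned weights are saved as an artifact, and inference needs only that artifact, not the original training data. The file name and data are invented.

```python
# Minimal sketch of training vs. inference, using numpy only.
import numpy as np

# --- Training stage: fit weights to (proprietary) training data. ---
X = np.array([[0.0], [1.0], [2.0], [3.0]])   # training inputs
y = np.array([1.0, 3.0, 5.0, 7.0])           # training targets (y = 2x + 1)

# Closed-form least squares for a line y = w*x + b.
A = np.hstack([X, np.ones_like(X)])
w, b = np.linalg.lstsq(A, y, rcond=None)[0]

np.save("weights.npy", np.array([w, b]))     # the weights now exist as an artifact

# --- Inference stage: load the weights and apply them to unseen data. ---
w, b = np.load("weights.npy")
print(w * 10.0 + b)                          # ~21.0; the training data is no longer needed
```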

AI usage results in a new combination of threat surface and attack possibilities. While the need for, and practice of, cybersecurity itself is not new, RAND believes that protecting AI models and their weights demands new specifics. For example, the use of Confidential Computing (encrypting data at rest and in transit, and processing decrypted data only within a heavily protected enclave) should be standard. To this it adds more familiar concepts, heavily oriented toward limited and controlled access to the weights, supported by additional hardening to prevent any possible exfiltration of the weights.
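
The sketch below covers only the “encrypted at rest and in transit” half of that idea, using the third-party Python cryptography package. In a real Confidential Computing deployment, decryption and processing would additionally be confined to a hardware-protected enclave, and the key would be held in a KMS or HSM rather than generated alongside the data.

```python
# Hedged sketch: symmetric encryption of a weights file at rest,
# using the `cryptography` package (pip install cryptography).
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # in practice, held in a KMS/HSM, not next to the data
f = Fernet(key)

plaintext_weights = b"\x00" * 1024   # stand-in for serialized model weights

encrypted = f.encrypt(plaintext_weights)       # what gets stored and transmitted
with open("weights.enc", "wb") as out:
    out.write(encrypted)

# Decryption would happen only inside the protected processing environment.
with open("weights.enc", "rb") as inp:
    restored = f.decrypt(inp.read())
assert restored == plaintext_weights
```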

The study identifies almost forty distinct attack vectors, with examples of real-world successful attacks. It explores likely attacker capabilities, from financially driven criminals to elite and highly resourced nation states, and relates the attack vectors to the different attacker capabilities. It defines and proposes five security levels, together with benchmarks that correspond to each.

It notes that the complexity of future AI models required to interact with the internet makes them unlikely to be securable with current methodologies: “Protecting models that are interacting with the internet against the most capable threat actors is currently not feasible.”

More worryingly, it suggests that the necessary security measures could take five years to implement. Examples include developing hardware able to hold encrypted weights and decrypt them for processing (the inference process) without the decrypted weights ever leaving the secure hardware (a special application of the Confidential Computing concept), and establishing separate, completely isolated networks for training, research, and other interactions with the weights.

The overriding conclusion of the report is that AI technology and its use is advancing, and its importance to commercial interests and national security is increasing, faster than our current efforts to protect it. If we want to get ahead of the future threat, we need to start planning and implementing now.


Related: CISA Conducts First AI Cyber Incident Response Exercise

Related: Aim Security Raises $18M to Secure Customers’ Implementation of AI Apps

Related: Bolster Raises $14 Million for AI-Powered Phishing Protection

Related: Cyber Insights 2024: Artificial Intelligence

Written By

Kevin Townsend is a Senior Contributor at SecurityWeek. He has been writing about high tech issues since before the birth of Microsoft. For the last 15 years he has specialized in information security; and has had many thousands of articles published in dozens of different magazines – from The Times and the Financial Times to current and long-gone computer magazines.
