Attacking Your AI: New Cyber Security Obligations Under the EU's AI Act
30 April 2024
With the EU's AI Act expected to be published in the Official Journal imminently (the Act will come into force 20 days after publication), those designing, developing, and/or deploying AI will need to start getting to grips with the myriad of new obligations - even if some may not apply for another three years!
We've previously set out a comprehensive overview of the AI Act but, whilst some obligations will look familiar (e.g. broad transparency obligations), there are others which, in addition to being somewhat novel, are far more explicit. One such group of obligations comprises the requirements of Article 15 relating to accuracy, robustness, and cybersecurity. Among other cyber security requirements, Article 15(5) introduces a requirement for High Risk systems to be “resilient against attempts by unauthorised third parties to alter their use, outputs or performance by exploiting system vulnerabilities”. There is an element of proportionality in the obligation, which recognises that technical solutions aiming to ensure the cyber security of such systems must be appropriate to the risks and circumstances, but it also highlights a series of vulnerabilities the technical solutions should address (where appropriate), including:
- data or model poisoning;
- model evasion;
- confidentiality attacks; and
- model flaws.
Broadly speaking, these are all vulnerabilities associated with adversarial machine learning (i.e. malicious attacks which bad actors may deploy to exploit weaknesses in AI systems), but the Act does little to explain them or indicate what may be ‘appropriate’. Until we see more guidance from Brussels as to how to “prevent, detect, respond to, resolve and control” such attacks on AI systems, we may need to look across the pond for answers.
NIST Report on Adversarial Machine Learning
As part of its efforts to support the United States Executive Order on Trustworthy AI (issued in October 2023), in early 2024 the National Institute of Standards and Technology (NIST) released a report (Report) which considers the main types of adversarial machine learning attacks and how organisations might be able to mitigate them. That said, as NIST highlights, bad actors can intentionally confuse or poison AI systems, causing them to malfunction, and there is “no foolproof defence that their developers can employ”.
Types of Attack
Unlike traditional cyber-attacks and security issues, the following four broad types of attack/vulnerability set out in the Report (which the Act requires businesses to be resilient against) are “fairly easy to mount and require minimum knowledge of the AI system and limited adversarial capabilities”:
1. Poisoning attacks: A key concern at the training stage of a new AI system is attempts to introduce ‘poisoned’ (i.e. corrupt) data for the system to learn from. Although poisoning attacks aren't necessarily new to cyber security, the Report highlights that they are considered the “most critical vulnerability of machine learning systems deployed in production” and are likely to be a particular concern for organisations scraping data from the Internet, where they may not be able to guarantee the quality of the data used to train the model. For example, a malicious actor may be able to deploy data poisoning on a large scale within the training data set, building vulnerabilities into the system which that actor can exploit at a later date, including once the system becomes operational.
2. Abuse attacks: Abuse attacks are similar to poisoning attacks insofar as they both involve the system being trained on or learning from false or corrupt data. The key difference with an abuse attack is that the source of the false information may be “legitimate but compromised”, i.e. the attacker has inserted incorrect information into a legitimate source, such as a website, to repurpose the intended use of the system for their own purposes. This could be used by an attacker, for example, to exploit vulnerabilities in a large language model so that the model then shares disinformation or hate speech selected by that attacker.
3. Privacy attacks: A fundamental element of any AI system is the data - which often includes personal data - on which the system was trained. However, once a system is operational, it is possible for bad actors to use legitimate means to obtain that personal data. For example, a bad actor may be able to submit a large number of queries to a large language model which enable them to reverse engineer personal data about a particular individual from the aggregate data set. Equally, the same techniques can be utilised to gain access to proprietary or confidential information relating to the AI system's architecture, enabling attackers to extract sufficient information about an AI system to reconstruct a model which achieves similar levels of performance.
4. Evasion attacks: Once an AI system is fully operational, evasion attacks can be used to make small changes to the input provided to the system which cause the system to misclassify that input. For example, attackers may exploit an AI-driven spam filter by embedding the spam text in an image so as to avoid detection by the system's text analysis capabilities. A more dangerous example would be someone tampering with a road sign so that a driverless car misinterprets it, e.g. treating a stop sign as a sign indicating a speed limit, which could then lead to accidents (a brief illustrative sketch of this type of attack follows this list).
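To give a concrete sense of how little effort some of these attacks require, the sketch below illustrates a simple evasion attack of the kind described in point 4, using a ‘fast gradient sign method’ (FGSM) style perturbation. It is a minimal illustration only, assuming a PyTorch image classifier; the model, inputs, and epsilon value are placeholders chosen for illustration and the example is not drawn from the Report or the Act.

```python
# Illustrative sketch of an evasion attack (FGSM-style), assuming a PyTorch
# image classifier. The model, inputs and epsilon below are placeholders.
import torch
import torch.nn.functional as F

def fgsm_evasion(model, image, true_label, epsilon=0.03):
    """Return a slightly perturbed copy of `image` intended to be misclassified.

    The perturbation is bounded by `epsilon`, so to a human the image looks
    essentially unchanged, yet the classifier's output can flip.
    """
    image = image.clone().detach().requires_grad_(True)

    # Forward pass and loss against the correct label.
    logits = model(image)
    loss = F.cross_entropy(logits, true_label)

    # Gradient of the loss with respect to the *input*, not the model weights.
    loss.backward()

    # Nudge the input in the direction that increases the loss the most.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()

# Usage (hypothetical): compare predictions before and after the perturbation.
# clean_pred = model(image).argmax(dim=1)
# adv_pred   = model(fgsm_evasion(model, image, true_label)).argmax(dim=1)
```

The point for compliance teams is not the code itself but the scale of the change involved: a perturbation of a few percent per pixel, effectively invisible to a human reviewer, can be enough to flip a model's output, which is the kind of ‘model evasion’ vulnerability Article 15(5) expects technical solutions to address.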
The Report goes on, in far more granular detail, to address the intricacies of these types of attack - e.g. considering the differences in an attacker's goals and objectives across attack types - and sets out some of the technical measures which may be utilised to mitigate them, providing useful context for organisations seeking to understand what measures they may require to meet their obligations under Article 15. However, as the Report also examines, there are still a number of ‘open’ challenges which organisations must take into account when designing or deploying an AI system, such as the ‘trade-offs’ they must assess, prioritise, and accept (e.g. a system which optimises for adversarial robustness may be less accurate, or may limit the scope for explainability and fairness).
In any event, the authors of the Report recognise that existing defences are “incomplete at best” and encourage the AI community to devise better options, which is why they expect to update the Report as new attack and mitigation techniques develop. Nevertheless, with vague obligations under the AI Act to be resilient against such attacks, the NIST Report provides a reasonable, albeit not exhaustive, starting point for organisations developing their approach to AI-related cyber security, and it is likely to remain a useful resource as future iterations set out the latest security practices and measures to defend against attacks as they emerge.
But what can businesses actually do?
Evidently, data quality and a degree of sophistication in AI models and technical defences are going to be key to protecting a system from attack, so far as possible, and to meeting the new obligations under the AI Act. Whilst the Report highlights that there is no guaranteed way to “protect AI from misdirection”, the variety of technical options it sets out to mitigate these attacks are useful tools which should be considered.
However, as the press release for the Report notes, the data sets these systems are trained on are “too large for people to successfully monitor and filter”, meaning another key strategy in minimising the likelihood of attacks such as poisoning will be to ensure your organisation knows the origin of the data. Moreover, whilst it may not assist in complying with Article 15, if the data sets are not collated by your organisation it may also be worthwhile ensuring you have some contractual assurances (and indemnities, where appropriate) as to their origin and the lawfulness of their collection/processing.
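By way of illustration only (this is not a requirement of the Act or a recommendation from the Report), ‘knowing the origin of the data’ can be made demonstrable with something as simple as a provenance manifest recorded when a data set is ingested, hashing each file so that later tampering or substitution is detectable. The paths, field names, and helper functions below are hypothetical.

```python
# Illustrative provenance manifest for a training data set (hypothetical
# paths and fields); hashing the files makes later tampering detectable.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(data_dir: Path, source: str, licence: str) -> dict:
    """Record where each file came from and what it contained at ingestion time."""
    return {
        "source": source,          # e.g. supplier name or URL
        "licence": licence,        # basis on which the data was obtained
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "files": {p.name: sha256_of(p) for p in sorted(data_dir.glob("*")) if p.is_file()},
    }

if __name__ == "__main__":
    manifest = build_manifest(Path("training_data"), source="example supplier", licence="contract ref TBC")
    Path("provenance_manifest.json").write_text(json.dumps(manifest, indent=2))
```

A record like this will not stop a poisoning attack by itself, but it supports the kind of documented, ‘appropriate’ measures discussed below and makes it easier to show where training data came from if a system is later challenged.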
Similarly, whilst AI offers huge opportunities and will inevitably continue to reach new capabilities in the coming weeks, months, and years, it has not yet reached a level of sophistication which can fully withstand, for example, evasion or privacy attacks, which require only carefully considered misdirection from a bad actor. In lieu of a system or technical measures which can effectively resist all such attacks, it will also be fundamental to ensure that you have an effective and demonstrable system of AI governance in place which, among many other things (such as measures to mitigate algorithmic bias/discrimination, broader organisational measures, etc.), includes regular testing of the system at all stages of the AI lifecycle, suitable due diligence when adopting a third party provider's AI system, and clear documentation of the processes, analysis, and decisions made in relation to all AI systems. Such a system of governance may not fully protect an organisation against these attacks, but it will enable you to demonstrate the ‘appropriate’ measures your organisation has taken, as required under the AI Act.
For more on the requirements under the EU AI Act, join us for our next In House Data Club at 4.00pm on 30 April, where the team will discuss the Act, the obligations, and what organisations should be doing now (and if you can’t make it, drop us a line to view the recording), or reach out to your usual Lewis Silkin contact.