In the digital era, the integration of artificial intelligence is gaining ever greater momentum.
Are you planning to develop an AI model to optimize business processes and services? Among the many legal questions this raises is the protection of the personal data that will be used to develop and deploy such a model.
The European Data Protection Board (“EDPB”) has published an opinion (Opinion 28/2024) on the protection of personal data in the development and use of AI models and on how their training and deployment are governed by the provisions of the GDPR.
Let us examine three questions addressed in the opinion, and the answers to them, from the perspective of an AI model developer or owner.
General Recommendations for Compliance with GDPR Provisions
Within the framework of personal data protection, the process of creating an AI model can be divided into two main stages: data processing at the development and training stage, and data processing during the practical application of such a model.
First of all, before the implementation of an AI model project begins, it is necessary to determine:
- on the basis of what information this model will be trained and will operate;
- which party will act as controller or processor, and what the relationship is between any joint controllers; and
- how responsibilities will be distributed between the parties.
Given the complexity of the technologies involved, and with the aim of complying with the principle of transparency, information about the processing of personal data in AI models must be communicated to the user in an accessible, understandable, and convenient form.
In accordance with the principle of purpose limitation, “the purpose of collection must be clearly and specifically defined” by the controller(s).
The controller(s) must describe in detail:
- about the type of AI model being developed;
- the context of implementation (whether the model is being developed for internal use, or whether the controller intends to sell or distribute the model to third parties after its development, including whether the model is primarily intended for deployment for research or commercial purposes);
- the purpose of implementation (where possible);
- the expected functional capabilities of the AI model;
- any other relevant context already known at this stage.
The Question of AI Model Anonymity. GDPR Compliance
An anonymous AI model falls outside the scope of the GDPR altogether. But how exactly do you determine that your model is anonymous?
The specificity of AI models is that they typically do not contain records that can be directly isolated or linked to individuals; instead, they contain parameters representing probabilistic relationships between the data.
Even if an AI model was not intentionally created to provide information about a specific individual, information from the training dataset, including personal data, may remain “absorbed” in the model’s parameters. These parameters may differ from the original data points of the training dataset, but may still retain the original information from that data, which may ultimately be extracted or derived, directly or indirectly, from the model.
Given the complexity of the technology, the European Data Protection Board considers that the anonymity of an AI model trained on personal data, including the relevance and effectiveness of the measures implemented by the controller to ensure and demonstrate that anonymity, must be assessed on a case-by-case basis.
If, however, the AI model was specifically created to provide personal data about the individuals whose data was used to train the model, or to make such data available in some way, such an AI model cannot be considered anonymous at all.
This concerns, for example, a generative model specifically fine-tuned on a person’s voice recordings to imitate their voice; or any model created to provide personal data from training data in response to a request about a specific individual.
Criteria for the Anonymity of an AI Model
The European Data Protection Board has identified the following criteria for the possible anonymous nature of an AI model:
- A negligible probability of direct extraction of personal data regarding individuals whose data was used to train the model.
- A negligible probability of obtaining such personal data, intentionally or otherwise, through queries to the model.
To determine this “level of probability,” the AI model must be evaluated separately at the development and deployment stages, taking into account “all means that might reasonably be used” by the controller or another person, as well as the possibility of unintended reuse or public release of the model.
For such an assessment it is necessary to take into account:
- the characteristics of the training data itself and the training procedures;
- the context in which the AI model is released and/or processed;
- additional information enabling identification that may be available to such a person;
- the risk of identification both by the controller and by various “other persons” who may gain access to the AI model;
- the available technologies at the time of processing, as well as technological developments.
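For the second anonymity criterion in particular, the likelihood of extracting personal data through queries can be probed empirically. The sketch below is a minimal illustration, not a methodology prescribed by the EDPB: it prompts a model with prefixes of records known to be in the training data and checks whether the completion reproduces the rest verbatim. `query_model` is a hypothetical stand-in for whatever inference API the model exposes.

```python
def regurgitation_rate(query_model, training_records, prefix_len=20):
    """Probe a model for verbatim leakage of training records.

    query_model: callable prompt -> completion (a hypothetical stand-in
    for the model's inference API).
    training_records: strings containing personal data seen in training.
    Returns the fraction of records whose held-back suffix the model
    reproduces verbatim.
    """
    leaks = 0
    for record in training_records:
        prefix, suffix = record[:prefix_len], record[prefix_len:]
        completion = query_model(prefix)
        # Verbatim reproduction of the held-back suffix counts as leakage.
        if suffix and suffix in completion:
            leaks += 1
    return leaks / len(training_records)

# Toy demonstration with a mock model that has memorized one record.
memorized = "Jane Doe, born 1980, lives at 12 Example Street, Kyiv"
def mock_model(prompt):
    return memorized[len(prompt):] if memorized.startswith(prompt) else "no data"

rate = regurgitation_rate(mock_model, [memorized, "John Roe, phone +380000000000"])
# rate == 0.5
```

A real assessment would combine many such probes (membership inference, reconstruction attacks, etc.) and document the results; a single regurgitation check is only one data point.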
EDPB Assessment Elements and “Risk Reduction Measures”
Additionally, the European Data Protection Board identifies the following aspects for assessing the anonymity of an AI model, which may at the same time serve as “risk reduction measures” during the legitimate interest assessment test:
- the design of the AI model, namely:
- Selection of sources used for training the AI model (the appropriateness and relevance of the selected sources, etc.);
- Preparation of data for the training stage:
- Whether the use of anonymous and/or pseudonymized personal data was considered (if it was decided not to use such measures, the reasons for such a decision, taking into account the intended purposes);
- Data minimization strategies and methods applied to limit the volume of personal data included in the training process; and
- Any data filtering processes applied prior to model training to remove irrelevant personal data.
- The choice of robust methods for developing the AI model, such as privacy-preserving techniques;
- Methods or measures applied to the AI model itself that may not reduce the risk of direct extraction of personal data from the model, but that may reduce the likelihood of obtaining personal data related to the training data through queries.
- the conformity of the AI model design with the developed plan and the presence of effective management of engineering processes;
- the existence of any documented audits (internal or external) including an assessment of the selected measures and their impact on limiting the probability of identification (analysis of code review reports; theoretical analysis documenting the relevance of the selected measures for reducing the probability of re-identification of the relevant model);
- the scope, frequency, and quality of structured testing against attacks such as data exfiltration, membership inference, model inversion, and reconstruction attacks;
- the existence of the required prepared documentation (documented data processing operations, DPO advice and feedback, etc.).
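To make the data-preparation measures above concrete: before training, obvious identifiers can be stripped from the corpus or replaced with placeholders. The sketch below is a minimal, illustrative rule set; real pipelines cover far more identifier types and typically combine such rules with named-entity recognition.

```python
import re

# Illustrative patterns only; production filters need far broader coverage
# (names, addresses, national IDs) and are usually paired with NER-based
# detection rather than regexes alone.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "IBAN":  re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b"),
}

def pseudonymize(text):
    """Replace recognizable identifiers with category placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

sample = "Contact Olena at olena@example.com or +380 44 123 4567."
clean = pseudonymize(sample)
# clean == "Contact Olena at [EMAIL] or [PHONE]."
```

Note that such filtering supports pseudonymization and data minimization; on its own it does not make the resulting model anonymous in the EDPB’s sense (the name “Olena” survives the filter above, for instance).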
The Question of Legitimate Interest as a Legal Basis When Processing Personal Data for Training an AI Model During Development and Use
To determine legitimate interest as a legal basis for processing personal data at the development or application stage of an AI model, the controller, as a general rule, conducts a three-step test to assess the legitimacy of such interest in each individual case.
Three cumulative conditions must be assessed and documented accordingly:
- the existence of a legitimate interest;
- the necessity of processing personal data to achieve the purpose of the legitimate interest;
- the balance between the rights of data subjects and the interests of the controller.
Existence of a Legitimate Interest
Confirmation of the existence of a legitimate interest encompasses three aspects:
- the interest is lawful;
- the interest is clearly and precisely formulated;
- the interest is real and present, not speculative.
In the context of AI models, the European Data Protection Board provides three examples of legitimate interest: developing a conversational agent service to assist users, developing an AI system to detect fraudulent content or behavior, and improving threat detection in information systems.
Necessity of Data Processing to Achieve the Purpose of the Legitimate Interest
The necessity assessment (necessity test) includes two elements:
- whether the processing makes it possible to achieve the purpose;
- whether there is a less intrusive means of achieving that purpose.
When assessing necessity, attention should be paid both to the volume of personal data processed and its proportionality to the legitimate interest pursued, and to the broader context of the envisaged processing (for example, whether the controller has a direct relationship with the data subjects or is processing third-party data).
If the purpose can also be achieved through an AI model that does not involve the processing of personal data, then the processing of personal data must be considered unnecessary.
The implementation of technical measures to protect personal data can help satisfy the necessity test. Such measures were described above as “risk reduction measures”: they do not achieve anonymization, but they reduce the probability that data subjects will be identified.
Balance Between the Rights of Data Subjects and the Interests of the Controller
The balancing test includes a detailed description and assessment of:
- on one hand: the interests, rights, and freedoms of data subjects, the impact of personal data processing on such subjects, the status of data subjects and their reasonable expectations;
- on the other hand: the interests of the controller or a third party.
The interests of data subjects are those that may be affected as a result of the processing.
In the context of the AI model development stage, these interests may include:
- interest in self-determination;
- interest in maintaining control over one’s own personal data (data collected for model development).
In the context of the AI model use stage, interests may include:
- interests in maintaining control over one’s own personal data;
- financial interests (the AI model is used to generate income or is used by an individual in the context of their professional activities);
- personal benefits (the AI model improves access to certain services);
- socio-economic interests (the AI model provides access to better healthcare).
Among the rights and freedoms of data subjects that may be put at risk by the development and deployment of AI models are:
- the right to private and family life;
- the right to the protection of personal data;
- the right to freedom of expression;
- the right to work, etc.
For example: large-scale and indiscriminate collection of data at the development stage may create a sense of surveillance among data subjects, particularly given the difficulty of preventing the scraping of publicly available data. This may lead to self-censorship and pose risks to freedom of expression.
When an AI model is used to block the publication of content by data subjects, there is a risk to freedom of expression. In addition, an AI model that recommends hostile content to vulnerable individuals may create risks for their mental health.
When job applications are pre-screened using an AI model, there is a risk of adverse consequences for the individual’s right to work.
Where an AI model discriminates against users on the basis of personal characteristics such as nationality or gender, there is a risk of violating the prohibition of discrimination.
The European Data Protection Board also notes the possible positive impact of AI models on data subjects, including facilitating access to information, access to education, and supporting the right to mental integrity, etc.
The impact of processing on data subjects depends on the nature of the data processed by AI models, the context of processing, and the subsequent consequences that this processing may have.
For example: financial data or location data must be treated as potentially having a serious impact on data subjects. The use of web scraping at the development stage may, in the absence of sufficient protective measures, lead to significant consequences for individuals due to the large volume of data collected, the large number of data subjects, and the indiscriminate collection of personal data.
Importantly, when assessing the consequences, the controller must assess what technical and organizational measures have been taken to avoid potential risks and the circumstances of the specific situation.

For example, for generative AI models, this may include introducing restrictions to avoid using such models for harmful practices: creating deepfakes, chatbots used for disinformation, phishing and other types of fraud, and manipulative AI agents.
The reasonable expectations of data subjects play a key role in conducting the test. It is therefore important for the controller to take into account the broader context of processing, which will include:
- the nature of the relationship between the data subject and the controller (whether there is a connection between them);
- the nature of the service, the context in which the personal data was collected;
- the source from which the data was collected (the website or service where the personal data was collected, and the privacy settings they offer);
- the potential further use of the model; and
- whether data subjects are actually aware that their personal data is available on the Internet, etc.
Having described all criteria of the balancing test in detail, the controller assesses whether the interests, rights, and freedoms of data subjects appear to outweigh the legitimate interests pursued by the controller or a third party. Where the interests, rights, and freedoms of data subjects do outweigh those of the controller, the controller may consider applying “risk reduction measures” to limit the impact of the processing on those data subjects.
“Risk reduction measures” are additional to, and distinct from, the basic measures provided for by the GDPR.
What Are the Consequences of Unlawful Processing of Personal Data at the AI Model Development Stage for the Further Life of Such an AI Project?
Case 1
A company acting as a personal data controller uses personal data without a legal basis to develop and train an AI model. This data, which is not anonymized, remains in the AI model and is subsequently processed by the same controller (for example, through the practical use of the AI model).
EDPB considerations: in such cases the supervisory authority has the power to impose corrective measures regarding the initial data processing.
Consequences: the unlawfulness of data processing at the development stage (initial processing) will have an impact on subsequent processing. The impact will be determined on the basis of an individual analysis conducted by the supervisory authority, taking into account all the specifics of the particular case.
For example: with regard to the legal basis of Article 6(1)(f) GDPR, where subsequent processing is based on legitimate interest, the fact that the initial processing was unlawful must be taken into account in the assessment of the legitimate interest (for example, regarding the risks to data subjects or the fact that data subjects may not expect such subsequent processing). In such cases, the unlawfulness of processing at the development stage may affect the lawfulness of subsequent processing.
Case 2
A company acting as a personal data controller uses personal data without a legal basis to develop and train an AI model. This data, which is not anonymized, remains in the AI model, but the model is then used, and the data further processed, by another company acting as controller.
EDPB considerations: the supervisory authority may conduct an investigation of the specific AI model. In such a case, the supervisory authority will separately assess the lawfulness of the initial processing at the development stage and the lawfulness of the processing at the use stage.
The controller that processes data at the AI model use stage must conduct a proper assessment to confirm that the AI model was not developed through unlawful processing of personal data, taking into account whether the data originates from a personal data breach or whether the initial processing was found to be in violation of the GDPR by a supervisory authority or court.
In its assessment, the supervisory authority must take into account whether the AI model results from a GDPR infringement, particularly where this has been established by a supervisory authority or a court, since this may indicate that the subsequent controller could not have been unaware of the unlawfulness of the initial processing.
Consequences: the unlawfulness of data processing at the development stage (initial processing) will have an impact on subsequent processing. The impact will be determined on the basis of an individual analysis conducted by the supervisory authority, taking into account all the specifics of the particular case.
For example: with regard to the legal basis of Article 6(1)(f) GDPR, where subsequent processing is based on legitimate interest: the unlawful initial processing must be taken into account in the assessment of the legitimate interest of processing at the AI model application stage.
Various aspects, both of a technical nature (the existence of filters or access restrictions introduced during model development that the subsequent controller cannot circumvent or influence, and which may prevent access to or disclosure of personal data) and of a legal nature (the nature and seriousness of the unlawfulness of the initial processing), must be properly taken into account within the balancing of interests test.
Case 3
A company acting as a personal data controller uses personal data without a legal basis to develop and train an AI model. However, before the model is deployed and used, whether by the same controller or by another, measures are taken to anonymize the data used.
EDPB considerations: at the AI model development stage — the supervisory authority reserves the right to apply corrective measures and to intervene in the initial processing.
Situation 1: the controller can demonstrate that at the AI model use stage no processing of personal data takes place — the provisions of the GDPR do not apply.
Consequences: the unlawfulness of data processing at the development stage will not affect the further use of the AI model.
Situation 2: the controller processes personal data collected at the AI model use stage, after the model has been anonymized — the GDPR will apply.
Consequences: in such cases, pursuant to the GDPR, the lawfulness of the processing carried out at the AI model use stage is not affected by the unlawfulness of the initial processing.
Therefore, the training of AI models requires a careful approach to personal data protection issues. First of all, it is necessary to determine whether the model is anonymous and to justify the legal basis for data processing. This will ensure compliance with legislation, minimize the risks of violating the rights of data subjects, and create a sound legal framework for the model’s operation.
The specialists of Legal IT Group are ready to provide practical recommendations on the compliance of your project with legislative requirements and to help resolve any legal questions. Contact us for a consultation — we will be happy to help!
The post Training an AI Model on Personal Data and GDPR: What Did the EDPB Say? appeared first on Legal IT group.