Peculiarities of DPIAs for AI system development and enhancement

By Nils Loelfing on December 6, 2021

Artificial Intelligence (“AI”) is seen as a key emerging technology, which the European Parliament recently labelled the fifth element after air, earth, water and fire in its Draft Report on AI in the digital age (issued in November 2021). AI is expected to contribute more than EUR 11 trillion to the global economy, which is close to China's GDP in 2020. Its importance is immense for businesses that want to use AI-related products or services to innovate and stay competitive, and it will only increase in the coming years.

In this light, more and more companies will become engaged not only in developing and building AI systems but also in using already deployed AI systems that are continuously being enhanced. Sooner or later, therefore, potentially all companies will need to deal with the underlying legal issues to ensure accountability for AI systems. One of these accountability requirements will often be the need to conduct a Data Protection Impact Assessment (“DPIA”) under the GDPR.

DPIAs for AI systems differ from similar assessments for the development and deployment of conventional software. The differences stem from peculiarities that are inherent in the nature of AI systems and how they work, which are described in this article. DPIAs for AI systems must deal with these peculiarities appropriately and reflect them. This is of utmost importance for controllers that want to be accountable and meet their obligations under the GDPR. The main points to consider are summarized below.

1) DPIAs for AI system development/enhancement and AI system deployment for productive use must be distinguished

It is important to note first that the development and enhancement of AI systems, on the one hand, and their deployment for productive use, on the other, differ in their circumstances and risks, which may have an impact on the applicable legal basis. This can be illustrated by the fact that training AI models in order to develop or enhance AI systems only indirectly affects data subjects (their data is used for training the algorithm), whereas the deployment and use of a specific AI model may produce legal effects concerning data subjects or similarly significantly affect them (e.g., where candidates are rejected based on the historical data fed into an algorithm). This strongly influences the risk assessment for such activities.

This article only deals with DPIAs for the development and enhancement of AI systems by training the underlying algorithms. It addresses businesses that build AI systems or enhance AI-related products or services, including those that have purchased AI systems from providers who further train and update the deployed algorithm as part of the product. The latter is potentially relevant to any company that has purchased connected AI devices or services which are continuously enhanced or updated by providers while in use.

In contrast, those who ultimately deploy and productively use AI systems pursue purposes other than training the AI system to develop or enhance innovative products and services. This is a completely different ball game: their impact assessment must look at what the productive AI system aims to achieve vis-à-vis the data subjects then concerned (e.g., the rejected candidates mentioned above), which is subject to a separate assessment.

2) What are the peculiarities of AI systems and what is their relevance from a GDPR perspective?

Definition and characteristics of AI systems

There is no single definition of AI, and it may prove difficult to capture future developments properly with any one definition. The European Commission recently tried to provide such a definition in its proposal for an AI Regulation (see Art. 3(1) of the draft Regulation). It suggests a broad approach covering a wide variety of systems, with the consequence that it also captures applications that are not truly AI systems. While this approach provides flexibility, it also captures systems that do not create unique AI-related risks to individuals, although only systems that create such risks should be regulated by the Regulation.

A more appropriate approach to capturing the essential characteristics of AI should be technology-neutral, focussing on systems that process big data volumes with the goal of resembling intelligent behaviour by using methods of reasoning, learning, perception, prediction, planning or control. These systems work autonomously, albeit to differing degrees, and make use of Machine Learning (“ML”) techniques, the most prominent branch of AI today. ML means that, through the application of statistical methods, algorithms are trained to make classifications and predictions and to uncover important insights. The outcome of the algorithms' learning from the input data is an AI model. This approach to defining AI reflects the UNESCO recommendations on the ethics of AI, which were published on 24 November 2021.

Supervised and unsupervised AI system development

The processing activities for AI product development can generally be based on supervised or unsupervised ML. Supervised learning uses labelled input and output data to train an AI system until the expected result is delivered (a defined purpose must therefore already be available during the training phase). In contrast, an unsupervised learning algorithm tries to figure out correlations between various data points on its own. The same applies to the specific use case, which is not defined in advance but is supposed to be discovered only through the findings of the unsupervised ML algorithm.
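The difference between the two approaches can be illustrated with a minimal Python sketch. It is an illustration only, not a description of any particular AI system: the toy data, the labels “low” and “high” and all function names are hypothetical, and a nearest-centroid classifier and a simple two-cluster k-means routine stand in for real supervised and unsupervised algorithms.

```python
from statistics import mean

# Hypothetical toy data: one numeric feature per record, plus a label.
labelled = [(1.0, "low"), (1.2, "low"), (0.8, "low"),
            (5.0, "high"), (5.3, "high"), (4.9, "high")]
# The same values without labels, as an unsupervised algorithm would see them.
unlabelled = [value for value, _ in labelled]


def train_supervised(examples):
    """Supervised: learn one centroid per label from labelled examples."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(model, value):
    """Return the label whose centroid is closest to the new value."""
    return min(model, key=lambda label: abs(model[label] - value))


def cluster_unsupervised(values, iterations=10):
    """Unsupervised: two-cluster k-means in one dimension, no labels used."""
    centroids = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for value in values:
            nearest = min((0, 1), key=lambda i: abs(centroids[i] - value))
            clusters[nearest].append(value)
        # Recompute each centroid; keep the old one if a cluster is empty.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters


model = train_supervised(labelled)
print(predict(model, 1.1))               # label predicted for a new record
print(cluster_unsupervised(unlabelled))  # groups found without any labels
```

In the supervised case the purpose (telling “low” from “high”) is fixed before training starts. In the unsupervised case the algorithm merely reports that two groups exist, and what they mean, and whether they serve a legitimate purpose, still has to be determined and documented afterwards.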

Issues resulting from these peculiarities from a GDPR perspective

The above characteristics of AI systems result in a number of peculiarities from a GDPR perspective, which need to be considered when assessing the impact of the development and/or enhancement of AI systems. For example, the fact that big data volumes must be used to properly train algorithms raises necessity and data minimization issues, as well as the questions of whether these big data volumes are unbiased (or “statistically accurate”) and whether the data use is sufficiently limited in purpose where unsupervised ML is used. The autonomy of AI systems poses further questions about the security of such systems. These and other peculiarities are discussed below.

3) Need to conduct DPIAs for AI system development/enhancement

In line with the risk-based approach embodied by the GDPR, carrying out a DPIA is not mandatory for every processing operation, but only for high-risk processing activities in relation to personal data (Art. 35 GDPR). Where AI is used and personal data is processed (which, given the broad definition of personal data under the GDPR, will often be the case), a DPIA will need to be carried out to address the risks inherent in AI systems (ensuring non-discrimination, fairness, equity and safety).

DPIAs for AI systems can also be conducted voluntarily, regardless of whether a DPIA is strictly required, in order to “include societal and ethical considerations of their use and an innovative use of the privacy by design approach”, as the UNESCO recommendations on the ethics of AI recently recommended.

Controllers may also voluntarily decide to conduct DPIAs as an appropriate measure to strengthen their accountability for developing or enhancing AI systems and to safeguard data subjects' rights. This may ultimately also help to win customer trust and maintain a competitive edge.

4) Peculiarities of DPIAs for AI system development/enhancement and how to deal with them

The most important aspects of DPIAs for AI system development/enhancement are:

Controllership: AI system enhancements, in particular those pertaining to connected AI devices or services which are continuously enhanced or updated while in use, raise the question of who is responsible for the AI system enhancement: the provider of the AI system or the customer that deployed it. There is no uniform answer to this, and the specific set-up (e.g., whether the enhancement is made for all customers or only one) must be diligently reviewed against the European Data Protection Board's Guidelines 07/2020 on the concepts of controller and processor in the GDPR (adopted in July 2021) and the three judgments on joint controllership handed down so far by the European Court of Justice (Wirtschaftsakademie Schleswig-Holstein, Jehovah's Witnesses and Fashion ID). If the AI system enhancement is specifically tailored to particular customers, it will be difficult to completely rule out the customer's responsibility for it from a GDPR perspective. A customer may therefore be required to conduct a DPIA for the AI system enhancement under certain circumstances.
Purpose limitation: The development or enhancement of AI systems needs to pursue a specific purpose, which must be defined beforehand. For unsupervised learning, the purpose is not clear at the outset but can only be defined as the development proceeds and the algorithm uncovers correlations in the data. German data protection regulators (in their position paper on recommended technical and organizational measures for the development and operation of AI systems, which is available in German) acknowledged this principle and underscored that “clustering” (the recognition of patterns and division of the data into clusters or categories) is important for making successful predictions based on as few, but relevant, variables as possible. Based on the insights resulting from the clustering, data sets can then be cleaned, supplemented and standardised, so that the training of AI systems can be optimised for the desired purpose. It is a step-by-step approach which should be documented.
Purpose alteration: Considering the sheer volume of data that controllers require for proper AI system development, it is likely that existing data sets which were originally collected for other purposes will be used. Altering the purpose of collected data needs to be reviewed against Art. 6(4) GDPR, which obliges a controller to conduct a compatibility assessment. It is disputed whether a controller requires an additional legal basis even if the compatibility assessment was successful, and whether a failed compatibility assessment can be overcome with another legal basis. This seems odd at first sight given the intuitive argument that a compatibility assessment should privilege further use of data already collected, and not hinder it. In any case there are good reasons to follow the “privilege argument”, which should be properly documented in such a DPIA.
Necessity: Depending on the purpose of the AI system development, it must be ensured that the learned model only contains or can reproduce the minimum personal data from training that is necessary to achieve a specified goal. For example, an AI system whose purpose is to distinguish humans from other objects must not also be able to identify persons from the training data set. Controllers need to assess whether there are other reasonable and less intrusive ways to achieve the same product. A controller should be able to document and argue in a DPIA why (i) AI techniques were required at all, (ii) the same outcome cannot be achieved with synthetic data, and (iii) all data points are in fact needed for the development or enhancement.
Statistical accuracy: Risks of bias and discrimination must be properly addressed to ensure so-called statistical accuracy. Although typically not intended, prejudices easily find their way into the training data. For example, if healthcare data was predominantly collected from male patients, an algorithm may recognise disease patterns less reliably in female patients. To avoid this, staff must be well trained to select data fairly and to recognize such imbalances as early as possible once training has started. Best practices are starting to develop here.
Data minimization: This supplements the principle of necessity. Though large amounts of data are needed to properly train AI algorithms, only the minimum amount of data required to achieve the defined purposes may be processed. This includes assessing whether privacy enhancing technologies such as anonymization, pseudonymization and federated learning can be implemented as a means of reducing the impact of the development/enhancement on the data subjects. There are many different approaches which may be suitable for the AI system in question, but in any case the controller should try to apply an “innovative use of the privacy by design approach” (see the UNESCO recommendations above) appropriate for its case, to demonstrate that the GDPR is being taken seriously.
Transparency: Personal data must be processed in a transparent manner in relation to the data subject, for example by way of the information contained in a privacy notice. If the data has not been obtained from the data subject but from a third party (e.g., data sharing co-operations or publicly available information), controllers may be exempt from providing information on the AI development/enhancement. This requires controllers to document that providing such information proves impossible or would involve a disproportionate effort (Art. 14(5) GDPR). The bar for this is relatively high, but it does offer at least some room for adequate justification. The European Data Protection Board argues that these exceptions do not apply to the original controller who wants to process the data of data subjects for its secondary purposes (i.e., secondary use by the same controller) and who is required to inform the data subjects about the change of purpose pursuant to Art. 13(3) GDPR. According to guidance by the European Data Protection Board, this applies even if the data subjects can no longer be reached, can only be reached with great difficulty and/or if informing them would require a disproportionate effort. This is counterintuitive, as the scenario is similar to one in which the data was obtained from third parties instead of the data subject. Hence, limitations appear justified in these cases by way of an analogous application of Art. 14(5) to Art. 13 GDPR scenarios (i.e., where the original controller further processes the data it has collected for its secondary purposes).
Individual rights: Data subjects have certain rights in relation to their data (such as access or deletion), and any controller must generally be able to meet those requests from data subjects. Deleting data may be very difficult in the big data volumes typically used for training AI systems. A controller may find relief in Art. 11(2) GDPR if it can demonstrate that it is not able to identify the data subject. This will typically be true with anonymized or pseudonymized data and shows another benefit of taking data minimization seriously, provided the goal of the AI training permits the use of anonymized or pseudonymized data only.
Data security risk assessment: Risk identification for AI models is challenging. Systems using AI face security risks that go beyond those that typically affect a traditional IT environment. AI systems are more complex (often using third party code), exposed to new security threats and more prone to data mishandling given the sheer volume of data used for training them. All of this must be factored in when identifying security risks, by considering the most current research of expert bodies (such as the European Union Agency for Cybersecurity's Artificial Intelligence Threat Landscape Report, the latest one having been published in December 2020, or the publications by the German Federal Office for Information Security, such as the recommendation on safe, robust and traceable use of AI [in German only]).
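Some of the measures mentioned above can be supported by simple technical checks, the results of which may then be documented in the DPIA. The following Python sketch is an illustration under stated assumptions, not a compliance tool: the field names, the allow-list, the secret key and the 30% threshold are all hypothetical. It shows keyed pseudonymization with HMAC-SHA256, the removal of fields that the training purpose does not need, and a basic check of how strongly each group is represented in the training data. Note that pseudonymized data remains personal data under the GDPR as long as re-identification is possible with additional information such as the key.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret; it would have to be stored separately from the data.
PSEUDONYM_KEY = b"replace-with-a-secret-key"

# Hypothetical allow-list of fields the documented training purpose needs.
NEEDED_FIELDS = ("sex", "age_band", "diagnosis")


def pseudonymize(identifier, key=PSEUDONYM_KEY):
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize(record):
    """Keep only the needed fields and swap the identifier for a pseudonym."""
    reduced = {field: record[field] for field in NEEDED_FIELDS}
    reduced["pseudonym"] = pseudonymize(record["patient_id"])
    return reduced


def group_shares(records, attribute):
    """Return the share of records per group for one attribute."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def underrepresented(shares, threshold=0.3):
    """List groups whose share falls below the (hypothetical) threshold."""
    return sorted(group for group, share in shares.items() if share < threshold)


# Hypothetical raw records, invented for illustration only.
raw = [
    {"patient_id": "P-001", "name": "A", "sex": "male", "age_band": "40-49", "diagnosis": "x"},
    {"patient_id": "P-002", "name": "B", "sex": "male", "age_band": "50-59", "diagnosis": "y"},
    {"patient_id": "P-003", "name": "C", "sex": "male", "age_band": "30-39", "diagnosis": "x"},
    {"patient_id": "P-004", "name": "D", "sex": "female", "age_band": "40-49", "diagnosis": "y"},
]

training = [minimize(record) for record in raw]
shares = group_shares(training, "sex")
print(shares)
print(underrepresented(shares))
```

A check of this kind only reveals imbalances in attributes that are actually recorded. It does not replace the trained staff and the fairness review described under statistical accuracy.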

5) Conclusion

The trustworthiness and integrity of the life cycle of AI systems are essential to ensure that AI technologies work for a good cause and uphold the values of humanity, individuals and societies as well as the environment and ecosystems. DPIAs conducted for the development or enhancement of AI systems are an important element of this. They ensure that the peculiarities of AI systems are properly reflected and allow controllers to demonstrate that they have built the relevant privacy requirements into their AI systems. They may also help to win customers' trust and maintain a competitive edge.

It remains to be seen whether conducting DPIAs on a voluntary basis to “include societal and ethical considerations of their use and an innovative use of the privacy by design approach”, as recommended by UNESCO, becomes a trend. Notably, the U.S. is not a member of UNESCO, although it is home to the world's biggest AI companies. The U.S. thus has a lot of influence on how the practice will develop in this respect going forward and may set counter-trends.