AI-Powered Chatbots: Mythical Super Creature or Legal Trojan Horse

By Sari Depreeuw & Yung Shin Van Der Sype on June 6, 2023

Ever since the public launch of OpenAI’s ChatGPT, the world has been gasping at the astonishing accomplishments of this generative AI chatbot: a simple “prompt” in the form of a question (“which are the most important decisions of the CJEU in copyright?”) will receive a credible response within seconds (“The Court of Justice of the European Union (CJEU) has issued several important decisions in the field of copyright law. While it is challenging to determine a definitive list of the most important decisions, here are some key rulings that have had significant impact” and it goes on to list some of the CJEU’s best-known decisions, such as Infopaq, UsedSoft, Svensson, Deckmyn, ACI Adam, GS Media and YouTube).

Impressive for sure and, although the information is not always reliable (ChatGPT has been reported to invent legal precedents, to the embarrassment of the lawyers who have submitted briefs on that basis…), companies recognise the appeal of AI-powered chatbots – they are here to stay. To avoid bringing in these applications as legal Trojan horses, in-house counsel would do well to identify the legal risks of this new technology. Racial, sexual, and other bias that may induce discriminatory acts, as well as misinformation, are well documented and are important hurdles to the widespread adoption of AI solutions in a corporate environment. In this post, however, we will address some of the concerns relating to copyright, trade secrets and the protection of personal data.

Copyright and the protection of trade secrets may complicate AI applications at two levels: the use of “input” data and the “output” of the AI solution.

The algorithms of the AI solution are “trained” using datasets that may contain content protected under copyright or related rights (such as performances, recordings, or databases). Similarly, such protected content may be present in the prompts that the user submits to the AI-powered solution. Keeping in mind the broad interpretation that the CJEU has given to the reproduction right, the copies made of these datasets may be seen as “reproductions” and consequently require the prior authorisation from the author and holders of related rights – unless the use is covered under one of the (harmonised) legal exceptions.

Under the Information Society Directive N° 2001/29, the exceptions for temporary acts of reproduction or the research exception may have exempted some uses, but these provisions were considered insufficient to create the legal certainty required to stimulate the development of innovative technologies, such as AI. With the Copyright in the Digital Single Market Directive N° 2019/790 (“DSM Dir”), two new exceptions were introduced for “text and data mining” (“TDM”), i.e. “any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations” (art. 2(2) DSM Dir). Text and data mining is permitted without the rightholders’ prior consent in two cases:

TDM for scientific research (art. 3 DSM Dir): a research organisation (art. 2(1) DSM Dir) or cultural heritage institution (art. 2(3) DSM Dir) may reproduce or extract the protected content in a TDM process, for the purpose of scientific research under this exception – provided that they have “lawful access”.
TDM for other purposes (art. 4 DSM Dir): other users may reproduce or extract lawfully accessible works (including software) and other protected content in a TDM process, for other purposes – provided that the rightholders have not “reserved” the use of the works or other subject matter “in an appropriate manner, such as machine-readable means in the case of content made publicly available online”.

These exceptions have been transposed in the Belgian Code of Economic Law (art. 190, 20° and 191/1, 7° CEL). Important challenges remain. In particular, the modalities of the “opt-out” from the general TDM exception, which the different rightholders may exercise, are not (yet) standardised and, as the TDM exceptions will be implemented in 27 member states, there may be national variations. In addition, authors and performers may enjoy moral rights in the member states, which are not harmonised under the DSM Directive.

In the meantime, technical responses to this web-wide crawling are being developed (such as Have I been trained?) to find out whether a particular file has been used. Some AI providers are proposing mechanisms to give authors some control over the use of their works (e.g. Stability AI) – but it is uncertain whether these suffice to comply with article 4 DSM Dir.

While the protection of trade secrets is arguably less of an issue when an AI solution is trained using publicly accessible datasets, this may be an issue where employees include their employer’s confidential information in the “prompts” they submit to an AI chatbot.

Under the Trade Secrets Directive N° 2016/943 (“TS Dir”), the acquisition, use or disclosure of a trade secret without the consent of the trade secret holder is unlawful (art. 4 TS Dir). Logically, the provider of the AI-powered chatbot is likely to put all responsibility for the prompts on the user’s side (e.g. OpenAI requests users not to submit any sensitive information and to permit the use of all input content to provide and maintain the service in its terms of use). It is then for the user to make sure that their prompts contain no trade secrets or confidential information of their employer or of third parties that their employer holds under a confidentiality agreement.

While the mere transfer of sensitive information to a service provider is unlikely to affect the secret nature of the “trade secret”, it may go against the confidentiality policy or violate the conditions of a confidentiality agreement with a supplier, a client or a partner, as a copy of the confidential information will be out of the trade secret holder’s control (i.e. stored on the servers of the AI provider).

As to the AI-generated output, it may infringe copyright if the original traits of a protected work can be recognised in the output. In most cases, however, the AI-creations imitate the style of a musician, a painter or a photographer. Elements of style are considered “ideas” and are consequently not protected under copyright. By contrast, where the AI-output imitates the voice or other features of singers or actors, the latter may rely upon their personality rights and their image rights to oppose the use of their appearance.

Lastly, the AI-generated output may itself be protected under various rights. While traditional copyright typically requires the creative input of a human author and will not be available for AI-productions without human intervention (regardless of questions of evidence), such requirement is absent under the related rights – in particular the protection of phonograms or first fixations of films. This means that no author can control the reuse of AI-generated output on the basis of copyright, but the producers of AI-produced audio- or audiovisual content may have the right to prohibit the reproduction or communication to the public or to conclude licences for their productions.

Another important legal concern is the protection of personal data. As organisations increasingly turn to AI-powered chatbots to enhance operations and customer experiences, data protection issues have come to the forefront. Notably, the Italian Data Protection Authority identified sufficient violations of the GDPR to temporarily ban the use of ChatGPT in Italy. After addressing the concerns raised by the Italian Authority, OpenAI restored access at the end of April 2023. In the same vein, the European Data Protection Board established a dedicated task force to address data protection concerns related to ChatGPT. These actions underscore the importance of considering data protection when deploying AI chatbots.

AI chatbots process personal data during the training of AI models and in the real-time interactions with users. One key concern is the need for organisations to establish a valid legal basis for processing personal data, which can include, for example, consent, legitimate interest, or contractual obligations. Another requirement is transparency of the data processing: organisations need to provide easily understandable information to data subjects, clearly explaining how their personal data is processed within AI-powered chatbot systems. In addition to these core concerns, other issues may arise relating to the need for age verification systems to mitigate risks associated with inappropriate interactions with minors, as well as the implementation of robust security measures to protect personal data from data breaches and unauthorised access.

The data protection analysis will depend on the precise technical features that AI-chatbot organisations will actually deploy. ChatGPT, for instance, offers various usage scenarios, including the use of the web version, API use by developers, and the recently introduced ChatGPT plugins. Each scenario has different implications for data protection and different roles and responsibilities of the involved actors.

The first scenario covers the regular use of the web version of ChatGPT. In this case, the chatbot is used in the way it was developed and as it is offered by OpenAI. For this web version, OpenAI and the users are the primary actors. OpenAI acts as a controller for both training the models and processing user requests. However, organisations using ChatGPT in their workflow need to be cautious about potentially processing personal data, all the more so as user input is retained by default for training purposes. Compliance with data protection regulations becomes crucial in this context.

The second scenario involves API users, i.e. developers. An API user obtains an OpenAI API key, which gives additional control over the AI model. API users can refine the ChatGPT models based on their own needs, and they can either run the refined model on a standalone basis or integrate ChatGPT in their own products. In this case, developers act as controllers for the processing of personal data. OpenAI provides a data processing addendum to API users, qualifying itself as a processor. However, this qualification may raise questions due to the control exerted by OpenAI.

The third scenario concerns ChatGPT plugins, which enable access to third-party knowledge sources and databases. The plugin functionality allows ChatGPT to consider third-party information from these sources in its generated responses. In this case, according to OpenAI, both the third-party creator of the plugin and OpenAI act as separate and independent controllers of personal data. Here too, this qualification may raise questions, and further examination by the task force set up by the European Data Protection Board is eagerly anticipated.

Some takeaways for organisations that care to assess some of the legal risks resulting from the use of AI-powered tools in a professional context:

It is important to raise awareness within AI-using organisations, i.e. among their company lawyers, employees, freelancers and other partners, and to assess whether a company policy would be useful. A non-representative poll during the IBJ webinar of 5 May 2023 indicated that AI-powered chatbots are already commonly being used in a professional context (50% of the respondents confirmed such use) and that a minority has a policy in place (3% stated that their organisation prohibits the use of AI-tools, 19% permits the use within certain limits, and 77% has no policy at all).

Where AI-using organisations establish a policy on the use of AI-tools for professional purposes, they may consider the following points:
Developers of AI-solutions may use all web-accessible content to train their algorithms. Organisations that do not wish their content to be used for these purposes may look into technical, organisational and contractual means of reserving their rights against TDM processes (e.g. by fine-tuning the robots.txt instructions or other metadata). Creators of high-value content in particular, such as broadcasters, music producers or newspaper publishers, may want to look into the appropriate expression of their opt-out under all available (copyright or related) rights.
Organisations ought to assess the risk that their employees, freelancers or other partners transmit copyright-protected content or confidential information (belonging to the organisation or to third parties) in a prompt to the AI-powered tool and, if useful, address such risks in clear guidelines.
Where important confidential information is at stake, it may be worth revising confidentiality clauses in contracts with third parties (partners, suppliers, customers) to whom such information is disclosed, to explicitly prohibit the use of third-party AI tools without explicit contractual guarantees.
Where the organisation intends to use the AI-generated output in any way that requires some sort of exclusivity, it ought to verify whether it can exercise any statutory exclusive rights (such as producers’ rights) and, where applicable, settle such rights with the AI-provider. Where no such statutory rights exist, it may want to organise contractual protection to control the use of the AI-generated output.

Also from a data protection perspective, AI-using organisations should ensure that they have the necessary contractual arrangements in place, for example a data processing agreement or another data protection agreement with the AI chatbot provider. This agreement should clearly outline the responsibilities of both parties and stipulate that the provider complies with all applicable data protection laws, including the GDPR. If there are any international data transfers, the organisation should make sure that the transfer relies on a valid transfer mechanism and that the necessary transfer protocols are in place. Before allowing the use of AI chatbots in their organisation, AI-using organisations are advised to conduct a data protection impact assessment and, if needed, a transfer impact assessment. It may be necessary to refine internal rules on the use of personal data in order to establish guidelines for the proper use of AI-powered chatbots by employees, including rules against sharing personal data, particularly sensitive and special categories of personal data, through the chatbots.

AI-developers, on the other hand, must be attentive to rightholders who express the wish to reserve their rights against TDM, and must proactively check whether any such instructions are given in code or elsewhere. In their terms and conditions, they should clearly indicate how rightholders’ and users’ content (in prompts or otherwise) will be used, so that they have sufficient authorisation to operate their AI-driven solutions. Ideally they also indicate more explicitly for which purposes the users’ input is used (“performing the service”, “improving the service”), how long the content will be stored and whether the user (or their organisation) can request the erasure of the content.
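
By way of illustration only, the proactive check for machine-readable instructions could start with the site's robots.txt file. The sketch below uses Python's standard urllib.robotparser module on a sample file. The crawler name "ExampleAIBot" and the sample file are hypothetical, and it remains an open question whether a robots.txt instruction alone is an "appropriate" reservation under article 4 DSM Dir, so terms of use and other metadata would still need to be reviewed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a publisher might serve to keep one crawler out.
# "ExampleAIBot" is an invented user-agent name used for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text does not disallow this URL for this crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"
    for agent in ("ExampleAIBot", "OtherBot"):
        verdict = "allowed" if may_fetch(SAMPLE_ROBOTS_TXT, agent, url) else "reserved"
        print(f"{agent}: {verdict}")
```

In practice a crawler would download the file from the target site before collecting any content, and would treat a negative answer as a signal to skip that site for training purposes.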

AI-developers also need to consider data protection and are encouraged to conduct a data protection impact assessment for the development and provision of AI-powered tools. Especially when training new models, whenever possible, AI-developers could use anonymisation techniques on data before feeding it into the chatbot for training purposes. In general, AI-developers could adhere to the principle of data minimisation, using only the necessary categories and amount of personal data for model refinement or development. Next to many other requirements, transparency is also crucial, and data subjects should be informed about the use of their personal data in data protection notices.
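
As a rough sketch of what such pre-processing could look like, the Python snippet below replaces e-mail addresses and phone-like numbers with placeholders before text is stored or used for training. The patterns and placeholder labels are assumptions chosen for illustration. Pattern-based redaction of this kind will miss names, addresses and other identifiers, and on its own it is unlikely to amount to anonymisation within the meaning of the GDPR.

```python
import re

# Illustrative patterns only: they catch common e-mail and phone formats,
# not every form of personal data.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    print(redact("Contact jane.doe@example.com or +32 2 555 01 23."))
```

The same filter could be applied to prompts before they leave the organisation, as one technical complement to internal guidelines against sharing personal data with chatbots.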

If you would like to learn more about the subject and to stay informed about recent legal developments, you are invited to the Crowell & Moring Legal Knowledge Library – Crowell Hub 💡. This free portal has been designed specifically to support in-house counsel.