The rapid development of artificial intelligence (AI), especially of generative AI systems such as ChatGPT, raises many questions about transparency, fairness, data accuracy and data subject rights. These questions primarily concern the providers of AI systems, but also the companies that use them.
LLMs and the GDPR
In response to the rapid proliferation of AI applications, the ChatGPT task force of the European Data Protection Board (EDPB) published a preliminary report on compliance with EU data protection rules for large language models (LLMs), such as those underlying ChatGPT (GPT-3, GPT-4, etc.). The report is intended to help ensure that the General Data Protection Regulation (GDPR) is applied consistently to such language models.
We provide a brief overview of the ChatGPT task force report and summarise the key points of the opinion.
Why is the EDPB also concerned with AI?
The EDPB is an independent body that coordinates national data protection authorities so that data protection rules are applied consistently within the EU. It also publishes guidelines and opinions on a range of data protection issues to serve as recommendations for national authorities and companies.
To verify GDPR compliance of language models such as the ones behind ChatGPT, the EDPB formed a ChatGPT task force. The task force drafted a questionnaire for the exchange between the supervisory authorities and OpenAI OpCo LLC, the provider of ChatGPT, in order to achieve a coordinated approach to the investigations.
In fact, some national data protection authorities have already commented on the matter. For example, the German BfDI has made suggestions for regulating generative AI, and the French CNIL has published guidelines for using AI in a way that complies with data protection regulations.
Preliminary statement of the ChatGPT task force
The considerations presented below do not represent a conclusive analysis by the ChatGPT task force; they therefore offer only a preliminary view of the topics mentioned above. It should also be noted that the circumstances of the investigations may change over time.
Lawfulness of the processing
ChatGPT is trained on large amounts of personal data drawn from various publicly available sources on the internet. This collection process is known as web scraping and covers several methods of automatically extracting data from websites. The data collected in this way is used to train ChatGPT and to continuously improve and further develop the model.
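To illustrate what such data collection can look like in practice, the following is a minimal, purely illustrative web-scraping sketch in Python. The URL, the use of the requests and BeautifulSoup libraries and the extraction logic are assumptions made for this example only; they are not taken from the task force report and say nothing about OpenAI's actual data pipeline.

```python
# Minimal illustrative sketch of web scraping as described above.
# The target URL and the extraction logic are hypothetical examples.
import requests
from bs4 import BeautifulSoup

def scrape_page(url: str) -> list[str]:
    """Fetch a publicly accessible web page and return the text of its paragraphs."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Text collected this way may contain personal data (names, contact details),
    # which is what raises the GDPR questions discussed in this article.
    return [p.get_text(strip=True) for p in soup.find_all("p")]

if __name__ == "__main__":
    for paragraph in scrape_page("https://example.com"):
        print(paragraph)
```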
In this regard, OpenAI relies on its legitimate interest pursuant to Art. 6 (1) lit. f) GDPR. This legal basis has not yet been conclusively examined by the task force. It is noteworthy, though, that the Dutch supervisory authority has previously indicated that the practice of web scraping is hard to reconcile with data protection requirements and that reliance on the legal basis of Art. 6 (1) lit. f) GDPR may be insufficient.
OpenAI also relies on its legitimate interest in processing with regard to the use of ChatGPT itself (inputs, outputs, user feedback). The ChatGPT task force emphasises that the question of whether users are sufficiently informed about the use of such data for further developing the model will play a significant role in its – still pending – audit of the legal basis.
Fairness
Personal data must be processed lawfully, fairly and transparently in accordance with Art. 5 (1) lit. a) GDPR. This means that processing must not be unjustifiably disadvantageous, discriminatory, unexpected or misleading. An important aspect of this principle is that the risks of data processing must not be shifted onto the data subjects (i.e. the users). OpenAI remains responsible for compliance with the GDPR and, in the case of ChatGPT, must not hold users responsible for their chat inputs, for example through clauses in the general terms and conditions.
Since ChatGPT is publicly available, it must be assumed that users will enter personal data. If these entries (prompts) become part of the model through their use for continuous training and are, for example, reproduced in answers to other users who ask a specific question, OpenAI is responsible for compliance with the GDPR in this context. OpenAI cannot avoid this responsibility by arguing that entering personal data was prohibited in the first place.
Transparency and information requirements
In addition, the requirements of Art. 14 GDPR must be observed when data is collected via web scraping, i.e. when personal data is extracted from publicly accessible sources. Since web scraping involves collecting large amounts of data, it would be difficult to inform each data subject individually. In this regard, provided that all requirements are met, the exception under Art. 14 (5) lit. b) GDPR could apply, according to which the obligation to provide information is limited if providing the information proves impossible or would require a disproportionate effort.
However, if personal data is collected through direct interaction with ChatGPT, the requirements of Art. 13 GDPR apply. In this case, it is important to also inform users that their input may be used to train the model.
Accuracy of the data
According to Art. 5 (1) lit. d) GDPR, all personal data collected must be accurate and kept up to date. The difficulty here is that ChatGPT, owing to its probabilistic nature, does not necessarily generate factually correct information. The information it generates may also be biased or invented. Nevertheless, end users often treat ChatGPT's outputs as factually correct, regardless of their actual accuracy.
As the data controller, OpenAI has a duty to provide information about how ChatGPT works and about the limited reliability of its outputs. However, this duty to inform does not remove the need for further measures to comply with the principle of data accuracy.
Rights of data subjects
Given the complex processing situation, it is important for ChatGPT that data subjects can exercise their rights in a simple manner. OpenAI allows data subjects to exercise some rights via email, while others can be exercised via the account settings.
OpenAI is encouraged to continue taking appropriate measures to effectively implement the data protection principles so that the requirements of the GDPR are met, thus protecting the rights of data subjects.
Conclusion
While AI-based tools such as ChatGPT offer many benefits to users, they must still comply with the provisions of the GDPR. The preliminary statement of the ChatGPT task force highlights the scope of the responsibilities and obligations that come with operating a large language model. In particular, the areas of transparency, fairness, data accuracy and data subject rights pose significant challenges. Adherence to and promotion of these principles are crucial to ensure privacy protection and compliance with data protection requirements.
It will be interesting to see how the EDPB task force continues to address ChatGPT and whether the exchange between the authorities and OpenAI brings about further developments.
Companies using ChatGPT would be well advised to take note of the statement and to address the associated risks. Although the audit focuses on OpenAI's obligations, many of the points raised are also at least indirectly relevant for companies using ChatGPT or other AI-based tools.
We therefore recommend that you read our guide to GDPR compliance for personal data processing involving AI.