We are talking about so-called ‘generative AI’, such as ChatGPT and Copilot. The data used in these AI tools is scraped from the internet on a large scale by automated bots. Using these AI tools can, of course, be very convenient: with DeepL, you can instantly translate your event text into Italian, in Midjourney you can create exactly the image you need for a new event, and in Copilot you can rewrite the tone of voice of your Instagram post.
For users of these AI tools, it is important that a lot of high-quality and diverse data is included. However, it is often unclear which data has been retrieved by the bots and whether copyright still applies to this data. This concerns the ‘input side’ of AI tools. Another dilemma is the question of who owns the copyright to AI-generated works. This is also referred to as the ‘output side’ of AI. There are various ways to protect your work or to be compensated for its use, such as opting out, making access impossible, data poisoning, or a compensation system. But legislation is lagging behind developments, and there are many gray areas and uncertainties. In this article, we answer the most important questions about AI and copyright.
Can tech companies use everything that is online just like that?
It is highly questionable whether tech companies are allowed to use works that are still under copyright as input for their AI systems or whether they are infringing on the copyright of creators. Under European legislation, data mining for scientific purposes is permitted. For commercial applications, this is also possible under stricter conditions, but copyright holders can indicate through an ‘opt-out’ that their work may not be used.
Outside the EU, where many of the leaders in AI are active, different rules often apply. For example, in the US, scraping data for AI datasets is much easier, and Japan places hardly any restrictions on commercial data mining. As a result, activities by tech companies that may infringe on copyright within the EU could be completely legal outside the EU.
How do I know if my data is included in AI datasets?
If you have published something online, whether it is an article, video, or artwork, you can use the tool haveibeentrained.com to check if your work is included in AI datasets. This is purely informative: it is not possible to remove work from an existing dataset.
What can you do yourself?
Opt-out: Protect your data from scraping
The Kunstenbond offers tips to protect your work from web scrapers, such as adding rules to your website's ‘robots.txt’ file (read here how) or placing this text on your website: “© Copyright reserved. No automated text and data mining is permitted on this website.”
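As an illustration of the robots.txt route, the sketch below generates a robots.txt file that asks a few AI crawlers to stay away and then checks the result with Python's standard `urllib.robotparser` module. The crawler names (`GPTBot`, `CCBot`, `Google-Extended`), the example URL, and the helper functions `build_robots_txt` and `is_allowed` are assumptions made for this example and do not come from the Kunstenbond's tips; check each AI company's own documentation for the names its crawlers currently use.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent names of AI crawlers; verify them before relying on this list.
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended"]


def build_robots_txt(blocked_agents):
    """Return robots.txt text that disallows the whole site for each listed agent."""
    lines = []
    for agent in blocked_agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # Every other crawler (for example a regular search engine) stays allowed.
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, agent, url):
    """Check whether the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    robots_txt = build_robots_txt(AI_CRAWLERS)
    print(robots_txt)

    page = "https://example.org/events/concert"  # hypothetical page on your site
    for agent in AI_CRAWLERS + ["Googlebot"]:
        status = "allowed" if is_allowed(robots_txt, agent, page) else "blocked"
        print(f"{agent}: {status}")
```

The generated text would be saved as `robots.txt` in the root of the website. Keep in mind that robots.txt is a request, not a technical barrier: it only helps against crawlers that choose to respect it.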
Large tech companies are gradually starting to offer options to manage your work. For example, OpenAI, the company behind ChatGPT, announced the development of a rights manager for creators.
Creators can join collective actions. The International Confederation of Music Publishers recently launched the online portal rightsandai, where rights holders can protect their work from web scraping. The Federation of Visual Rights recently launched ‘AI Opt Out Now’, a new standard for reserving rights for AI training by commercial parties.
Restrict access
The Royal Library has adjusted its terms of use and implemented technical measures to restrict access to KB collections for commercial AI.
Data poisoning
Nightshade and Glaze offer a creative solution to the copyright problem in generative AI. These tools make invisible adjustments to the pixels of an image. When these images are included in an AI dataset, the model trained on that dataset gets confused and can no longer generate logical images. This solution only works when many people apply the tools.
Compensation system
Another solution, which according to a study by Pictoright is preferred by many creators, is to pay compensation for the use of works in AI databases. This balances the interests of creators on the one hand and the importance of AI developments on the other. For instance, OpenAI pays for including news articles from two international news organizations, and the online platform Reddit sells its data to AI databases.
Who owns the copyright to AI-generated art?
The rapid developments in generative AI raise questions about the copyright of AI-generated works. Traditionally, copyright protection requires human creativity and choices. However, with AI-generated art, there is no human author, and the AI tool produces the work. Therefore, in principle, it is not a copyright-protected work. This is a gray area, as users may still obtain copyright for creative prompts or may edit the outcomes to acquire copyright on the final result.
In the US, the request for copyright protection for the AI-generated images in the graphic novel Zarya of the Dawn was rejected. The creator did receive copyright for the self-written texts and for the selection and arrangement of the work as a whole.
In this article, we answer frequently asked questions about AI and copyright. Because AI developments are moving faster than legislation, not every question has an answer yet, there are gray areas, and legal judgments are still changing. We will keep you updated on new developments!








