Just how much of your data is in danger on ChatGPT Playground?
Are we on the brink of a landmark decision in the world of artificial intelligence? That’s the question everyone’s asking as comedian and author Sarah Silverman and two fellow authors, Christopher Golden and Richard Kadrey, square up against tech giants OpenAI and Mark Zuckerberg’s Meta. The accusation? Allegedly training their AI models on the authors’ work without proper authorization.
Just how much of a playground has OpenAI created with your data? Is it a playground that you can reclaim? Let’s see what copyright law has to say.
At the heart of the issue are tools like ChatGPT, OpenAI’s super smart chatbot. To train this and similar AI models, developers feed them a smorgasbord of data gleaned from the internet. This information serves as a foundation for the models to respond convincingly to text prompts from users.
However, Silverman, Golden, and Kadrey argue that their copyrighted books were used as training material for ChatGPT without their consent. A similar claim is made against Meta, which is accused of using the authors’ works to train its own AI models, known as LLaMA.
According to the lawsuits, the accused tech companies acquired the authors’ work from “shadow library” sites, often frequented by the AI-training community. The claims include exhibits suggesting that OpenAI’s tool was able to summarize three books: Silverman’s ‘The Bedwetter’, Golden’s ‘Ararat’, and Kadrey’s ‘Sandman Slim’.
The case against Meta cites several works by the same authors, including ‘The Bedwetter’, and highlights a Meta document indicating that LLaMA’s training datasets contain material from these contentious shadow libraries.
Lawyers Joseph Saveri and Matthew Butterick, representing the trio, have voiced the rising concern among writers, authors, and publishers regarding ChatGPT’s ability to generate text eerily similar to copyrighted material.
But Silverman, Golden, and Kadrey aren’t the only ones crying foul. Saveri and Butterick are also representing authors Mona Awad and Paul Tremblay in a separate class action lawsuit against OpenAI. They claim their work was similarly used without permission for training ChatGPT.
Stability AI, the company behind the image generator Stable Diffusion, is also facing legal action from Getty Images for an alleged breach of copyright. The same lawyers are standing up for artists Sarah Andersen, Kelly McKernan, and Karla Ortiz in their case against Stability AI, Midjourney, and DeviantArt over those companies’ image generators.
Trust or hallucination
Lawsuits are even targeting AI’s potential for falsehoods, or as insiders call them, “hallucinations”. OpenAI is in the crosshairs again as a Georgia-based radio host sues for defamation following a false fraud accusation issued by the AI model. The legal skirmishes underline a critical question: how can we balance the development and utility of AI with respect for copyright laws?
To win their case, these authors will need to demonstrate likely economic loss. After all, copyright protection doesn’t extend to ideas, only to written expression. Yet, they contend that their novels were used without permission to train OpenAI’s models, which allegedly produced accurate summaries of their works when prompted.
The authors are particularly concerned about “shadow libraries”, which they claim illegally distribute thousands of copyrighted works. They reference a 2020 OpenAI paper revealing that 15% of the company’s training dataset comes from two internet-based book corpora.
True litmus test
Their battle, however, isn’t without hurdles. They’ll need to prove that OpenAI most likely copied their works. The trickiest part? Demonstrating the probability of economic loss.
In this pioneering clash between copyright and AI training, the real threat, according to the authors, is that OpenAI might replicate some of the things human authors can do. While the act of copying itself may not inflict significant harm, the potential for AI to mimic human authors raises alarms. As we eagerly await the outcomes of these lawsuits, we’re left pondering: what will the future hold for copyright protection in the age of AI?