Authors Sue OpenAI Over Unauthorized Use of Their Novels to Train ChatGPT
OpenAI, the company behind the popular AI chatbot ChatGPT, has been hit with a lawsuit by bestselling novelists Mona Awad and Paul Tremblay. In their proposed class action, Awad and Tremblay allege that OpenAI used their copyrighted novels to train the AI chatbot without their consent.
ChatGPT is powered by large language models that are trained on vast amounts of text in order to generate natural, lifelike responses. When prompted, the chatbot produced detailed summaries of Tremblay’s “The Cabin at the End of the World” and Awad’s “Bunny” and “13 Ways of Looking at a Fat Girl.” The plaintiffs argue that these summaries are evidence that their novels were used to train ChatGPT.
The lawsuit claims that OpenAI used copyrighted materials, including books by Awad and Tremblay, without obtaining proper consent or providing credit and compensation. It highlights the significant role that books play in training language models due to their high-quality longform writing.
In 2018, OpenAI disclosed that it had trained its GPT-1 model on a dataset called BookCorpus, which consists of over 7,000 unpublished books. The dataset was drawn from a website called Smashwords.com, where the novels, though freely available to readers, remain under copyright.
Subsequent iterations of OpenAI’s language models, including GPT-3, used even larger quantities of copyrighted books. OpenAI’s paper on GPT-3 revealed that 15% of the training data came from two book corpora referred to as “Books1” and “Books2.” The suit estimates that Books1 contains approximately 63,000 titles and Books2 includes around 294,000 titles.
The lawsuit argues that OpenAI’s language models are infringing derivative works since they rely on the expressive information extracted from the plaintiffs’ novels, among others. According to the suit, this infringement violates the plaintiffs’ exclusive rights under the Copyright Act.
Broader Class-Action Suit Accuses OpenAI of Unauthorized Data Collection
In addition to Awad and Tremblay’s lawsuit, a class-action suit has been filed by Clarkson, a public-interest law firm, on behalf of twelve anonymous clients. This suit accuses OpenAI of collecting private and sometimes identifying information from internet users without their knowledge or informed consent.
Experts predict that more lawsuits of this nature will emerge as AI technology advances and continues to draw on information from the web to create new content.
Editor Notes
This lawsuit raises important questions about the ethical use of copyrighted material in artificial intelligence training. While AI has the potential to revolutionize numerous industries, it is crucial to ensure that proper consent, credit, and compensation are given to creators whose works are used to train AI models.
To stay updated on the latest developments in AI and its impact on various fields, visit GPT News Room.
from GPT News Room https://ift.tt/MBzVdoL