Monday 10 July 2023

Sarah Silverman Files Lawsuit Against OpenAI and Meta, Accusing Them of Being Highly Skilled Plagiarists

Comedian and author Sarah Silverman.

On Friday, the Joseph Saveri Law Firm filed US federal class-action lawsuits on behalf of Sarah Silverman and other authors against OpenAI and Meta, accusing the companies of illegally using copyrighted material to train AI language models such as ChatGPT and LLaMA.

Other authors represented include Christopher Golden and Richard Kadrey, and an earlier class-action lawsuit filed by the same firm on June 28 included authors Paul Tremblay and Mona Awad. Each lawsuit alleges violations of the Digital Millennium Copyright Act, unfair competition laws, and negligence.

The Joseph Saveri Law Firm is no stranger to press-friendly legal action against generative AI. In November 2022, the same firm filed suit over GitHub Copilot for alleged copyright violations. In January 2023, the same legal group repeated that formula with a class-action lawsuit against Stability AI, Midjourney, and DeviantArt over AI image generators. The GitHub lawsuit was terminated in December 2022 after plaintiffs stopped responding, according to a court order. Procedural maneuvering in the Stable Diffusion lawsuit is still underway with no clear outcome yet.

In a press release last month, the law firm described ChatGPT and LLaMA as “industrial-strength plagiarists that violate the rights of book authors.” Authors and publishers have been reaching out to the law firm since March 2023, lawyers Joseph Saveri and Matthew Butterick wrote, because authors “are concerned” about these AI tools’ “uncanny ability to generate text similar to that found in copyrighted textual materials, including thousands of books.”

The most recent lawsuits from Silverman, Golden, and Kadrey were filed in a US district court in San Francisco. Authors have demanded jury trials in each case and are seeking permanent injunctive relief that could force Meta and OpenAI to make changes to their AI tools.

Meta declined Ars’ request to comment. OpenAI did not immediately respond to Ars’ request to comment.

A spokesperson for the Saveri Law Firm sent Ars a statement, saying, “If this alleged behavior is allowed to continue, these models will eventually replace the authors whose stolen works power these AI products with whom they are competing. This novel suit represents a larger fight for preserving ownership rights for all artists and other creators.”

Accused of using “flagrantly illegal” datasets

Neither Meta nor OpenAI has fully disclosed what’s in the datasets used to train LLaMA and ChatGPT. But lawyers for authors suing say they have deduced the likely data sources from clues in statements and papers released by the companies or related researchers. Authors have accused both OpenAI and Meta of using training datasets that contained copyrighted materials distributed without authors’ or publishers’ consent, including by downloading works from some of the largest e-book pirate sites.

In the OpenAI lawsuit, authors alleged that, based on OpenAI disclosures, ChatGPT appeared to have been trained on 294,000 books allegedly downloaded from “notorious ‘shadow library’ websites like Library Genesis (aka LibGen), Z-Library (aka Bok), Sci-Hub, and Bibliotik.” Meta has disclosed that LLaMA was trained on part of a dataset called The Pile, which the other lawsuit alleged includes “all of Bibliotik” and amounts to 196,640 books.

On top of allegedly accessing copyrighted works through shadow libraries, OpenAI is also accused of using a “controversial dataset” called BookCorpus.

BookCorpus, the OpenAI lawsuit said, “was assembled in 2015 by a team of AI researchers for the purpose of training language models.” This research team allegedly “copied the books from a website called Smashwords that hosts self-published novels, that are available to readers at no cost.” These novels, however, are still under copyright and allegedly “were copied into the BookCorpus dataset without consent, credit, or compensation to the authors.”

Ars could not immediately reach the BookCorpus researchers or Smashwords for comment.

Editor Notes

It’s interesting to see the ongoing legal battles between AI companies and authors over the use of copyrighted material. While AI language models offer incredible capabilities, it’s crucial to strike the right balance in respecting intellectual property rights. This case highlights the importance of preserving ownership rights for all artists and creators, and it will be fascinating to see how the lawsuits unfold. Stay tuned for more updates on AI-related legal disputes by checking out GPT News Room.

from GPT News Room https://ift.tt/RNE3xsG
