(CNN) — The New York Times has sued OpenAI and Microsoft for copyright infringement, alleging that the companies' AI technology illegally copied millions of the newspaper's articles to train ChatGPT and other services that provide information to people, technology that now competes with the Times.
The complaint is the latest in a series of lawsuits seeking to limit the alleged scraping of vast swaths of internet content, without compensation, to train so-called large language AI models. Actors, writers, journalists and other creative people who publish their work online fear that artificial intelligence will learn from their material and power competing chatbots and other sources of information without proper compensation.
But the New York Times is the first major news publisher to sue OpenAI and Microsoft, the two best-known AI brands.
In a complaint filed Wednesday, the newspaper said Microsoft and OpenAI's unlawful use of the Times' work to create competing artificial intelligence products threatens its ability to provide its journalism.
The complaint said that OpenAI and Microsoft drew on other sources in their large-scale copying but gave Times content particular emphasis, seeking to “capitalize on the Times’s massive investment in its journalism by using it to create substitute products without permission or payment.”
Microsoft and OpenAI did not immediately respond to a request for comment on the lawsuit.
The newspaper said in its complaint that it objected after discovering months ago that its work had been used to train the companies' large language models. Beginning in April, it said, it negotiated with OpenAI and Microsoft to obtain fair compensation and set the terms of an agreement.
But the Times says it has been unable to reach a resolution with the companies. Microsoft and OpenAI claim that their use of the Times' works qualifies as “fair use,” which allows copyrighted material to be used for a “transformative purpose,” the complaint states.
“There is nothing transformative about using Times content without compensation to create products that displace The Times and steal its audience,” the newspaper said in its complaint. “Because the outputs of Defendants’ GenAI models compete with and closely mimic the inputs used to train them, copying Times works for this purpose is not fair use.”
The newspaper is among several leading news organizations, including CNN, that earlier this year added code to their websites to block OpenAI's web crawler, GPTBot, from scanning their platforms for content.
The Times claims that because the companies' AI tools were trained on its content, they can generate output that reproduces Times content word for word, summarizes it in detail and mimics its expressive style, as dozens of examples in the complaint demonstrate. The tools have also wrongly attributed false information to the newspaper, the complaint states.
The news outlet also claims that Microsoft's Bing search engine, which was updated earlier this year with OpenAI technology, “transcribes and sorts” Times content to produce longer, more detailed answers than traditional search engines.
“By making Times content available without permission or authorization from The Times, Defendants’ tools undermine and harm The Times’ relationship with its readers and deprive The Times of subscriptions, licenses, advertising, and affiliate revenues,” the lawsuit said.
“We recognize the potential of … [generative AI] for the public and the press.
“But at the same time, we believe that the success of GenAI and the companies developing it should not come at the expense of news organizations,” according to a memo seen by CNN. “Using our work to create GenAI tools must have permission and an agreement that reflects the fair value of that work, as provided by law.”
In its lawsuit, the newspaper seeks unspecified financial compensation as well as a permanent injunction barring Microsoft and OpenAI from continuing the alleged infringement. The Times also seeks to “destroy” GPT and any other AI model or training set that includes its content.