Wall Street Journal & NY Post Sue Bezos-Backed AI Startup Perplexity
Dow Jones & Co., parent of the Wall Street Journal, and the New York Post, both owned by News Corp., have sued the Jeff Bezos-backed generative AI startup Perplexity for copyright infringement, the latest in a stream of lawsuits around artificial intelligence that have met with varying degrees of success.
Perplexity “claims to provide its users accurate and up-to-date news and information in a platform that, in Perplexity’s own words, allows users to ‘Skip the Links’ to original publishers’ websites,” says the lawsuit, filed Monday in federal court in the Southern District of New York. “Perplexity attempts to accomplish this by engaging in a massive amount of illegal copying of publishers’ copyrighted works and diverting customers and critical revenues away from those copyright holders. This suit is brought by news publishers who seek redress for Perplexity’s brazen scheme to compete for readers while simultaneously freeriding on the valuable content the publishers produce.”
The New York Times recently sent Perplexity a “cease and desist notice” to stop accessing the publication’s content, according to reports. The Times previously sued OpenAI. Separately, Sarah Silverman and a group of high-profile authors sued OpenAI and Meta in 2023 for copyright infringement, alleging that their books had been illegally downloaded and used to train the companies’ large language models. A judge dismissed part of the OpenAI case that alleged unfair business practices. The suit against Meta is proceeding. Other publishers have sued, as have visual artists, even as AI companies are now among the most valuable in the world.
Generative AI systems generate content that mimics natural language in response to a prompt using large language models. LLMs are “trained” on large amounts of content that enable them successfully to assemble sentences and paragraphs for a reader to comprehend.
“News articles, analysis, and editorials serve as very useful content in this training due to, among other things, their clarity and structure, the editing and quality control they receive, their wide range of topics, their range of perspectives, the recency of their information, their writing style, and their tone,” explained the suit.
The “outputs” of these systems are machine-generated reproductions of human-created content arranged by LLMs and other tools that summarize and paraphrase original, human-generated content, even at times reproducing that content verbatim, which, the suit says, is substitution, not fair use. Fair use is a legal doctrine that allows for the use of copyrighted material without the owner’s permission under certain conditions, for instance if the work is “transformed” in the process and not substantially similar to the original.
The suit says, “the illegality of this massive copyright violation at the input stage does not depend on whether the particular outputs of Perplexity’s so-called ‘answer engine’ are sufficiently similar in each instance to the copyrighted works of Plaintiffs as to constitute identical reproductions of those works. It is sufficient that Perplexity makes copies of Plaintiffs’ works on a grand scale to create reproductions and/or derivative content that is designed as a substitute for Plaintiffs’ works.”
Licensing deals are an alternative to lawsuits, although publishers have approached them differently. News Corp., the parent of both publications, recently partnered with ChatGPT creator OpenAI to license its content for certain uses in OpenAI’s applications.
The News Corp. companies point out the obvious: their articles rely on the effort, talent, skills, and experience of accomplished journalists, editors, and other professional staff. “Undermining the financial incentives to create original content will result in less content being generated and/or less quality content being generated, which will also reduce the amount of content available to power AI.”
“Perplexity’s business is fundamentally distinct from that of traditional search engines that also copy a vast amount of content into their indices but do so merely to provide links to the originating sites. In its traditional form, a search engine is a tool for discovery, pointing searchers to websites such as the pages of The Wall Street Journal or the New York Post, where the users can click to find the information and answers they seek. Those clicks in turn provide revenue for content producers.”
The suit says Perplexity also harms plaintiffs’ brands “by falsely attributing to Plaintiffs certain content that Plaintiffs never wrote or published.” Those outputs are apparently called “hallucinations” in AI circles.
“Perplexity’s hallucinations can falsely attribute facts and analysis to content producers like Plaintiffs, sometimes citing an incorrect source, and other times simply inventing and attributing to Plaintiffs fabricated news stories.”
The suit alleges trademark infringement as well.
The News Corp. publishers said they sent a letter to Perplexity in July putting it on notice of the legal issues raised by unauthorized use of copyrighted works and offering to discuss a potential licensing deal, but “Perplexity did not bother to respond.”
The suit is seeking a jury trial. It asks the court to enjoin any further use of plaintiffs’ content without authorization and wants that content removed from Perplexity’s search results, databases and archives. It asks in part for damages of up to $150,000 for each copyright infringement, and for statutory damages up to and including three times actual damages, actual damages, and Perplexity’s profits for each violation.