NYT vs Perplexity: The Copyright Infringement Battle Defining AI Search
- Ethan Carter

- Dec 7, 2025

The legal cease-fire between big tech and big media officially ended in December 2025. While previous lawsuits focused on how large language models (LLMs) were trained, the filing of NYT vs Perplexity marks a shift toward how AI delivers information. The New York Times is suing Perplexity for copyright infringement, alleging that the AI search engine isn't just learning from its journalism; it is actively repackaging and reselling that journalism to users who would otherwise pay for a subscription.
This isn't just about money. It is a fundamental clash over the architecture of the modern web. If the Times wins, the "answer engine" model could collapse. If Perplexity wins, the traditional concept of a media paywall might become obsolete.
The Core of the NYT vs Perplexity Copyright Infringement Case

At the heart of the complaint filed in the Southern District of New York is a specific accusation: copyright infringement via substitutive competition. The Times argues that Perplexity’s core product isn't a search engine that refers traffic, but a publishing tool that strips value.
When a user asks Perplexity about a breaking news event, the AI doesn't just provide a link. It reads the article—often one sitting behind a strict paywall—and generates a comprehensive summary. The Times alleges this summary is often so detailed that it serves as a perfect substitute for the original work.
Why Paywall Bypassing Is Central to the Claim
The lawsuit leans heavily on the mechanics of paywall bypassing. Most news organizations use a robots.txt file (the Robots Exclusion Protocol) or similar mechanisms to tell web crawlers which parts of their site are off-limits. Compliance is voluntary: robots.txt is a request to crawlers, not an access control. The Times claims Perplexity ignored these standard web protocols, using "stealth crawlers" to access content that wasn't meant to be free.
This distinction is critical. In earlier lawsuits against OpenAI or Microsoft, the defense often argued that scraping the open web for training data was "fair use" because the resulting model was transformative—it created something new. Here, the NYT vs Perplexity copyright infringement argument is sharper. The Times asserts that if an AI takes a locked article and regurgitates its key points to a user who didn't pay, that isn't transformation. It’s piracy disguised as innovation.
Perplexity has previously argued that it cites sources and provides traffic, but the Times’ legal filing suggests the click-through rate is negligible compared to the volume of content consumed directly on the AI platform.
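To make the robots.txt mechanism concrete, here is a minimal sketch of how a well-behaved crawler might check a site's rules before fetching a page. The robots.txt content, the user-agent names (ExampleAIBot, GenericBrowserBot), and the URLs are all hypothetical and are not taken from the Times' site or the complaint.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative news site (not a real publisher's file).
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /subscribers/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
article = "https://news.example.com/2025/12/07/story.html"
paywalled = "https://news.example.com/subscribers/story.html"

print(is_allowed(parser, "ExampleAIBot", article))        # declared AI crawler
print(is_allowed(parser, "GenericBrowserBot", article))   # any other agent
print(is_allowed(parser, "GenericBrowserBot", paywalled)) # any other agent, gated path
```

In this sketch the same article URL is blocked for the declared AI crawler but allowed for a differently named agent, because the rules are keyed on the user-agent string the crawler chooses to send. That is why the question of whether a crawler identifies itself honestly matters to the dispute.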
The Technical Battleground: Retrieval-Augmented Generation

To understand why this lawsuit is different from previous IP battles, you have to look at the technology. Perplexity relies heavily on Retrieval-Augmented Generation (RAG).
Unlike a standard chatbot that relies on a frozen set of training data, RAG allows the AI to go out to the live internet, fetch current data, and generate an answer based on what it finds right now. This is Perplexity's superpower, but in the context of copyright infringement, it might be its kryptonite.
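The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy illustration, not Perplexity's implementation: the corpus, the URLs, the keyword-overlap scoring, and the function names are all assumptions, and the final call to a language model is left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    url: str
    text: str


# Toy corpus standing in for pages fetched from the live web (hypothetical content).
CORPUS = [
    Document("https://example.com/a", "The city council approved the new transit budget on Monday."),
    Document("https://example.com/b", "Local bakery wins regional award for sourdough bread."),
    Document("https://example.com/c", "Transit ridership rose after the council expanded bus routes."),
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Rank documents by how many query words they share; keep the top k with any overlap."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc.text)), doc) for doc in corpus]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(query: str, docs: list[Document]) -> str:
    """Place the retrieved source text, unchanged, in front of the question."""
    context = "\n".join(
        f"[{i}] {doc.text} (source: {doc.url})" for i, doc in enumerate(docs, start=1)
    )
    return f"Answer using only the sources below and cite them.\n\n{context}\n\nQuestion: {query}"


query = "What did the council decide about transit?"
docs = retrieve(query, CORPUS)
prompt = build_prompt(query, docs)
print(prompt)  # this string would be sent to a language model in a real system
```

Note that the retrieved text goes into the prompt word for word. The model then writes its answer with the source passages directly in front of it, which is the step that distinguishes RAG from answering out of training data alone.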
How RAG Complicates the Fair Use Defense
In a standard copyright infringement case involving AI, the defense usually claims the model "learned" from the data. RAG changes the metaphor. It’s not just learning; it’s looking up and copying.
When Perplexity answers a query using RAG, it retrieves specific sentences or paragraphs from the source text to construct its answer. The Times has provided exhibits showing the AI reproducing large chunks of text verbatim or near-verbatim. Copying of that kind weakens the fair use defense, because the amount of the original work used is one of the factors courts weigh.
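One rough way to quantify "verbatim or near-verbatim" reproduction is to look for long runs of words shared between a source passage and a generated answer. The sketch below uses Python's difflib for this. It is an illustrative measure only, not the method used in the Times' exhibits; the sample sentences and the five-word threshold are assumptions.

```python
from difflib import SequenceMatcher


def words(text: str) -> list[str]:
    """Split text into lowercase, whitespace-separated words."""
    return text.lower().split()


def longest_shared_run(source: str, output: str) -> str:
    """Return the longest contiguous run of words that appears in both texts."""
    s, o = words(source), words(output)
    matcher = SequenceMatcher(None, s, o, autojunk=False)
    match = matcher.find_longest_match(0, len(s), 0, len(o))
    return " ".join(s[match.a : match.a + match.size])


def verbatim_fraction(source: str, output: str, min_run: int = 5) -> float:
    """Fraction of output words inside aligned runs of at least min_run copied words.

    Uses difflib's in-order alignment, so it is a rough estimate, not an exhaustive search.
    """
    s, o = words(source), words(output)
    if not o:
        return 0.0
    matcher = SequenceMatcher(None, s, o, autojunk=False)
    copied = sum(block.size for block in matcher.get_matching_blocks() if block.size >= min_run)
    return copied / len(o)


# Hypothetical source sentence and AI answer, for illustration only.
source = "The council approved the transit budget on Monday after a long debate over fares."
answer = ("According to the report, the council approved the transit budget "
          "on Monday after hours of discussion.")

print(longest_shared_run(source, answer))
print(round(verbatim_fraction(source, answer), 2))
```

A high fraction does not settle any legal question by itself, but a measure like this shows how "large chunks reproduced verbatim" can be turned into something checkable rather than left as an impression.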
The courts look at four factors for fair use, and the "effect on the potential market" is usually the most important. If Perplexity uses RAG to deliver the Times' reporting for free, directly impacting subscription revenue, the fair use defense becomes much harder to justify. The technology that makes Perplexity useful—its ability to be factual and current—is exactly what exposes it to this liability.
Public Sentiment: Legacy Media Obsolescence vs. Theft

The reaction to the NYT vs Perplexity lawsuit has been polarized, particularly in tech-focused communities like Reddit. The discourse reveals a deep divide between those who view information as a public utility and those who view journalism as a product that requires funding.
Many users see the lawsuit as a symptom of legacy media obsolescence. The argument is that the Times is fighting the inevitable evolution of information consumption. Just as the telegraph disrupted the Pony Express, AI answer engines are disrupting the ad-supported or subscription-based article format. Users express frustration with the current state of the web (cluttered with ads, pop-ups, and SEO-bait) and view Perplexity as a superior user experience that the Times refuses to adapt to.
The Problem of Substitutive Competition
On the other side of the debate, even tech-savvy users admit there is a real issue with substitutive competition. The concern is ecological: if the AI eats the revenue of the content creator, the content creator dies. If the Times goes bankrupt, Perplexity loses the source of its high-quality data.
Commenters have pointed out the "Ouroboros" problem. If substitutive competition is allowed to run unchecked, AI will eventually just be scraping other AI-generated content, leading to model collapse. Supporters of the lawsuit argue that content licensing is the only sustainable path forward. Perplexity has launched a revenue-sharing program for publishers, but the Times evidently decided the terms were insufficient or the existential threat was too great to settle for a licensing fee.
Hallucinations and Brand Damage
A secondary but vital part of the NYT vs Perplexity copyright infringement suit involves trademark dilution and reputational harm. The Times cites instances where Perplexity hallucinated, meaning it fabricated information, and attributed the result to the New York Times.
This moves the complaint beyond copyright. If a user asks, "What did the NYT say about X?" and Perplexity provides a fabricated answer citing the Times, it damages the newspaper's credibility. In the era of disinformation, the accuracy of the brand is the product.
This creates a paradox for Perplexity. To reduce its copyright exposure, it needs to stay further from the source text, quoting less and paraphrasing more. But to avoid hallucinations and maintain accuracy, it needs to stick closer to the source text. It is caught between legal liability and product utility.
The Future of Content Licensing
Regardless of the verdict, the NYT vs Perplexity case will likely force a standardization of content licensing. We are moving toward a bifurcated web. One web will be open, filled with marketing content and personal blogs that want to be scraped for exposure. The other web, containing high-value investigative journalism and specialized knowledge, will be locked behind increasingly hard barriers that AI crawlers cannot penetrate without a cryptographic key provided via a licensing deal.
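What a "cryptographic key provided via a licensing deal" could look like in practice is an open question. One possibility, sketched below, is that a licensed crawler signs each request with a shared secret and the publisher verifies the signature before serving gated content. The scheme, the function names, the key, and the crawler ID are all hypothetical; no such standard is described in the case.

```python
import hashlib
import hmac


def sign_request(license_key: bytes, crawler_id: str, path: str) -> str:
    """Crawler side: sign the crawler ID and requested path with the shared license key."""
    message = f"{crawler_id}\n{path}".encode("utf-8")
    return hmac.new(license_key, message, hashlib.sha256).hexdigest()


def is_licensed(license_key: bytes, crawler_id: str, path: str, signature: str) -> bool:
    """Publisher side: recompute the signature and compare in constant time."""
    expected = sign_request(license_key, crawler_id, path)
    return hmac.compare_digest(expected, signature)


# Hypothetical values agreed as part of a licensing deal.
LICENSE_KEY = b"example-shared-secret"
CRAWLER_ID = "licensed-answer-engine"
PATH = "/subscribers/investigation.html"

signature = sign_request(LICENSE_KEY, CRAWLER_ID, PATH)

print(is_licensed(LICENSE_KEY, CRAWLER_ID, PATH, signature))             # valid request
print(is_licensed(LICENSE_KEY, CRAWLER_ID, "/subscribers/other.html", signature))  # signature reused on another path
print(is_licensed(b"wrong-key", CRAWLER_ID, PATH, signature))            # crawler without the licensed key
```

Unlike robots.txt, a check of this kind is enforced by the publisher's server rather than left to the crawler's good behavior, which is the shift toward "hard barriers" the paragraph above anticipates.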
If the court rules that Retrieval-Augmented Generation constitutes infringement, Perplexity and similar engines will have to pivot. They may become premium aggregators, where your Perplexity subscription essentially bundles subscriptions to the Times, the Wall Street Journal, and other partners.
If the court rules in favor of Perplexity, validating the fair use defense for RAG, the decline of legacy media accelerates. Publishers will have to invent entirely new business models that don't rely on text-based distribution, perhaps shifting toward video, live events, or highly gated communities that AI cannot access.
The era of the "open web" where search engines trade traffic for content is effectively over. The new exchange is data for dollars, and the courts are about to set the exchange rate.
FAQ: Understanding the NYT vs Perplexity Copyright Lawsuit
1. What is the main accusation in the NYT vs Perplexity copyright infringement lawsuit?
The New York Times alleges that Perplexity AI illegally scrapes its copyrighted content, including paywalled articles, to generate summaries that serve as a substitute for the original journalism. The Times claims this copyright infringement siphons off potential subscribers and revenue.
2. How is this lawsuit different from the one against OpenAI?
The OpenAI lawsuit focused largely on the use of data for training models over time. The NYT vs Perplexity case focuses on Retrieval-Augmented Generation (RAG), where the AI accesses and displays live content in real time to answer user queries, which the Times argues is a more direct form of competition.
3. What is the "Fair Use Defense" in this context?
Perplexity will likely argue that its use of the articles is "transformative" because it summarizes and organizes information for the user, falling under fair use. However, the Times argues that because the output serves the same purpose as the original article (delivering the news), it is substitutive competition rather than transformation.
4. Did Perplexity try to pay the New York Times?
Perplexity has a "Publishers’ Program" that offers revenue sharing for content licensing, and they have signed deals with other outlets like TIME and Fortune. However, the New York Times evidently found the proposed terms or the model itself unacceptable, leading to the lawsuit.
5. What happens if the New York Times wins?
If the court rules that RAG-based summaries constitute copyright infringement, AI search engines may be forced to stop summarizing copyrighted news or pay significant licensing fees. This could lead to a subscription-heavy model for AI search tools where access to premium news sources costs extra.
6. Why are users discussing "legacy media obsolescence"?
Many users on platforms like Reddit believe the lawsuit is a defensive move by a dying industry. They argue that legacy media obsolescence is inevitable due to changing user habits, and that suing technology companies is a temporary fix for a broken business model that relies on restricting information.


