Editor’s Note: In a move that could reshape the boundaries of AI‑powered search and content reuse, the “traditional media vs. AI startups” battle has entered the courtroom in force. The New York Times (NYT) and Chicago Tribune have filed parallel copyright and trademark lawsuits against Perplexity AI, accusing it of unlawful scraping and repurposing of their journalism. The lawsuits claim Perplexity’s AI systems have generated outputs “verbatim or substantially similar” to protected content — even when behind paywalls — and, at times, falsely attributed AI‑generated content to publishers, undermining journalistic integrity and revenue models.
For cybersecurity, information-governance, and eDiscovery professionals, this case underscores a broader reckoning: as generative AI tools proliferate, the legal and compliance frameworks governing data ingestion, content citation, and intellectual property must evolve rapidly. The court’s decision could set crucial precedents — either validating aggressive data‑scraping models or compelling AI firms to obtain explicit permissions and licensing before incorporating proprietary content. The stakes for media companies, AI developers, and anyone dealing with digital content rights have never been higher.
Industry – Artificial Intelligence Beat
New York Courts Become Pressure Chamber for AI as NYT and Tribune Sue Perplexity
ComplexDiscovery Staff
Courts in New York have suddenly become a pressure chamber for the future relationship between artificial intelligence and the news business. The latest spark is a pair of lawsuits that put Perplexity AI in the crosshairs of two of America’s most influential newspapers, with direct consequences for how organizations govern data, manage legal risk, and trust AI-driven answers online.
The New York Times and the Chicago Tribune each filed separate complaints in federal court in early December, accusing Perplexity of building its “answer engine” on unlicensed copies of their journalism and then using those outputs in ways that can substitute for the publishers’ own websites. The Times alleges that Perplexity engaged in large‑scale copying and display of millions of its articles, including paywalled stories, to power commercial products, after the newspaper spent roughly 18 months raising concerns and requesting that Perplexity stop using its content without an agreement. The Tribune, for its part, claims that Perplexity reproduces extensive portions of its reporting—sometimes nearly verbatim—inside chat-style answers and a newer search interface, undercutting subscription and advertising models that depend on direct traffic to the paper’s properties. For cybersecurity, information governance, and eDiscovery teams, these allegations go beyond media drama; they map directly onto core questions about lawful data use, provenance, and the evidentiary reliability of AI-generated material.
Two cases, one common theme
In the Times’ complaint, the publisher asserts that Perplexity scraped both paywalled and free articles, then used them as inputs to train and operate services that compete with the newspaper’s own offerings. The filing stresses not only unauthorized copying but also output behavior, arguing that Perplexity can reproduce “identical or substantially similar” text from Times stories and sometimes attribute fabricated information to the Times while displaying its trademarks and branding. That combination of alleged copyright infringement and false attribution raises dual exposure under copyright law and the Lanham Act, giving the case added weight for any organization that allows users to rely on AI-generated answers branded as if they came from trusted sources.
The Chicago Tribune lawsuit mirrors many of those themes while focusing on the way Perplexity’s retrieval‑augmented generation architecture is said to ingest and republish Tribune content. Tribune Publishing alleges that Perplexity’s tools display long passages of its reporting in answers, thereby disincentivizing clicks to the original site and siphoning off subscription, licensing, and affiliate revenue. The complaint also flags reputational harm, arguing that Perplexity’s system, like other large language models, can hallucinate and then misattribute incorrect statements to the Tribune, potentially damaging its standing as a reliable news source. For security and governance professionals, this is a practical reminder to test how internal and external AI tools handle both paywalled data and attribution before rolling them into client‑facing workflows; a short pilot with real content and close output review can surface issues before they become public problems.
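One way to run such a pilot is to compare each AI answer against the source passages used in the test and flag long verbatim runs. The Python sketch below is a minimal illustration, not a legal test: the helper names, sample text, and threshold are assumptions, and it uses the standard library's difflib as a rough proxy for the closer similarity review counsel would perform.

```python
import difflib

def longest_shared_run(ai_answer: str, source_text: str) -> str:
    """Return the longest character run the answer shares verbatim with the source."""
    matcher = difflib.SequenceMatcher(None, ai_answer, source_text, autojunk=False)
    match = matcher.find_longest_match(0, len(ai_answer), 0, len(source_text))
    return ai_answer[match.a : match.a + match.size]

def review_pilot_output(ai_answer: str, source_text: str, run_threshold: int = 120) -> None:
    """Print a review flag when the shared run exceeds the (assumed) threshold.

    The character cutoff is an illustrative assumption, not a legal standard;
    counsel should calibrate what counts as 'substantially similar' reuse.
    """
    run = longest_shared_run(ai_answer, source_text)
    if len(run) >= run_threshold:
        print(f"FLAG: {len(run)}-character verbatim overlap: {run[:80]}...")
    else:
        print(f"OK: longest shared run is {len(run)} characters")

if __name__ == "__main__":
    # Hypothetical inputs: 'answer' from the AI tool under review, 'article'
    # from a licensed copy of the publisher content used in the pilot.
    answer = "Officials said the program would expand to twelve additional cities by next spring."
    article = "In a briefing, officials said the program would expand to twelve additional cities by next spring, pending funding."
    review_pilot_output(answer, article, run_threshold=40)
```

Flagged outputs can then be routed to human reviewers, which keeps the tooling in its proper role: surfacing candidates, not making the legal call.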
Part of a broader legal wave
Perplexity’s clash with the Times and Tribune does not occur in isolation. Dow Jones and NYP Holdings, the News Corp subsidiaries behind The Wall Street Journal and the New York Post, filed their own complaint in October 2024, accusing Perplexity of a “massive illegal copying” scheme that allegedly diverts critical revenue by offering AI‑generated answers instead of pushing users to publisher sites. That case framed Perplexity’s business model as relying on large‑scale scraping and reuse of copyrighted journalism, echoing rhetoric now surfacing in the new lawsuits. These matters sit alongside a growing cluster of U.S. cases against generative AI developers, including the Times’ earlier action against OpenAI and Microsoft over the use of millions of Times articles to train foundation models without permission.
Industry tallies now place the number of generative AI training‑data lawsuits in the United States at well over 40, spanning claims by book authors, visual artists, music labels, and multiple news organizations. For eDiscovery practitioners, that emerging docket is already reshaping how litigation holds, document collections, and privilege reviews account for AI training corpora and model outputs. For many eDiscovery teams, a practical takeaway is that matter‑level scoping exercises should explicitly ask whether internal or vendor models have ingested third‑party content under disputed terms, and if so, document the basis for reliance—or non‑reliance—on those tools in contentious use cases. Even a simple internal memo explaining why a particular AI system will not be used on a given dispute can reduce confusion later when discovery battles begin.
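As one illustration of what that matter-level documentation could look like in structured form, the sketch below records a scoping decision about a hypothetical vendor tool. The field names and example values are assumptions for a checklist, not terminology drawn from any rule, complaint, or standard.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class AIToolScopingRecord:
    """Matter-level record of whether, and why, an AI tool may be relied on."""
    matter_id: str
    tool_name: str
    ingested_disputed_content: str      # "yes", "no", or "unknown"
    basis_for_decision: str             # the memo-style rationale
    approved_for_use: bool
    reviewed_by: str
    review_date: date = field(default_factory=date.today)

# Hypothetical entry documenting non-reliance on a vendor model:
record = AIToolScopingRecord(
    matter_id="2025-CV-0142",                # placeholder matter number
    tool_name="VendorSummarizer",            # hypothetical product name
    ingested_disputed_content="unknown",
    basis_for_decision="Vendor declined to disclose training sources; tool excluded from review workflow.",
    approved_for_use=False,
    reviewed_by="J. Smith, Discovery Counsel",
)
```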
Fair use, false attribution, and technical design
At the heart of these disputes lies the still‑unsettled question of whether training and operating generative models on copyrighted material can qualify as fair use, particularly when outputs closely track the originals or serve as functional substitutes. Courts have only begun to address that question in fact‑specific early decisions that do not resolve these cases. AI companies often argue that ingestion is transformative, comparable to how search engines index and preview web pages, and that their systems generate new text from learned patterns rather than storing articles in a human‑readable database; courts have not yet ruled broadly on whether that analogy holds for generative models built on news archives. The Times and Tribune complaints attempt to narrow the analogy, emphasizing detailed reproductions, paywall circumvention, and product designs that allegedly channel users away from publisher sites.
The Times case adds a notable twist by spotlighting hallucinations that the complaint says were falsely presented as Times reporting, alongside the paper’s name and trademarks. That aspect squarely intersects with information governance and cyber risk: when an AI tool misattributes content, organizations face not only factual inaccuracy but potential defamation, brand dilution, and regulatory scrutiny around deceptive or misleading statements. Governance and security experts generally recommend mapping the AI ecosystem used inside the organization, classifying tools by their training‑data posture (licensed, opt‑out honored, or unknown), and aligning that inventory with acceptable‑use policies that restrict deployment in high‑sensitivity contexts. One actionable step is to incorporate attribution audits into AI risk assessments—sampling outputs, checking whether citations track to real underlying material, and documenting remediation plans when they do not.
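A minimal version of such an attribution audit can be scripted: sample outputs, follow each cited URL, and check whether the quoted material actually appears there. The sketch below is illustrative only; the function names, sample quote, and URL are assumptions, and a production audit would also handle paywalls, rendered pages, and markup normalization.

```python
import urllib.request

def fetch_page_text(url: str, timeout: int = 10) -> str:
    """Fetch the cited page as raw text; a real audit would also render
    JavaScript, strip markup, and respect robots and paywall terms."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

def audit_citation(claimed_quote: str, cited_url: str) -> dict:
    """Check whether quoted material actually appears at the cited URL."""
    try:
        page = fetch_page_text(cited_url)
    except OSError as exc:  # URLError and HTTPError both subclass OSError
        return {"url": cited_url, "status": "unreachable", "detail": str(exc)}
    # Naive substring check; real audits should normalize whitespace and markup.
    found = claimed_quote.lower() in page.lower()
    return {"url": cited_url, "status": "supported" if found else "unsupported"}

# Hypothetical sampled output: a quote the AI attributed to a source URL.
sample = audit_citation(
    claimed_quote="revenue rose 12 percent year over year",
    cited_url="https://www.example.com/earnings-story",  # placeholder URL
)
print(sample)
```

Unsupported or unreachable citations feed the remediation plan the risk assessment calls for, with each failure documented rather than silently discarded.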
Business models, licensing, and negotiation leverage
While the lawsuits showcase aggressive litigation, they also highlight a parallel path: licensing. The Times has already entered a multiyear deal with Amazon that authorizes the use of its content to train certain AI models, a move seen as both a new revenue stream and a way to exert more control over how its journalism is used. That dual strategy—suing some AI developers while partnering with others—illustrates how large publishers are experimenting with leverage, signaling to technology firms that access to high‑value archives will come with contractual guardrails rather than open scraping.
Smaller outlets, by contrast, may lack the scale to negotiate rich licensing deals or sustain lengthy federal litigation. The Tribune’s complaint underscores that tension by tying alleged infringement directly to lost digital subscriptions and licensing opportunities, suggesting that unsanctioned AI reuse can have immediate bottom‑line effects on newsrooms already operating on tight margins. From a procurement and governance perspective, one simple step is to ask vendors to disclose which publishers they have licensing arrangements with—and to treat vague answers as a governance red flag that may warrant contractual restrictions, indemnities, or even alternative solutions. Even adding two or three targeted questions about training data and licensing to vendor questionnaires can materially improve your visibility into these risks.
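As a sketch of what those targeted questions might look like in practice, the snippet below pairs illustrative questionnaire items with a crude screen for vague responses. The question wording, vague-phrase list, and scoring rule are all assumptions rather than an established framework.

```python
# Illustrative diligence questions; the wording is an assumption, not language
# from any complaint, contract, or standard questionnaire framework.
TRAINING_DATA_QUESTIONS = [
    "Which publishers or content owners have licensed material to you for training or retrieval?",
    "Do you honor robots.txt directives and publisher opt-out signals, and how is compliance verified?",
    "Will you indemnify customers against third-party IP claims arising from model outputs?",
]

# Phrases that often signal a non-answer; tune this list to your own reviews.
VAGUE_MARKERS = ("publicly available", "industry standard", "cannot disclose", "proprietary")

def flag_vague_answers(answers: dict[str, str]) -> list[str]:
    """Return the questions whose answers lean on vague or evasive language."""
    flagged = []
    for question, answer in answers.items():
        text = answer.lower()
        if any(marker in text for marker in VAGUE_MARKERS) or len(text.split()) < 8:
            flagged.append(question)
    return flagged

# Hypothetical vendor response illustrating a governance red flag:
responses = {TRAINING_DATA_QUESTIONS[0]: "Our models are trained on publicly available data."}
for q in flag_vague_answers(responses):
    print(f"RED FLAG - follow up required: {q}")
```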
Why this matters to cyber, IG, and eDiscovery
For cybersecurity teams, these lawsuits reinforce that AI supply chains are not just technical but legal. If a tool depends on questionable scraping practices or generates hallucinated content under respected brands, it can introduce reputational, regulatory, and even phishing‑related risks by confusing users about what is trustworthy. In practice, that means inventorying AI tools, rating them by legal and data‑sourcing risk, and limiting high‑risk systems to low‑impact use cases until their posture improves.

For information governance leaders, the Perplexity cases highlight the value of clear robots.txt strategies, API access controls, and contract terms that limit how partners can mine organizational data for AI training. Embedding copyright and licensing checkpoints into data‑sharing workflows can help an organization avoid becoming either an unwilling training source or an unwitting infringer when internal teams build models using third‑party material. On the robots.txt front, a practical first step is auditing what your own properties currently tell AI crawlers, as sketched below.
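The Python sketch below, using the standard library's urllib.robotparser, illustrates such an audit. The crawler list is an assumption that teams should replace with their own current inventory, and robots.txt remains an advisory signal rather than an enforcement control, which is why the API access controls and contract terms above still matter.

```python
from urllib.robotparser import RobotFileParser

# User-agent strings published by several AI crawlers; this short list is
# illustrative, and teams should maintain their own current inventory.
AI_CRAWLERS = ["PerplexityBot", "GPTBot", "CCBot"]

def audit_ai_crawler_policy(site_url: str) -> dict[str, bool]:
    """Report which listed AI crawlers the site's robots.txt currently permits.

    Note: robots.txt is advisory only; crawlers that ignore it must be
    handled through access controls, rate limiting, or contract terms.
    """
    parser = RobotFileParser()
    parser.set_url(site_url.rstrip("/") + "/robots.txt")
    parser.read()
    return {bot: parser.can_fetch(bot, site_url) for bot in AI_CRAWLERS}

if __name__ == "__main__":
    # Hypothetical check against one of your own properties:
    for bot, allowed in audit_ai_crawler_policy("https://www.example.com").items():
        print(f"{bot}: {'allowed' if allowed else 'disallowed'} by robots.txt")
```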
Meanwhile, eDiscovery professionals are beginning to see AI training and usage records show up as key evidence in disputes like those facing Perplexity. Logs reflecting scraping behavior, access to paywalled content, and internal discussions about fair use or licensing will likely be central in discovery, privilege debates, and expert analysis. A concrete practice tip is to treat AI product‑development documentation as potentially discoverable from the outset—maintaining disciplined recordkeeping that can either support defenses or, at a minimum, reduce chaos when litigation arrives. As the cases against Perplexity unfold—and as courts wrestle with fair use, false attribution, and the economics of licensing—organizations across sectors will have to decide how comfortable they are building on top of AI tools whose legal footing remains contested and may evolve as rulings accumulate.
For professionals charged with securing systems, governing data, and managing discovery, the question is no longer abstract: how much legal and operational risk are you willing to absorb in exchange for AI‑powered convenience?
News Sources
- AI News Roundup – McDonnell Boehnen Hulbert & Berghoff LLP (JDSupra)
- New York Times Sues Perplexity AI in Latest IP Case Against GenAI Companies (IPWatchdog)
- New York Times Sues A.I. Start-Up Perplexity Over Use of Copyrighted Work (The New York Times)
- Chicago Tribune sues Perplexity (TechCrunch)
- Chicago Tribune sues Perplexity AI for copyright infringement (Chicago Tribune)
Assisted by GAI and LLM Technologies
Additional Reading
- From Brand Guidelines to Brand Guardrails: Leadership’s New AI Responsibility
- The Agentic State: A Global Framework for Secure and Accountable AI-Powered Government
- Cyberocracy and the Efficiency Paradox: Why Democratic Design is the Smartest AI Strategy for Government
- The European Union’s Strategic AI Shift: Fostering Sovereignty and Innovation
Source: ComplexDiscovery OÜ
