On January 22, 2026, U.S. Representatives Madeleine Dean (D-PA) and Nathaniel Moran (R-TX) introduced H.R. 7209, a bipartisan bill that could significantly reshape the relationship between copyright law and artificial intelligence. Known as the Transparency and Responsibility for Artificial Intelligence Networks (TRAIN) Act, the proposal seeks to give copyright owners a clearer path to understanding whether, and how, their works are being used to train generative AI models.
At the heart of the bill is a new administrative subpoena process added to the Copyright Act. Under the TRAIN Act, a copyright owner who has a good-faith belief that their work was used to train a generative AI model could request a subpoena, issued by the clerk of a U.S. district court, compelling an AI developer to disclose copies of training materials or records sufficient to identify them with certainty. The bill applies not just to original models, but also to substantially modified versions, including those retrained or fine-tuned after initial release.
Importantly, rights holders may seek information only about their own copyrighted works, not the broader training datasets used by a developer. To initiate the process, the requester must submit a sworn declaration stating that the subpoena is sought solely to determine whether their copyrighted material was used and that any disclosed records will be used only to protect their rights.
For developers, the obligations are clear: comply expeditiously or face consequences. Failure to comply with a valid subpoena would create a rebuttable presumption that the developer copied the copyrighted work, a notable shift that could affect future infringement litigation. At the same time, the bill includes safeguards against abuse, allowing courts to apply existing Rule 11 standards to sanction rights holders who request subpoenas in bad faith.
Supporters of the TRAIN Act frame it as a transparency measure, arguing that copyright owners currently lack practical tools to determine whether their works have been ingested by opaque AI training pipelines. Critics, however, may raise concerns about administrative burden, confidentiality (including the exposure of potential trade secrets about how a model is trained), and a potential chilling effect on AI development.
As debates over AI, data rights, and creative ownership intensify, the TRAIN Act represents one of the most concrete legislative efforts yet to address the “black box” of AI training, and it is likely to draw close attention from creators, tech companies, and courts alike.
To date, only a handful of states have enacted laws requiring some form of disclosure about AI training data, and they do so with differing scopes and mechanisms:
- California – AB 2013 (Artificial Intelligence Training Data Transparency Act), effective January 1, 2026, requires developers of generative AI systems offered for public use in California to post a high-level summary of their training data on a public website. The summary must address, among other things: data sources and ownership, data characteristics and volume, collection and processing methods, intellectual property status (including the use of copyrighted versus public-domain data), whether personal information is included, relevant time frames, and whether synthetic (AI-generated) data was used.
- Connecticut – An amendment to the Connecticut Data Privacy Act (Public Act No. 25-113), effective July 1, 2026, requires covered “controllers” to disclose in their consumer privacy notices whether they collect, use, or sell personal data for training large language models. The requirement focuses on disclosure rather than detailed dataset inventories and applies even when personal data is used through vendors or other third parties.
- Colorado – The Colorado Artificial Intelligence Act (SB 24-205) requires certain developers of high-risk AI systems to provide deployers with documentation about those systems, including general information about the categories of data used for training, measures taken to mitigate algorithmic discrimination, and known limitations and risks. These disclosures are primarily business-to-business and focused on risk management rather than public transparency.
Unlike these state laws—which rely on generalized disclosures, privacy notices, or risk documentation—the TRAIN Act would create a targeted, rights-holder-driven mechanism to obtain specific information about whether particular copyrighted works were used in AI training.
If enacted, the TRAIN Act could reduce the need for a fragmented, state-by-state approach and provide a broader, more effective path for content owners to determine whether their materials are being used to train AI systems.