After USCO’s AI training report, ignoring opt-out signals is harder to defend

On May 9, 2025, the U.S. Copyright Office released the pre-publication version of Copyright and Artificial Intelligence, Part 3: Generative AI Training. The report does not endorse either extreme. It does not say all AI training is infringement, and it does not offer a blanket fair use safe harbor. Instead, it pushes the analysis back to the facts that matter: what was copied, how the material was obtained, whether licensing markets exist, and how the use affects rightsholders in practice. For AI companies, that is a meaningful shift. Large-scale scraping is now much harder to frame as a neutral technical step.

The practical takeaway is even sharper. The Office discusses terms of use, robots.txt, metadata, watermarking, and other ways for rightsholders to signal that their material should not be used for AI training. It also notes that voluntary opt-out measures can have merit. Once those signals become more standardized and machine-readable, developers who ignore them, or who rely on pirated or unlawfully accessed material, will have a more difficult time defending their conduct as fair use. The question is no longer whether opt-out language matters. It is whether companies can prove that their data intake systems actually recognize and respect it.

What the report does and does not do

It is important to describe the document accurately. This is a pre-publication report, not a new statute and not a binding regulation. Even so, it is more than a policy essay. It is the Copyright Office’s current analytical framework after extensive comments, hearings, and public debate. The report says that training a generative AI foundation model on a large and diverse dataset will often be transformative. That point matters. But the Office does not stop there. It also makes clear that fair use cannot be assessed in the abstract, divorced from source, access conditions, market substitution, and the availability of licensing.

The report also brings lawful access back to the center of the discussion. In the Office’s view, knowingly using a dataset made up of pirated or illegally accessed works should weigh against fair use, even if that factor is not automatically determinative by itself. That is a serious warning for developers who still talk about training risk as if it begins and ends with output similarity. The way material enters a dataset now matters more, not less.

Opt-out is not yet a statutory regime, but it is becoming a compliance interface

Some readers will be tempted to treat this report as a signal that the United States is about to adopt an EU-style statutory opt-out system for AI training. That overstates the report. The Office reviews the possibility of a statutory opt-out approach, but it also records substantial opposition from rightsholders who do not want U.S. copyright law to shift from a permission-based structure to a default-use model with a later opt-out. In the end, the Office does not recommend creating a statutory opt-out rule at this stage. Its broader recommendation is to let licensing markets continue to develop without immediate government intervention.

That said, opt-out mechanisms are no longer easy to dismiss as merely symbolic. The report discusses metadata, databases, technical flags, website terms, watermarking, and even industry discussion around enhanced robots.txt approaches. In practical terms, that means the market may arrive at a recognizable opt-out layer before legislation does. Once major platforms, publishers, and model developers start using more consistent signals, ignoring them will look less like an operational oversight and more like a deliberate choice.
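
As a rough illustration of how a crawler could read one such technical flag, the following Python sketch checks a robots.txt file with the standard library's urllib.robotparser. The bot name ExampleAIBot, the rules, and the URLs are all hypothetical. The report does not prescribe any particular implementation, and robots.txt is only one of the signals it discusses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the publisher blocks one AI crawler entirely
# and keeps /private/ off limits for every other crawler.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "OtherBot"):
        allowed = may_fetch(ROBOTS_TXT, agent, "https://example.com/articles/1")
        print(agent, "allowed" if allowed else "blocked")
```

A check like this only shows that the signal was parsed. Whether the crawler then honours the result, and whether that decision is recorded, is a separate question.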

Why ignoring these signals makes fair use harder to defend

This is where the report becomes especially useful for litigation strategy. The Office notes comments arguing that, where copyright owners have opted out through terms of use, robots.txt instructions, or similar mechanisms, a developer’s decision to disregard those signals may inform the fair use analysis, particularly under the fourth factor dealing with market harm and licensing markets. That is not the same as saying every ignored signal automatically defeats fair use. It does, however, provide a clear pathway for plaintiffs to argue that a developer bypassed a visible reservation of rights.

Layer unlawful access on top of that, and the defense gets weaker. If a company cannot show where the data came from, whether paywalls or access controls were bypassed, whether exclusion signals were parsed, or whether high-risk sources were filtered out, the case stops looking like a dispute about innovation in the abstract. It starts looking like a dispute about conduct. Fair use may still be argued, but it becomes a much less comfortable argument when the factual record shows indifference to access rights and opt-out notices.

The next competitive advantage is licensing, provenance, and auditable controls

The Office does not hand the market a universal “do not scrape” rule, and it does not offer rightsholders a single standard protocol that solves everything. Its more realistic message is that voluntary licensing is already developing in some sectors, collective solutions may expand, and government should not rush in with a one-size-fits-all intervention. For businesses, that is not a signal to wait. It is a signal to build licensing, provenance, and audit controls into day-to-day operations now.

Content platforms should think about where opt-out signals live, how they survive syndication and reposting, and what fallback markers remain if metadata is stripped. Model developers should be looking at crawl policies, terms-of-use parsing, source provenance, dataset cleaning, vendor warranties, and internal escalation processes for exclusion requests. The real dividing line will not be who says they respect copyright. It will be who can prove that their systems, contracts, and audit trail do.
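
To make the idea of an audit trail concrete, here is a minimal Python sketch of a per-document intake record. Every name in it (IntakeRecord, record_intake, the field names) and the inclusion policy are assumptions made for illustration. They are not drawn from the report, and a real policy would need legal review.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class IntakeRecord:
    """One audit-trail entry per candidate document (hypothetical schema)."""
    url: str
    content_sha256: str
    robots_allowed: bool
    terms_reviewed: bool
    license_ref: Optional[str]
    decision: str
    checked_at: str


def record_intake(url: str, content: bytes, robots_allowed: bool,
                  terms_reviewed: bool,
                  license_ref: Optional[str] = None) -> IntakeRecord:
    # Assumed policy: include only with a license on file, or when robots.txt
    # permits the fetch AND the site's terms of use have been reviewed.
    include = license_ref is not None or (robots_allowed and terms_reviewed)
    return IntakeRecord(
        url=url,
        content_sha256=hashlib.sha256(content).hexdigest(),
        robots_allowed=robots_allowed,
        terms_reviewed=terms_reviewed,
        license_ref=license_ref,
        decision="include" if include else "exclude",
        checked_at=datetime.now(timezone.utc).isoformat(),
    )


def to_json_line(record: IntakeRecord) -> str:
    """Serialize a record as one JSON line, suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    rec = record_intake("https://example.com/a", b"sample text",
                        robots_allowed=False, terms_reviewed=True)
    print(to_json_line(rec))
```

The design choice here is to log exclusions as well as inclusions, so the record shows that a signal was seen and acted on and does not merely list what ended up in the dataset.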

The content in this section is provided for general reference only and does not constitute legal advice or formal service recommendations. For any specific matter, please consider the particular facts of your case and refer to the latest laws, policies, and practices of the relevant authorities.