Expert Speak Digital Frontiers
Published on Apr 21, 2026

As AI systems increasingly rely on copyrighted data, global legal frameworks are diverging in how they balance innovation with the rights of creators. A nuanced, hybrid approach that combines legal clarity, fair compensation, and scalable licensing mechanisms may offer the most viable path forward for India.

Training AI on Copyrighted Works: Comparative Legal Frameworks

Artificial intelligence (AI), particularly large language models (LLMs), has rapidly transformed the creation, processing, and dissemination of information across industries. These systems are trained on vast datasets that often include copyrighted works, raising complex legal and ethical questions. Central to this debate is whether the use of such material for LLM training constitutes copyright infringement or can be justified under existing legal exceptions. At the heart of the issue lie two competing priorities: enabling technological advancement and safeguarding the rights and economic interests of copyright holders.

As AI systems become more commercially significant, concerns regarding consent, compensation, and accountability have grown. Jurisdictions around the world have begun to address these concerns in different ways, reflecting variations in national laws, policy priorities, and interpretations of the issue. This evolving global landscape is best understood through a comparative examination of how key jurisdictions approach the use of copyrighted works in LLM training.

The US Approach: Fair Use Flexibility and Inconsistencies in AI Training

The United States regulates the use of copyrighted works in LLM training primarily through the flexible “fair use” doctrine codified at 17 U.S.C. § 107. This provision allows courts to evaluate claims on a case-by-case basis using a four-factor test that considers the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect on the market for the original. A key concept within this framework is “transformative use”, under which courts assess whether AI model training creates new functionality or meaning rather than merely substituting for the original work. However, the absence of a specific text and data mining (TDM) exception creates legal ambiguity. Unlike the EU, which legislates explicit exceptions, the US relies on case-by-case judicial interpretation, which has led to inconsistent outcomes across courts.

Policy guidance from the US Copyright Office, however, indicates that using copyrighted data obtained through lawful access could fall within fair use, particularly where the purpose is research or innovation. At the same time, the Office has expressed a preference for voluntary licensing markets and collective licensing arrangements, rather than expanded government intervention, as mechanisms to balance innovation with creators’ interests.

Recent case law illustrates this inconsistency. In Bartz v. Anthropic, the court distinguished between lawful and unlawful sources, holding that training on legally acquired works was transformative and permissible, while the use of pirated books weighed against fair use due to potential market harm. In contrast, Kadrey v. Meta Platforms Inc. upheld fair use even where pirated works were involved, emphasising the transformative nature of AI training.

Overall, the US framework remains broadly supportive of innovation but continues to exhibit legal inconsistencies, as courts increasingly treat the fourth factor — market impact — as a critical consideration in adjudicating AI-related copyright disputes.

The EU Approach: Structured TDM Exceptions and Proactive AI Governance

The European Union has adopted a structured, legislation-based approach to AI training on copyrighted works through Directive (EU) 2019/790 on Copyright in the Digital Single Market (DSM Directive) and the 2024 Artificial Intelligence Act. Unlike the flexible US model, the EU framework provides explicit statutory exceptions for text and data mining (TDM). Article 3 permits TDM for scientific research, while Article 4 allows broader commercial use, provided the content is lawfully accessed and rights holders have not opted out through machine-readable means. This creates a dual-tier system that seeks to balance innovation with copyright protection.

The AI Act further bolsters this framework by embedding copyright compliance directly into AI governance. Recital 105 acknowledges that AI training may involve copyrighted works and requires authorisation unless an exception applies. Article 53 obliges providers of general-purpose AI models to implement compliance policies, respect opt-out mechanisms, and disclose summaries of training datasets. EU policy complements these requirements by emphasising opt-out mechanisms (for example, machine-readable signals such as robots.txt), licensing mechanisms, and stakeholder collaboration. The General-Purpose AI Code of Practice also encourages safeguards to reduce the risk of infringing outputs. Together, these measures reflect a proactive model of AI governance that seeks to balance innovation and copyright protection through structured regulatory oversight.
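As a rough illustration of how a machine-readable opt-out might be honoured in practice, the Python sketch below checks a robots.txt file before a crawler collects a page for training. It is a minimal sketch under stated assumptions: the crawler name ExampleAIBot, the URLs, and the robots.txt content are all hypothetical, and whether a robots.txt rule alone amounts to a valid rights reservation under Article 4 is a legal question this example does not settle.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve: it blocks one
# (invented) AI training crawler and allows every other agent.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_collect(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network call
    return parser.can_fetch(user_agent, url)


ai_allowed = may_collect(ROBOTS_TXT, "ExampleAIBot", "https://example.com/articles/1")
other_allowed = may_collect(ROBOTS_TXT, "OtherBot", "https://example.com/articles/1")
```

Under these assumed rules the hypothetical AI crawler is refused while other agents are allowed, which is the kind of granular reservation an opt-out model depends on.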

China’s Approach: Strong Regulatory Control with Adaptive “Reasonable Use” Interpretation

China adopts a state-centric and compliance-driven approach to AI training and copyright, prioritising legal and regulatory oversight over explicit exceptions. The 2023 Interim Measures for the Management of Generative Artificial Intelligence Services mandate that AI service providers use data from lawful sources and strictly avoid intellectual property infringement. Similarly, Article 24 of the Copyright Law of the People’s Republic of China provides an exhaustive list of permitted uses, such as research and education, but does not provide a specific text and data mining (TDM) exception for AI training.

However, Chinese judicial “opinions” introduce some flexibility. Article 8[1] of the Supreme People’s Court’s “Opinion” on “Issues concerning Maximising the Role of Intellectual Property Right Trials” allows for a “reasonable use of work” interpretation that weighs factors such as purpose, nature, quantity used, and market impact. This flexible understanding of “reasonable use,” combined with a strict requirement of lawful sourcing, creates a controlled yet adaptive framework that seeks to balance the rights of copyright holders with the innovative use of protected works. More explicitly than the other jurisdictions discussed here, China situates AI governance within broader state objectives, including economic development and technological leadership.

India’s Approach: Emerging Legal Uncertainty and the Shift Toward Collective Licensing Solutions

India’s legal framework on AI training and copyright is still evolving. The Copyright Act, 1957 does not currently provide a specific exception for LLM training or for text and data mining (TDM). Instead, the “fair dealing” exceptions in Section 52 are limited to purposes such as research, criticism, review, and reporting. Because these exceptions are purpose-specific, large-scale AI training, particularly when undertaken for commercial objectives, may not readily qualify. As a result, it remains unclear whether fair dealing extends to LLM training, leaving AI developers with significant legal uncertainty. Notably, in ANI v. OpenAI, the Delhi High Court has reserved judgment on whether training ChatGPT on copyrighted news content violates Indian copyright law, marking the country’s first major judicial test in this domain.

On the policy front, India is exploring scalable solutions to this issue. The government’s recent Working Paper on Generative AI and Copyright (Part 1), subtitled “One Nation One License One Payment - Balancing AI Innovation and Copyright”, proposes a blanket licensing framework. Under this model, AI developers would be permitted to train models on lawfully accessed content without negotiating individual licences, while royalties would be triggered upon commercialisation. Royalty rates would be determined by a government-appointed authority and subject to judicial oversight, with a centralised collection mechanism ensuring efficient distribution to rights holders.
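The mechanics of such a scheme can be illustrated with a short Python sketch. It assumes a hypothetical 2 percent royalty rate, a pro rata split by usage weight, and invented rights-holder names; the description above specifies none of these, so the figures only show how a commercialisation trigger and centralised distribution could fit together.

```python
from decimal import Decimal

# Hypothetical rate: under the proposal a government-appointed
# authority would set the actual figure.
ASSUMED_RATE = Decimal("0.02")


def royalty_due(revenue: Decimal, rate: Decimal, commercialised: bool) -> Decimal:
    """Royalty owed to the central pool; nothing is owed before commercialisation."""
    if not commercialised:
        return Decimal("0")
    return revenue * rate


def distribute(pool: Decimal, weights: dict[str, int]) -> dict[str, Decimal]:
    """Split the pool pro rata by weight (an assumed rule, not one taken from the paper)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {holder: pool * weight / total for holder, weight in weights.items()}


# Invented example: a model earning 1,000,000 in revenue after launch,
# trained on works from two rights holders weighted 3 to 1.
pool = royalty_due(Decimal("1000000"), ASSUMED_RATE, commercialised=True)
shares = distribute(pool, {"publisher_a": 3, "publisher_b": 1})
```

With these invented inputs the pool is 20,000, split 15,000 and 5,000. How the weights themselves would be measured is the harder design question, and the sketch leaves it open.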

Different jurisdictions are adopting divergent approaches to LLM training on copyrighted works, ranging from flexible judicial interpretation to structured statutory and regulatory frameworks. Drawing on these global models, and taking into account its distinct legal and policy context, India urgently needs a balanced and forward-looking course of action.

Toward a Balanced and Scalable AI Copyright Framework for India

India could develop its proposed policy into a hybrid, future-ready framework that balances innovation with copyright protection. First, policymakers should clarify the scope of “fair dealing” under the Copyright Act as it applies to training LLMs on copyrighted works, explicitly addressing whether such training falls within any of the existing exceptions. This would reduce legal uncertainty and help limit future litigation on the issue.

Second, India could operationalise its proposed “One Nation, One License, One Payment” model by introducing a collective licensing regime. This would allow AI developers to lawfully access large datasets, with royalties compensating copyright holders in due course. Strong safeguards could accompany such a licensing system, including fair pricing, equitable revenue distribution, and special provisions for startups and small developers to prevent market concentration. Transparency obligations, such as dataset disclosures and audit mechanisms, could also be incorporated.

The global landscape reveals no single uniform solution, but rather a spectrum of approaches balancing AI innovation and copyright protection. India, by drawing from global best practices while remaining attentive to its own legal and policy context, may develop a nuanced framework that safeguards creators’ rights while fostering responsible AI advancement suited to its particular circumstances.


Debajyoti Chakravarty is a Research Assistant with the Centre for Digital Societies at the Observer Research Foundation.

The views expressed above belong to the author(s).
