UK Government Should Deal Definitively With Copyright Issues On LLM/GenAI Training Data Whilst Adopting A Positive Vision For LLMs To Ensure UK Does Not Miss "AI Goldrush" – Recommends House Of Lords Committee

Large language models (LLMs) and generative AI (genAI) will produce "epoch defining changes comparable with the invention of the internet", stated the House of Lords Communications and Digital Committee as it issued its report "Large language models and generative AI" today (2 February 2024). The Committee concluded that the "goldrush" opportunity that AI presents requires the UK Government to adopt a more positive vision for LLMs in order "to reap the social and economic benefits, and enable the UK to compete globally". Key measures suggested include "more support for AI start-ups, boosting computing infrastructure, improving skills, and exploring options for an 'in-house' sovereign UK large language model", as well as resolving the copyright disputes currently arising from the use of data without permission to train AI models.

The Committee sets out 10 core recommendations intended, in its words, "to steer the UK toward a positive outcome". These include measures to boost opportunities, address risks, support effective regulatory oversight (including to ensure open competition and avoid market dominance by established technology giants), achieve the aims set out in the AI White Paper, introduce new standards, and resolve copyright disputes.

The Committee calls on the Government to support copyright holders, saying the Government "cannot sit on its hands" while LLM developers exploit the works of rightsholders. The Committee Chair is quoted on the key role of copyright issues:

"One area of AI disruption that can and should be tackled promptly is the use of copyrighted material to train LLMs. LLMs rely on ingesting massive datasets to work properly but that does not mean they should be able to use any material they can find without permission or paying rightsholders for the privilege. This is an issue the Government can get a grip of quickly and it should do so."

The report "rebukes" tech firms for using data without permission or compensation, and says the Government should end the disputes over copyright and AI in this context "definitively" including through legislation if necessary. The report calls for a way for rightsholders to check training data for copyright breaches, investment in new datasets to encourage tech firms to pay for licensed content, and a requirement for tech firms to declare what their web crawlers are being used for.

Chapter 8 of the report deals specifically with copyright issues, in particular concluding:

In response to this report the Government should publish its view on whether copyright law provides sufficient protections to rightsholders, given recent advances in LLMs. If this identifies major uncertainty the Government should set out options for updating legislation to ensure copyright principles remain future proof and technologically neutral (paragraph 247).
The voluntary IPO-led process is welcome and valuable. But debate cannot continue indefinitely. If the process remains unresolved by Spring 2024 the Government must set out options and prepare to resolve the dispute definitively, including legislative changes if necessary (paragraph 249).
The IPO code must ensure creators are fully empowered to exercise their rights, whether on an opt-in or opt-out basis. Developers should make it clear whether their web crawlers are being used to acquire data for generative AI training or for other purposes. This would help rightsholders make informed decisions, and reduce risks of large firms exploiting adjacent market dominance (paragraph 252).
The Government should encourage good practice by working with licensing agencies and data repository owners to create expanded, high quality data sources at the scales needed for LLM training. The Government should also use its procurement market to encourage good practice (paragraph 256).
The IPO code should include a mechanism for rightsholders to check training data. This would provide assurance about the level of compliance with copyright law (paragraph 259).

See the House of Lords Communications and Digital Committee's announcement here.

The IPO working group began meeting on 5 June 2023 to look at identifying, developing and codifying good practice on the use of copyright, performance and database material in relation to AI, including data mining (previous plans for a legislated text and data mining exception to copyright infringement having been withdrawn in March 2023 – see our post here). However, progress towards a voluntary code appears to have been very difficult: the code had previously been expected to be finalised in autumn 2023. The Committee's recommendation that the Government take the process back if no code is forthcoming in the next few months is well timed, although the publication of this report may itself give the working group further incentive to reach a conclusion.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.
