To generate appealing output, AI models must be fed with data. This gives owners of data sets a new way to commercialize their data: they can license it to third parties. In part 12 of our AI blog series, we highlight what you should bear in mind when setting up data license agreements.


A. Licensing data as a business model?

Feeding AI models with data of the right quality and quantity is of central importance for the success of AI applications. The more accurate and comprehensive this training data is, the better the output. This opens up a new source of income for owners of data sets. For example, the operator of the Reddit platform licenses the content generated by users to Google for around USD 60 million a year.

However, it is not only the tech giants that can benefit from this relatively new market. This business model can also be interesting for SMEs. Let us assume that a heating engineer has installed and serviced heat pumps of a certain brand over the last few decades. His customer file shows which repair services he has carried out for which customer and when. This information can be used to draw conclusions about wear and tear, which could be of interest to the manufacturer. The manufacturer could, for example, develop a tool for preventive maintenance and make it available to its sales partners. For the heating engineer himself, the implementation of such a tool at his own expense would not be worthwhile. The transfer of this data from the heating engineer to the manufacturer therefore makes economic sense for both parties. But how is this supposed to work?

B. Special features of data license agreements

Swiss law does not recognise a general exclusive right to data; it only allows a claim for protection in certain areas. For example, the unauthorised disclosure of trade or business secrets is prohibited. If the data is personal data, i.e. data relating to an identified or identifiable person, the data subject can prohibit the processing of the data and, in particular, the disclosure of the data under certain circumstances. If the conditions for protection under copyright law are met (i.e. if the content is an intellectual creation with an individual character), there is an exclusive right to it, although this only applies to the work itself or parts thereof, but not to each individual item of data contained therein. If the data or data set is a marketable work result, it may not be adopted and utilized by means of technical reproduction processes without reasonable effort on the part of the user. Ownership under property law can only exist in the carrier medium, but not in the data itself stored on it.

The use of data by third parties can therefore often not be prevented by means of legal provisions. In such cases, whoever wishes to use data is not obliged to obtain permission from anyone. At first glance, the commercialization of data does not seem so easy, as the customer has no incentive to pay for the use of data - especially if the data is publicly available.

If you want a third party to pay for the use of data, you can achieve this, for example, by means of de facto control instruments, namely by protecting your data from unauthorised access and only granting access for a fee.

If the owner of the data makes it available to a third party, it is advisable to contractually stipulate what this third party may do with the data. In particular, it should be ensured that the third party is not authorised to pass on the data without restriction. Otherwise, the first data owner loses control over "its" data and therefore also over its commercialization.

Furthermore, the requirements of the Data Act must also be considered for machine-generated data in the EU. We will address this in a separate blog post.

In this blog post, we show what data owners need to bear in mind if they want to successfully commercialize their data.

C. What you should pay attention to before passing on data

The data made available for licensing may contain various information for which contractual or regulatory restrictions must be observed. Particular attention must be paid to the following:

Personal data: If your data set contains personal data, e.g. the name of the invoice recipient, the requirements of data protection law must be considered. As personal data is usually not relevant for the training of AI models and the use of personal data is associated with additional regulatory requirements, in many cases it will be advisable to anonymise the data before transmission. Depending on the data set, simply removing the name, telephone number or address will not be sufficient to effectively anonymise the data - because the data is only anonymised when it cannot be assigned to a person again, or not with a reasonable amount of effort. If anonymisation is not possible or the personal data is relevant for further use, this must be communicated transparently to the data subjects, e.g. in a privacy statement. Depending on the setup, the consent of the data subjects is required - for example, if the disclosure is not in line with the general principles of data processing and no overriding interests of the controller justify the disclosure or in the case of the disclosure of particularly sensitive personal data (e.g. health data). Unlike under Swiss data protection law, the GDPR requires a legal basis for data processing, which will regularly complicate the transfer of personal data. We will discuss the requirements that must be met for the training of AI models in a separate blog post.
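The point that removing names, telephone numbers and addresses may not be enough can be illustrated with a small sketch. It uses a k-anonymity style check, which is a technique we bring in here for illustration and not a legal test: it counts how many records share the same combination of remaining attributes. The records, the column names and the choice of attributes are all hypothetical. A group of size one suggests that a record could be singled out, but whether data counts as anonymised still depends on the effort required for re-identification in the specific case.

```python
from collections import Counter

# Hypothetical customer records after names, phone numbers and addresses were removed.
records = [
    {"postcode": "8001", "install_year": 2009, "model": "HP-200"},
    {"postcode": "8001", "install_year": 2009, "model": "HP-200"},
    {"postcode": "8001", "install_year": 2011, "model": "HP-300"},
    {"postcode": "3920", "install_year": 2004, "model": "HP-100"},
]

# Assumption: these remaining attributes could be combined with outside knowledge.
QUASI_IDENTIFIERS = ("postcode", "install_year", "model")


def group_sizes(rows, keys):
    """Count how many rows share each combination of the given attributes."""
    return Counter(tuple(row[key] for key in keys) for row in rows)


def smallest_group_size(rows, keys):
    """Return the size of the smallest group (0 for an empty data set)."""
    sizes = group_sizes(rows, keys)
    return min(sizes.values()) if sizes else 0


def unique_rows(rows, keys):
    """Return the rows whose attribute combination occurs only once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[key] for key in keys)] == 1]


if __name__ == "__main__":
    print("smallest group:", smallest_group_size(records, QUASI_IDENTIFIERS))
    for row in unique_rows(records, QUASI_IDENTIFIERS):
        print("could be singled out:", row)
```

In this made-up data set, two records are the only ones with their combination of postcode, installation year and model. Such records would typically need to be generalised (for example, by using a postcode region instead of the full postcode) or removed before transmission.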

Contractual protection of confidentiality: Confidentiality obligations are often incidentally included in contracts without the parties being aware of the possible consequences. For example, information about the sales volume of a contract or the specific services utilised may well be covered by a contractual confidentiality obligation. Anyone who betrays a manufacturing or trade secret that they are obliged to protect due to a legal or contractual obligation, as well as anyone who exploits such a disclosure for their own or another's benefit, may be punished (Art. 162 SCC). The disclosure of manufacturing or trade secrets is also prohibited if they have been obtained unlawfully (Art. 6 UCA). Therefore, when data from business operations is used for secondary purposes, care must be taken to ensure that it is selected or prepared in such a way that no contractual confidentiality obligations are breached. Ideally, the confidentiality obligation should already specify for which secondary purposes contractual data may be used.

Legal protection of confidential information: There are various statutory duties of confidentiality. Examples are professional secrecy (Art. 321 SCC), to which lawyers and doctors, for example, are subject, and the so-called "small professional secrecy", which applies to any person who has obtained knowledge of confidential personal data in the exercise of his or her profession and intentionally discloses it to third parties (Art. 62 DPA).

Copyright: If the data set contains photographs or other copyrighted content, the consent of the copyright holder is generally required for the transfer (see our blog post "Copyright and AI: Responsibility of providers and users"). In the case of internally generated content, the employer is generally authorised to use it and pass it on accordingly, as the employer generally owns the rights to the work product. However, if the data set contains content from third parties, it must be evaluated on a case-by-case basis whether the transfer is permitted. This will usually require the creator's consent.

Antitrust law: If the data is shared between competitors in the market and this data facilitates collusion in connection with prices, quantities or territories, particular caution is required.

D. You should regulate these points contractually

Content of the data set - What kind of data does the data set contain?: It should be clear to the recipient what kind of data they are receiving and what characteristics this data has. This includes, for example, the quality (e.g., is it required that every single data pair within the data set is correct and complete?), but also the actual content of the data set. This sounds banal and obvious, but in practice, a reasonably accurate description of the data is of great relevance. For example, heart rate data is not necessarily just heart rate data: it can be decisive whether this data was collected by specialists using professional and calibrated measuring devices or using private fitness trackers. Further, to prevent any problems, we recommend specifying what should not be included in the data set, such as personal data or trade secrets.

It should also be noted whether the data set is transferred once or whether regular updates are included in the agreement. It is also important to define a cut-off date on which the data set is extracted and to record how up-to-date the data is at this point in time (e.g. over what period or up to what end date the data was collected). For larger data sets in particular, it is advisable to use hash values to label the data sets.
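The use of hash values to label a data set can be sketched as follows. This is a minimal illustration, assuming the data set is delivered as a single file and that the parties have agreed on SHA-256. The function names and the fields of the label are hypothetical and would have to match what the contract actually specifies.

```python
import hashlib
from datetime import date
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hash of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_delivery(path, cutoff_date):
    """Build a label that could be recorded in an annex to the agreement (hypothetical fields)."""
    return {
        "file": Path(path).name,
        "sha256": sha256_of_file(path),
        "cutoff_date": cutoff_date.isoformat(),
    }


def verify_delivery(path, expected_sha256):
    """Check that a received file is byte-for-byte the data set named in the agreement."""
    return sha256_of_file(path) == expected_sha256.lower()
```

The data owner would call describe_delivery on the exported file and record the result; the recipient can later call verify_delivery with the recorded hash. Any change to the file, however small, yields a different hash value, which makes it easier to establish which version of a data set was delivered.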

Preparation, format and structure of the data/data set: Even the highest quality data is of no use to the recipient if they cannot read and process it. It is also not uncommon for a recipient to obtain data sets from different providers, which they then want to combine for their own purposes. This is only possible if the data sets, including the data they contain, are compatible with each other or at least can be made compatible. It is therefore advisable to contractually define the format and structure of the data/data set.
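A contractually defined format and structure can also be checked automatically on delivery. The following sketch assumes a CSV delivery; the column names, the date format and the checks are hypothetical examples, and a real agreement would define its own.

```python
import csv
import io
from datetime import date


def non_empty(value):
    """Reject missing or empty text values."""
    if not value:
        raise ValueError("empty value")
    return value


# Hypothetical agreed structure: column name -> parser that raises on invalid input.
AGREED_COLUMNS = {
    "device_id": non_empty,
    "repair_date": date.fromisoformat,  # assumption: dates are agreed as YYYY-MM-DD
    "component": non_empty,
    "operating_hours": int,
}


def check_delivery(csv_text):
    """Return a list of problems; an empty list means the delivery matches the agreed structure."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != list(AGREED_COLUMNS):
        return [f"unexpected header: {reader.fieldnames}"]
    problems = []
    for row in reader:
        for column, parse in AGREED_COLUMNS.items():
            try:
                parse(row[column])
            except (TypeError, ValueError):
                problems.append(
                    f"line {reader.line_num}: invalid value in {column!r}: {row[column]!r}"
                )
    return problems


if __name__ == "__main__":
    sample = (
        "device_id,repair_date,component,operating_hours\n"
        "A1,2021-03-04,compressor,12000\n"
        "A2,04.03.2021,fan,n/a\n"
    )
    for problem in check_delivery(sample):
        print(problem)
```

A check of this kind gives both parties a shared, reproducible way to decide whether a delivery conforms to the agreement, which can be helpful when data sets from several providers are to be combined.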

Personal data: Can the recipient assume that the data set does not contain any personal data or is the data only pseudonymised so that it is relatively easy to assign it to a natural person? From the perspective of the data owner, it makes sense to oblige the recipient to take measures to prevent re-identification and, in particular, to refrain from such actions themselves.

Delivery of the data: For example, does the recipient have only temporary remote access with a clearly limited scope of use? If so, must a certain availability of access be ensured and may the recipient make copies of the data set? Or will the data set be delivered to the recipient in one go and can they then save it on their own computer?

Use of the data by the recipient: The data provider should consciously think about the purposes for which the data could potentially be used and which purposes it does not want to permit either economically or ethically. In order to prevent the owner of the data from losing power over "its" data or its commercialization, it should be regulated whether and, if so, under what conditions and to what extent the data set may also be made available to third parties by the recipient. As the use of the data by third parties can hardly be prevented by the owner of the data once it has been passed on by the recipient (unless there are additional rights to it, such as copyrights), it may make sense to provide for a contractual penalty in the event of unauthorised disclosure of the data.

Payment modalities: There are many conceivable ways in which access to data can be remunerated. For a one-off delivery of data, a lump sum payment may be appropriate; for recurring deliveries/updates, a subscription model is usually more suitable. Of course, profit or revenue shares in the model trained with the data or the waiving of future usage fees for the AI application are also possible.

Obligations when handling the data set: The technical and organisational measures to be implemented by both parties to ensure the integrity, availability and confidentiality of the data set should be defined - as is often done in connection with the processing of personal data.

Warranty and liability: The recipient will generally have an interest in ensuring that the data provided by the supplier is correct. It might also be of importance for the recipient that the disclosure and agreed use of the data does not infringe any third-party rights. This includes, in particular, the aforementioned rights that may exist in data or data sets, such as copyright. The recipient may also ask the data owner to agree to indemnify it in the event of such an infringement. Whether and to what extent the data owner can and should agree to this depends on the respective setting. In the case of very large data sets, such agreements will probably lead to an excessive liability risk for the data owner in many cases. It is therefore important to align the scope of the warranty, liability and remuneration.

Termination of contract: What happens if the contract is terminated? Are there specific deletion obligations? As there is no generally applicable exclusive right to data, the obligation to discontinue use must be based purely on contract. It should be noted that it is often difficult to prove that a license holder is continuing to use a data set without authorisation even after the contract has ended: it is often impossible for third parties to identify the data a company is working with, and it is often difficult or impossible to determine the origin of the data.

E. Alternative license models?

Setting up direct content licensing, as in our example of the heating engineer, is time-consuming in individual cases and this effort will often not be worthwhile. It is therefore to be expected that certain standards will become established for the licensing of data in the medium term. This standardisation can take place in various ways - certain models have already been tried and tested in the music industry.

Firstly, aggregators can collect content from the market and, if necessary, process it and then license it further. Instead of launching an AI image generator itself, Getty Images could make its image database available to providers of AI models for training purposes. In the music industry, music publishers (e.g. Sony Music Publishing) take on such a role: they sign songwriters and assume the responsibility for licensing to collective rights management organisations, record labels and media companies.

It would also theoretically be possible for collective rights management organisations to take over exploitation on behalf of rights holders in specific cases. With this model, the rights holders cannot prevent the use, but are remunerated for it. In the music industry, this model is used, for example, when a song is played on the radio. A composer cannot prevent a radio station from playing his songs - but the radio station must pay the collective rights management organisation a fee, which is distributed to the composer via a distribution formula. However, a direct adaptation of this model to the licensing of data sets is rather unlikely, especially as this model is linked to the copyright of the owners. Some of the non-public data sets held by companies that are interesting for the training of AI models will not qualify for copyright protection - often only factual or contractual restrictions can therefore limit their use by third parties. There is therefore - with a few exceptions aimed at specially defined situations (see above under "Special features of data license agreements") - no "right" that can be asserted against any third party who does not have a license.

Model contracts, which are frequently used in the context of open source software in particular, are very promising. They allow the rights holder to publish the content under a known license and the user to use the content free of charge without having to obtain permission from the rights holder. There are already open source licenses that are specifically tailored to the licensing of AI models, e.g. those from RAIL (Responsible AI Licenses). RAIL has indicated that it will publish a model license specifically tailored to the licensing of data (OpenRAIL-D). It remains to be seen whether standards for paid licensing will also become established. This would certainly be welcome in order to realise the full potential of data licensing.
