Towards a new licensing model for content used for model training#
The content wars, episode 3
Read also:
- The content wars, episode 1: BigAI to clean the web to feed from it
- The content wars, episode 2: From search to answers: How LLMs are rewiring the Internet's business model
- The content wars, episode 4: The coming war of synthetic works
- The content wars, episode 5: Original and synthetic content, and the Law
Anthropic, sued for piracy, has just agreed to pay $1.5 billion in a settlement.

How short-sighted this story is!
The content war#
A previous entry already covered SNCs, an acronym for Sources of New Content. To get fresh training data, the BigAI companies need structural access to new content.
This is already true for some of them:
- Facebook content can be used for Meta models,
- X content can be used for Grok training,
- Google content in free accounts can be used for Gemini training,
- I suppose Microsoft can also use the content of free accounts.
For the other companies, meaning OpenAI, Anthropic, Perplexity or Mistral, finding structural SNCs is a condition of their survival.
Two points to address#
As noted in a previous entry, SNCs will inevitably monetize their content, because they cannot stay out of BigAI for long.
This monetization model must have two characteristics:
- It must generate recurring revenue, with at least one fee for each trained model released to the market;
- The fee must be proportional to the audience of the content being used for training.
A new licensing model#
The first point opens the way to a new content distribution license for model training.
As a content provider, I want Anthropic to pay me a fee each time it trains a public model. I will have it sign a license for the use of my content in a commercial LLM.
This license can have several variations:
- It can be granted for just one model version,
- It can be granted for one year, whatever the models trained,
- It can be granted with a reference to the source or not,
- Etc.
In a way, these licenses may be close to software distribution licenses.
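To make the variations above concrete, here is a minimal Python sketch of what such a license record could look like. Everything in it is an assumption made for illustration: the `TrainingLicense` class, its field names and the `covers` check are hypothetical and do not describe any existing contract or API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TrainingLicense:
    """Hypothetical license record; every field name is an assumption."""
    licensee: str
    model_version: Optional[str] = None   # set for a single-model grant
    valid_from: Optional[date] = None     # set for a time-based grant
    valid_until: Optional[date] = None
    attribution_required: bool = False    # must the model cite the source?

    def covers(self, licensee: str, model_version: str, training_date: date) -> bool:
        """Return True if this license allows the given training run."""
        if licensee != self.licensee:
            return False
        # Single-version grant: only the named model version is covered.
        if self.model_version is not None:
            return model_version == self.model_version
        # Time-based grant: any model trained inside the validity window.
        if self.valid_from is not None and self.valid_until is not None:
            return self.valid_from <= training_date <= self.valid_until
        return False


# Usage: one grant per model version, one grant per year with attribution.
single = TrainingLicense(licensee="ExampleAI", model_version="v1")
yearly = TrainingLicense(
    licensee="ExampleAI",
    valid_from=date(2025, 1, 1),
    valid_until=date(2025, 12, 31),
    attribution_required=True,
)
```

In this sketch a single-version grant ignores dates and a yearly grant ignores model versions, which mirrors the first two variations in the list. A real license would of course be defined by lawyers, not by a data class.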
Cost based on audience#
This is a tricky one. The license price could be estimated from, for instance, the audience measured by a web search engine and/or the number of copies a book has sold.
Look at the Spotify model: creators are paid according to their audience. The best-known creators get most of the revenue and the others get a smaller share.
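As a rough illustration of an audience-based price, the sketch below splits a fixed fee pool between works in proportion to their audience. It is a simple pro-rata rule chosen for the example, not Spotify's actual formula; the function name, the pool amount and the audience figures are all made up.

```python
def split_license_pool(pool: float, audiences: dict[str, int]) -> dict[str, float]:
    """Split a fee pool between works in proportion to their audience.

    Assumption: "audience" is a single number per work (page views,
    copies sold, etc.); how to measure it is exactly the tricky part.
    """
    total = sum(audiences.values())
    if total <= 0:
        raise ValueError("total audience must be positive")
    return {work: pool * count / total for work, count in audiences.items()}


# Hypothetical figures: one training run pays 1,000,000 into the pool.
audiences = {"bestseller": 900_000, "niche_blog": 90_000, "small_essay": 10_000}
fees = split_license_pool(1_000_000, audiences)
print(fees)
# {'bestseller': 900000.0, 'niche_blog': 90000.0, 'small_essay': 10000.0}
```

With these numbers the best-known work collects 90% of the pool, which is the concentration effect described above.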
BigAI as a new distribution channel#
Of course, BigAI companies must not pirate authors' content. But authors have absolutely no interest in making enemies of BigAI companies.
Instead, lawyers should be tasked with defining this new distribution license. They must negotiate acceptable fees with BigAI companies, looking at models like Spotify's.
BigAI companies must build the license fees into their paid plans and find a way to handle free accounts:
- One possibility is that free accounts only have access to models trained on free content.
- Another possibility is that free accounts access models that include ads, which pay the fees for copyrighted content.
The class action settlement should open a new era#
It is fine to hold BigAI companies liable for piracy. But it is in everyone's interest to put a real business model in place rather than to antagonize the players.
Let’s get out of the content war and find a win-win-win situation for BigAI users, BigAI companies and authors.
(September 6 2025)