Original and synthetic content, and the Law#
The content wars, episode 5
Read also:
- The content wars, episode 1: BigAI to clean the web to feed from it
- The content wars, episode 2: From search to answers: How LLMs are rewiring the Internet's business model
- The content wars, episode 3: Towards a new licensing model for content used for model training
- The content wars, episode 4: The coming war of synthetic works

The content wars in the age of Artificial Intelligence are escalating, transforming from battles over data rights to existential questions about the nature of truth, originality, and even human intellect.
The latest developments show a move toward mass-scale synthetic content generation and a complex legal tightrope walk around copyright, forcing us to reconsider our relationship with information itself.
Elon Musk did it: The Grokipedia experiment#
The latest front in the content war is the battle for encyclopedic truth. Elon Musk's xAI recently launched Grokipedia, an AI-generated encyclopedia powered by the Grok chatbot, explicitly positioned as a more "truthful and independent alternative" to Wikipedia.
He announced it not so long ago (see episode 1); it is now done.
This move marks a significant, large-scale application of synthetic content generation. At its launch in late October 2025, Grokipedia already boasted more than 885,000 articles. While this is still far fewer than Wikipedia's 7 million+ English-language articles, the sheer volume of content generated synthetically and positioned as an authoritative knowledge base is a first of its kind.
The inherent conflict is palpable: Grokipedia aims to purge what Musk perceives as "editorial bias," but its content is often found to be derived or adapted from Wikipedia!
But Grokipedia is not just a rival to Wikipedia: it looks like an experiment in automating the very process of historical and factual record-keeping. In earlier episodes, we talked about the Sources of New Content (SNCs), and in particular about the news as one of those sources. Grokipedia intends to capture those sources and build a genuine timeline of events, a genuinely automated history.
In a way, Grokipedia casts the AI as both the witness and the scribe of human history, keeping the record up to date and ensuring facts are not:
- "forgotten",
- or twisted by partial editors.
In that sense, Musk is not creating a new encyclopedia but a new encyclopedist and archivist! That could be a very deep change in the way we write short-term history. It could be close to a new version of Asimov's Foundation, except that the Foundation is, in a way, Grok itself.
Just think of all those confidential newsletters working from open sources. They analyze what influential people say across the various media, gather it, and build perspective. They do the work of archivists, cross-referencing everything that is said publicly to draw unexpected portraits of those influential people. If Grokipedia does this well, people may automatically discover the true face of many individuals. Systematic archiving and correlation is fundamental to understanding not-so-hidden motivations. That looks like an objective Musk could have.
Sora 2 and the copyright tightrope#
Speaking of content generation, OpenAI's new text-to-video model, Sora 2, is pushing the technical limits of content generation.
Many people pushed back against it and raised questions about copyright.
However, the situation is actually crystal clear: copyright law (Title 17 in the US) applies.
We can say that it even applies twice:
- User Responsibility: The foundational principle of copyright still applies to the user of the AI.
  - If you use Sora to generate an image or video of a copyrighted character, such as Mickey Mouse, for your private, personal use, it might be seen as legally acceptable, much like drawing it on a notepad.
  - However, the moment you attempt to publish, sell, or publicly use that image without a proper license or agreement, you are infringing on the copyright owner's exclusive rights.
  - The AI is a tool; the user is responsible for the output's usage.
- AI Training Data: Here too, the situation is clear: using copyrighted content to train an AI is a form of content usage.
  - Creators (authors, artists, news agencies) must license their content to AI companies, and those companies have to pay for its use in model training.
  - Some courts, particularly in the US, have considered using copyrighted data for AI training to be "fair use" when the use is transformative and does not recreate the original. In our view, that is a misapplication of the law. Even AI companies seem to think so: more and more, they are choosing to mitigate risk by signing licensing deals with publishers and content creators.
On that topic, the buzz will be forgotten in a matter of months. The solution was obvious from day one: go back to the law and just apply it.
Filming everything to feed the hungry AI#
As AI models like Sora become more sophisticated, their hunger for high-quality, real-world data is limitless. This leads to an intriguing, almost dystopian thought experiment: If we don't have enough cinematic-quality footage to train the next generation of autonomous cars, world simulators, or immersive AR systems, should we simply film everything?
The parallel is Google Maps, which systematically captured the world's streets to build a geographic model. Extending this logic to video would mean permanently mounting billions of cameras—on every vehicle, street corner, and wearable device—to create a continuous, multi-perspective record of human life and physics in action.
This strategy is reminiscent of Borges' short story North North-West, in which a writer wants to capture the full essence of the world's reality. He intends to describe the entire world, in all its details, with words. Realizing the immensity of the task, he decides to start by describing his country, then his region, then his town, then his house, then his office, then his desk, and finally the north-north-west corner of his desk (hence the title of the short story).
In our case, at what level of detail should we stop? Should we also film every willing human? At every stage of their life?
There is a chance that, for video training, AI companies will go to the extreme of filming everything. In a way, companies such as Google already have an advantage because they already operate the huge infrastructure needed to update Google Maps.
That will open new questions about personal data: Could you claim rights over images of your car parked on the street? Over your cat, captured on video yesterday? Over a video of your house showing your new pool (undeclared to the IRS)?
Personalize your AI and keep your paper books!#
The final, and perhaps most philosophical, stage of the content war concerns the human mind itself. If AI content—synthetic, optimized, and nearly limitless—floods our information ecosystem, will we, the consumers, lose our critical faculties?
Will we still be able to distinguish human-generated content from synthetic content? Will it be the end of reflection, the standardization of opinions? Will we drift toward the insipidity of business content once everyone uses the same recommendations, coming from the same AIs, trained on the same data? Will we even remember what it is to have a personality?
That's why personalizing AIs is probably something to look for (see Should AI have a personality?), because your AI will not be like your neighbor's. That would lead us into a paradox: users customizing their AI's personality to compensate for their own lack of personality?
And that's why hallucinations may, in the future, be an important way to distinguish AIs (see BigAI is the Book of Sand). However criticized, parameters like temperature enable AIs to be more creative, or at least to make unexpected connections between ideas.
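To make the temperature point concrete: temperature rescales a model's raw scores (logits) before the next token is drawn, flattening or sharpening the probability distribution. Here is a minimal, self-contained sketch in plain Python (no real model involved; the logits are made up for illustration):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from raw logits after temperature scaling.

    Low temperature (-> 0) approaches greedy decoding: the highest
    logit almost always wins. High temperature flattens the
    distribution, making unexpected choices more likely.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw from the resulting categorical distribution.
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r <= cumulative:
            return i
    return len(probs) - 1
```

With a low temperature, the same dominant option is picked again and again; raising the temperature lets lower-scored (more "creative") options surface occasionally, which is exactly the trade-off the paragraph above describes.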
In this context, old books, original manuscripts, and non-digitized human-created content are of immense value, even if few people realize it. They are the verified truth-anchor: artifacts we are confident contain the unique thought, original bias, and verifiable human hand of an author. They truly come from humans, from singular individuals who worked hard to express something, with their own vision of the world and their own personality.
In the digital world of tomorrow, those human voices may soon look like endangered species.
(November 11 2025)
Navigation:
- Next: Education and personality
- Index
- Previous: Radicalization of AI positions