YouTube’s opt-in AI training is turning creators into silent architects of future tech tools
Many creators say yes to AI training access, even when there’s no money involved
Oxylabs gathered millions of videos into a dataset that AI developers can ethically trust

An increasing number of YouTubers are allowing AI companies to train models using their videos, and surprisingly, many are doing so without direct compensation.

Under YouTube’s current setup, creators are given the option to opt in by ticking boxes that grant permission to around 18 major AI developers.

If no box is selected, YouTube does not permit the use of that video for AI training purposes. This means the default stance is non-participation, and any inclusion is fully voluntary.
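The default-off behavior described above can be illustrated with a minimal sketch. It assumes a simple per-video record of ticked developers; the names `VideoConsent`, `may_train`, and `ExampleAI` are hypothetical and do not reflect YouTube's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class VideoConsent:
    """Hypothetical per-video consent record (illustrative only)."""
    video_id: str
    # Developers the creator has ticked. Empty by default, meaning opted out.
    allowed_developers: set[str] = field(default_factory=set)


def may_train(consent: VideoConsent, developer: str) -> bool:
    """Return True only if the creator explicitly opted in for this developer."""
    return developer in consent.allowed_developers


# A video whose creator ticked no boxes, and one opted in for a single developer.
untouched = VideoConsent("vid-001")
opted_in = VideoConsent("vid-002", {"ExampleAI"})
```

In this sketch, a video with no boxes ticked is never available for training, and permission for one developer does not extend to any other.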

Creators choose influence over income

The lack of payment may seem unusual, but the motivation appears to hinge on influence rather than income.

Creators opting in might see it as a strategic move to shape how generative AI tools interpret and present information – by contributing their content, they are effectively making it more visible in AI-generated responses.

As a result, their work could shape how questions are answered by everything from AI writers to large language models (LLMs) for coding.

Oxylabs has now launched the first consent-based YouTube dataset, comprising four million videos from one million distinct channels.

All contributors explicitly agreed to the use of their content for AI training. According to Oxylabs, the videos come complete with transcripts and metadata and have been curated to be particularly useful for training AI in image and video generation tasks.

“In the ecosystem aiming to find a fair balance between respecting copyright and facilitating innovation, YouTube streamlining consent giving for AI training and providing creators with flexibility is an important step forward,” said Julius Černiauskas, CEO of Oxylabs.

This model not only simplifies the process for AI developers seeking ethically sourced data but also reassures creators about the use of their work.

“Many channel owners have already opted in for their videos to be used in developing the next generation of AI tools. This enables us to create and provide high-quality, structured video datasets. Meanwhile, AI developers have no trouble verifying the data’s legitimate origin.”

However, broader concerns persist about how governments and legislatures handle AI training and creator rights.

For instance, the UK’s Data (Use and Access) Bill has stalled in Parliament, prompting figures like Elton John to criticize the government’s handling of creator rights.

In this legislative vacuum, creators and developers will likely face uncertainty.

Oxylabs presents itself as filling that gap with a consent-based model, but critics will still question whether such initiatives genuinely address deeper issues of value and fairness.