© 2024 Blaze Media LLC. All rights reserved.
OpenAI transcribed millions of hours of YouTube content to train its chatbots, potentially violating copyright laws

To make its AI models smarter, OpenAI researchers created a speech-recognition tool called Whisper in 2021 to help train the company's large language models. According to a recent report from the New York Times, the tool was specifically designed to transcribe millions of hours of audio from YouTube videos, which gave the company a massive edge over its competitors.

OpenAI allegedly knew that the huge transcription project fell into a legal gray area, but it decided to proceed anyway. The report said that OpenAI president Greg Brockman was even personally involved in gathering the videos that were to be transcribed.

While OpenAI was moving forward with its transcription project, Google was reportedly doing something similar for its own AI models, and both efforts potentially breached copyright laws.

The issue became significant enough that YouTube CEO Neal Mohan weighed in, conceding that he had no firsthand knowledge of whether OpenAI had used YouTube's protected material. But if it had, OpenAI would be in "clear violation" of YouTube's terms of service, he said, according to Bloomberg.

“From a creator’s perspective, when a creator uploads their hard work to our platform, they have certain expectations,” Mohan said during an interview earlier this month.

“One of those expectations is that the terms of service is going to be abided by. It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service. Those are the rules of the road in terms of content on our platform.”

TechSpot reported that Google spokesperson Matt Bryant pointed to the company's terms of service, noting that Google takes "technical and legal measures" to prevent unauthorized use of YouTube content when there is a "clear legal or technical basis to do so."

Google said that its AI tools are trained on "some YouTube content," use of which is permitted under agreements the company has reached with creators on the platform.

However, the Times reported that Google has expanded its terms of service since its initial remarks on the topic, giving the company more rights over consumer data. The information Google has allowed itself to use includes Google Docs and restaurant reviews posted to Google Maps.

Without clear legislation governing what AI tools can legally be trained on, it remains uncertain how the copyright questions will be resolved.
