Google Let OpenAI Scrape YouTube Data Because Google Was Doing It Too

April 7, 2024

OpenAI made headlines recently after its CTO couldn’t say definitively whether the company had trained its Sora video generator on YouTube data, but it looks like most of the tech giants (OpenAI, Google, and Meta) have dabbled in potentially unauthorized data scraping, or at least seriously considered it.

As the New York Times reports, OpenAI transcribed more than a million hours of YouTube videos using its Whisper technology in order to train its GPT-4 AI model. But Google, which owns YouTube, did the same, potentially violating its creators’ copyrights, so it didn’t go after OpenAI.

In an interview with Bloomberg this week, YouTube CEO Neal Mohan said the company’s terms of service “does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service.” But when pressed on whether YouTube data was scraped by OpenAI, Mohan was evasive. “I have seen reports that it may or may not have been used. I have no information myself,” he said.

The Times’ report focuses on the need for more and more data to train advanced AI models, and the sometimes sketchy things the tech giants have considered to get it. As OpenAI CEO Sam Altman has noted, data “will run out” eventually, putting the usefulness of these billion-dollar companies’ products in question.

Meta, for example, discussed acquiring Simon & Schuster so its AI could ingest the publisher’s books. It also pondered just scraping whatever it needed and hoping people didn’t sue, the Times says, citing recordings of internal meetings. Execs also looked to a 2015 ruling that said Google did not violate copyright laws by digitizing books for Google Books.

Google, meanwhile, changed its terms of service (on a holiday weekend) to let it use public Google Docs, restaurant reviews on Google Maps, and other internet data to train its AI. The Docs data was used as part of “an experimental program,” Google tells the Times.

The Times itself has already pushed back on this data scraping. It sued OpenAI and its partner Microsoft for using Times content to train their AI models, a case that’s currently making its way through the courts.
