When The Internet Is Not Enough

There have been a lot of stories recently about how the large language models behind the biggest artificial intelligence products from OpenAI, Google and Meta have consumed so much data in training that they have exhausted (or will imminently exhaust) all of the data available on the Internet.

In case you missed that, I’ll repeat it. They have consumed virtually ALL of the data available on the Internet. All of it. That’s a lot of data.

I’ve previously written about the copyright issues at play with this behavior, and some of these recent stories from credible sources suggest that OpenAI, Google and Meta knew they were in potentially questionable territory in using copyrighted works to train their models, but did so anyway. They were so eager, even desperate, to find new, untapped pools of data for AI training that OpenAI and Google figured out how to transcribe the audio from more than one million hours of YouTube videos, likely in violation of YouTube’s own terms of use. And let’s not forget that YouTube is owned by Google! Meta even considered purchasing Simon & Schuster, the book publisher, to mine its catalog of books. While it does not excuse their violation of copyright holders’ rights, hearing about the need and competition for such vast amounts of data gives some insight into why they proceeded without heeding the warning signs.

Robert Rosenberg

Robert Rosenberg is an independent legal consultant and principal of Telluride Legal Strategies.  He spent 22 years at Showtime Networks in various legal and business roles, most recently as Executive Vice President, General Counsel and Assistant Secretary.  He now consults with companies of all sizes on legal and business strategies. Rob is a thought leader, an expert witness, and a problem solver working at the intersection of media, communication and technology with a strong interest in solving issues introduced by artificial intelligence in business.  Rob can be reached at rob@telluridelegalstrategies.com.

https://www.telluridelegalstrategies.com