Google and OpenAI want to feed your content into their AI models
Google’s recent submission to the Australian government’s review of its AI regulatory framework includes the company’s assertion that generative AI programs should be allowed to scrape the entire internet for content. That position would upend current copyright law.
The company has called for Australian policymakers to promote “copyright systems that enable appropriate and fair use of copyrighted content to enable the training of AI models in Australia on a broad and diverse range of data, while supporting workable opt-outs for entities that prefer their data not to be trained in using AI systems”.
The call for a fair use exception for AI systems is a view the company has expressed to the Australian government in the past, but the notion of an opt-out option for publishers is a new argument from Google.
When asked how such a system would work, a spokesperson pointed to a recent blog post in which Google said it wanted a discussion about creating a community-developed web standard, similar to the robots.txt system that already lets publishers opt out of having parts of their sites crawled by search engines.
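For reference, robots.txt already works this way for search crawlers: a publisher names a crawler’s user-agent token and lists the paths it should stay out of. A rough sketch of how an AI-specific opt-out might look under the same convention follows; the token name here is purely hypothetical, since no such standard has been agreed.

```
# How robots.txt works today: name a crawler, list the paths it
# should not visit.
User-agent: Googlebot
Disallow: /members/

# A hypothetical AI-training token under the same convention.
# "AI-Training-Crawler" is illustrative only; it is not a real,
# announced standard.
User-agent: AI-Training-Crawler
Disallow: /
```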
Meanwhile, ChatGPT creator OpenAI has released a new web crawler, GPTBot, that can scrape your data, but blocking it is easy enough for most web teams to handle.
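Per OpenAI’s documentation, GPTBot identifies itself with the user-agent token “GPTBot” and honors robots.txt, so blocking it site-wide comes down to two lines in that file:

```
# Per OpenAI's documentation, these two lines in your site's
# robots.txt block GPTBot from crawling any page.
User-agent: GPTBot
Disallow: /
```

Teams that want to be more selective can use the same syntax to allow or disallow individual directories instead of the whole site.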
Both examples make the case for comms to work closely with digital teams on how best to keep proprietary content from being scraped. As The Guardian’s coverage suggests, a paywall may do the trick, though that solution won’t work for every business.
Opt-out protocols are often made intentionally dense or tricky (just ask your marketing colleagues), so best to get this conversation rolling now. You can ensure this happens by forming a cross-functional AI task force at your organization. In our humble opinion, it can and should be led by comms.