Understanding the Shift
In a significant move, several major news organizations and social media platforms have opted out of having their content used to train Apple's AI models. The decision follows Apple's introduction of Applebot-Extended, a secondary user agent that lets publishers control whether their content is used for AI training. Blocking it keeps a site's content out of Apple's AI training data without stopping the original Applebot from crawling the site.
Key Details
- Prominent entities like The New York Times, Facebook, and Vox Media have opted out.
- Applebot-Extended lets publishers exclude their content from AI training, giving them more say over how their work is used.
- Publishers can opt out by adding a rule for Applebot-Extended to their robots.txt file, the text file in which a site states which crawlers may access which of its pages.
- Currently, only about 6 to 7% of high-traffic websites block Applebot-Extended, which may reflect limited awareness of the option, or limited concern about it, among site owners.
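To illustrate how such an opt-out rule reads, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content, the `is_allowed` helper, and the example.com URL are hypothetical, not taken from any real publisher. Apple has said Applebot-Extended does not crawl pages itself and only governs how data gathered by Applebot is used, so this sketch merely checks how the rules evaluate for each user-agent name; Apple's own handling may differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher opting out of AI training.
# Only the Applebot-Extended token is disallowed; no rule targets Applebot itself.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return whether the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/some-story"
    for agent in ("Applebot", "Applebot-Extended"):
        # Applebot has no matching group, so it is allowed;
        # Applebot-Extended matches the Disallow: / group, so it is not.
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

Because the only group names Applebot-Extended, the ordinary Applebot is unaffected. Note that Python's parser matches user-agent groups by case-insensitive substring and uses the first group that matches, so a separate `Applebot` group placed above this one would also capture `Applebot-Extended` in this check; keeping the opt-out group first avoids that.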
The Broader Implications
This trend highlights growing concern over intellectual property rights in the age of AI. As AI models become more central to everyday technology, the tension between using web content for training and protecting publishers' rights intensifies. The option to opt out signals a shift in how publishers view the value of their content and their control over it, and it raises questions about the future of web crawling and the ethical use of data in AI development.