The rise of generative AI companies such as OpenAI and Anthropic has eroded established internet etiquette. These companies scrape publisher sites without permission and ignore robots.txt, the standard that tells web crawlers which parts of a site they may access. This disregard for publishers’ control over their copyrighted content has raised concerns about how online data will be collected and used. It also raises the question of whether ad tech firms will start ignoring robots.txt too, which could turn web scraping into a free-for-all. The trend is especially worrying for publishers already struggling to keep control of their content in the digital age.

AI’s Web Scraping Free-for-All
Generative AI companies, including OpenAI, have argued to regulators that any publicly accessible content on the internet is open to fair use for training AI models.
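
For readers unfamiliar with robots.txt, the sketch below shows roughly how a well-behaved crawler would consult it before fetching a page. It uses Python's standard urllib.robotparser module. The robots.txt contents, the example.com URLs, and the helper name is_allowed are illustrative assumptions, not taken from any publisher's actual file. GPTBot is the user-agent name OpenAI has published for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve (an assumption for illustration):
# block one AI crawler entirely, and keep every crawler out of /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a real crawler would download them first.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "SomeOtherBot"):
        for path in ("/articles/1", "/private/drafts"):
            url = "https://example.com" + path
            print(agent, path, is_allowed(agent, url))
```

Nothing in this mechanism is enforced: robots.txt only states the publisher's wishes, and a crawler that never runs a check like this can fetch the pages anyway, which is the behavior the article describes.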