The rise of generative AI has led major tech companies and AI startups to scrape the web more aggressively, often without the consent of content creators. These companies train their models on original content collected from websites, many of which never gave permission for that use, and the practice has raised significant ethical and legal concerns. A report from Akamai found that bots now account for a significant share of web traffic and that AI is making cybercriminal activity easier.
Cloudflare has introduced a tool to help website owners combat unauthorized scraping. Available with one click to both free and paying customers, it aims to block AI bots that ignore robots.txt directives. It uses fingerprinting techniques to identify these bots and block them before they can scrape content without explicit authorization. Cloudflare’s network, which processes millions of requests per second, provides the data needed to keep its bot detection algorithms up to date. The company has identified the most active AI bots, including Bytespider, GPTBot, and ClaudeBot, which scrape content to train generative AI models for ByteDance, OpenAI, and Anthropic respectively.
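
To make the robots.txt point concrete, here is a minimal sketch of how a compliant crawler would interpret such directives, using Python's standard `urllib.robotparser`. The robots.txt content, the example URL, and the helper name `allowed_agents` are assumptions made for illustration. They are not taken from Cloudflare or from any of the companies named above.

```python
from urllib.robotparser import RobotFileParser

# Crawler names mentioned in the paragraph above.
AI_CRAWLERS = ["Bytespider", "GPTBot", "ClaudeBot"]

# Hypothetical robots.txt that a site owner might publish to opt out
# of AI training crawls while still allowing all other visitors.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, for each user agent, whether robots.txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # "Mozilla/5.0" stands in for an ordinary browser user agent.
    agents = AI_CRAWLERS + ["Mozilla/5.0"]
    result = allowed_agents(ROBOTS_TXT, "https://example.com/articles/1", agents)
    for agent, ok in result.items():
        print(f"{agent}: {'allowed' if ok else 'disallowed'}")
```

The file is purely advisory: the check above runs on the crawler's side, so a bot that skips it, or that sends a browser-like user agent, is not stopped by robots.txt at all. That gap is why blocking has to be enforced at the network edge.
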
Cloudflare’s new tool targets not only well-known bots but also bots disguised as human users. A global machine learning model flags these evasive bots, extending protection beyond crawlers that identify themselves. The tool is intended to protect content creators and preserve the integrity of the open web.











