Reddit Tightens Control on AI Bots with New Robots.txt Update

Reddit announces updates to robots.txt to deter unauthorized AI scraping.

Reddit has announced significant updates to its Robots Exclusion Protocol (robots.txt) file to control how automated bots access its content. Traditionally, robots.txt told search engines which pages they could crawl for indexing, but the rise of AI has led to misuse: websites are scraped to train models without acknowledgment or consent. Alongside the updated robots.txt, Reddit will continue to rate-limit and block unknown bots that do not adhere to its Public Content Policy or lack an agreement with the platform. These changes aim to protect Reddit content from being exploited by AI companies for model training. The update should not affect most users or legitimate actors such as researchers and organizations like the Internet Archive.

The move follows a Wired investigation revealing that AI-powered search startup Perplexity ignored scraping restrictions, underscoring the need for stricter controls. Reddit's new policy signals to AI companies that they must pay to use its data for training, and comes after Reddit introduced a policy to govern data access and usage by commercial entities and partners.
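To illustrate the mechanism involved: a well-behaved crawler is expected to fetch a site's robots.txt and honor its directives before requesting any page. The sketch below uses Python's standard-library `urllib.robotparser` against a hypothetical robots.txt that disallows all crawling (this is an illustration of how such rules are checked, not Reddit's actual file or policy):

```python
from urllib import robotparser

# Hypothetical robots.txt that blocks every user agent from every path
# (illustrative only -- not Reddit's actual robots.txt).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A compliant bot checks permission before fetching a URL.
# "MyResearchBot" is a made-up user-agent name for this example.
print(rp.can_fetch("MyResearchBot", "https://www.reddit.com/r/python/"))  # False
```

Crucially, robots.txt is advisory: nothing technically stops a scraper from ignoring it, which is why Reddit pairs the update with rate limiting and outright blocking of non-compliant bots.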
