In a disturbing revelation, it has been discovered that Perplexity, a generative AI search engine, is ignoring the instructions in the robots.txt file, which is meant to control bots and crawlers like itself. This means that Perplexity is accessing websites that administrators have explicitly prohibited it from visiting. This is a serious breach of trust, as it undermines the control that website administrators have over their own content.

The issue came to light when Rob Knight, a technology blogger, blocked PerplexityBot, the crawler used by Perplexity, in the robots.txt of his blog. However, when he tested the block, he found that Perplexity was still able to access and summarize his blog post. Further investigation revealed that PerplexityBot was using a headless browser to scrape content, ignoring the robots.txt file altogether. What’s more, Perplexity’s user agent string did not contain the ‘PerplexityBot’ part, which allowed it to bypass the robots.txt restrictions.

This issue has sparked a heated debate, with many pointing out the negative implications of generative AI search engines like Perplexity crawling websites without permission. Not only does it undermine website administrators’ control over their content, but it also raises concerns about the unauthorized use of internet data to train generative AI. As one user on the social news site Hacker News pointed out, “forcing users to block crawlers by AI development companies could have a negative impact on ad blockers and other useful software.” It remains to be seen how Perplexity will respond to these allegations and whether they will take steps to respect website administrators’ wishes.

Source.

TOP STORIES

The Quantum Revolution - Transforming Technology and Security
Quantum computing is transforming industries, but it poses significant cybersecurity risks …
Investigation Launched Into OpenAI by State Attorneys General
A coalition of state attorneys general has opened an investigation into OpenAI …
Anthropic Faces AI Export Controls - A New Era of Regulation
The U.S. government’s export control directive has forced Anthropic to disable its new AI models, raising questions about regulation and …
SpaceX's Bold Move - Merging Rockets with AI Power
SpaceX’s recent deal with Google highlights its shift from aerospace to AI infrastructure …
Google Takes Action Against AI-Driven Cybercrime Network
Google is suing to dismantle the infrastructure behind an alleged massive AI-powered cybercrime operation …
AI Adoption Surges Despite Public Concerns
AI usage continues to grow rapidly, even as public sentiment remains skeptical …

latest stories