Google introduced AI Overviews in search results shortly after Google I/O in May, but it wasn't first to the AI search game. It had already given Gemini the ability to search the internet, and Meta and other competing AI companies had done similarly with their own models. One of the biggest players in this field was Perplexity, which markets itself as a "conversational search engine": basically another chatbot with internet access, but with even more of a focus on summaries and current events. Unfortunately, Perplexity is now finding itself in hot water after breaking rules and, like Google, returning wrong answer after wrong answer.
On June 11, Forbes published an article accusing Perplexity of stealing its content: quickly rewriting original articles without sourcing and passing them off as its own. The AI company went as far as to adapt Forbes' reporting into podcast form. Shortly after, Wired ran an exposé on Perplexity, accusing it of "bullshitting" and breaking a widely held internet rule (more on that shortly). Now, we're learning a lot more about what kind of recent data an AI might be able to train on going forward, and why AIs often make so many mistakes when trying to sum up current events.
Perplexity is accused of breaking a longstanding internet rule
Bots aren't anything new on the internet. Before AI scraped websites for training material, search engines scraped websites to determine where to place them in search results. This led to a standard called the Robots Exclusion Protocol, which allows developers to lay out which parts of their site they don't want bots to access. Perplexity says it follows this rule, but, spurred on by the Forbes story and an accusation of rule breaking from developer Robb Knight, Wired conducted its own investigation. What it discovered wasn't flattering to Perplexity.
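To illustrate how the protocol works in practice: a site publishes a plain-text robots.txt file, and a well-behaved crawler checks it before fetching any page. Below is a minimal sketch using Python's standard urllib.robotparser module; the bot names, paths, and robots.txt rules are invented for illustration and are not taken from any site mentioned in this story.

```python
from urllib import robotparser

# A hypothetical robots.txt: one named crawler is barred from /articles/,
# while all other crawlers are allowed everywhere.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /articles/

User-agent: *
Disallow:
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A compliant ExampleBot must skip article pages...
print(rp.can_fetch("ExampleBot", "https://example.com/articles/story"))  # False
# ...while any other well-behaved bot may fetch them.
print(rp.can_fetch("OtherBot", "https://example.com/articles/story"))    # True
```

Note that nothing technically enforces this: the protocol is purely honor-based, which is exactly why accusations of ignoring it carry weight.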
"Wired provided the Perplexity chatbot with the headlines of dozens of articles published on our website this year, as well as prompts about the subjects of Wired reporting," Wired's article reads. According to the investigation, the bot then returned answers "closely paraphrasing Wired stories," complete with original Wired art. Further, it would summarize stories "inaccurately and with minimal attribution."
Examples include the chatbot inaccurately accusing a police officer of stealing bicycles, and, in a test, responding to a request to summarize a webpage containing a single sentence with a wholly invented story about a young girl going on a fairy tale adventure. Wired concluded Perplexity's summaries were the result of the AI flagrantly breaking the Robots Exclusion Protocol, and that its inaccuracies likely stemmed from an attempt to sidestep said rule.
According to both Knight and Wired, when users ask Perplexity questions that would require the bot to summarize an article protected by the Robots Exclusion Protocol, a specific IP address running what is assumed to be an automated web browser would access the websites bots are not supposed to scrape. The IP address couldn't be traced back to Perplexity with complete certainty, but its frequent association with the service raised suspicions.
In other cases, Wired recognized traces of its metadata in Perplexity's responses, which could mean the bot may not be reading the articles themselves, but instead accessing traces of them left in URLs and search engines. These traces wouldn't be protected by the Robots Exclusion Protocol, but they're so light on information that they're more likely to lead to AI hallucinations, hence the problem with misinformation in AI search results.
Both of these issues presage a battle for the future of AI in search engines, from both ethical and technical standpoints. Even as artists and other creators argue over AI's right to scrape older works, accessing writing that is just a few days old puts Perplexity at further legal risk.
Perplexity CEO Aravind Srinivas issued a statement to Wired that said "the questions from Wired reflect a deep and fundamental misunderstanding of how Perplexity and the Internet work." At the same time, Forbes this week reportedly sent Perplexity a letter threatening legal action over "willful infringement" of its copyrights.