Recent research has uncovered a significant vulnerability in AI-driven deep-research systems, such as OpenAI’s Deep Research and Google’s Gemini Deep Research. This flaw allows a brief Reddit comment to manipulate the reports these systems generate, potentially affecting thousands of users.
Researchers from Cornell Tech have introduced a technique called Web Agent Retrieval Poisoning (WARP). This method exploits how multi-agent AI systems retrieve and synthesize information from the web. These systems, including STORM, Co-STORM, and OmniThink, break down user queries into sub-queries, gather content from various online sources, and produce structured reports.
The vulnerability stems from these systems consistently retrieving the same small set of user-generated content (UGC) pages, primarily from platforms like Reddit and Wikipedia. This consistent pattern creates a concentrated attack surface. By adding a short, crafted promotional text (approximately 13 words) to a frequently retrieved Reddit thread, an attacker can induce the AI system to cite the manipulated content. Consequently, the system may incorporate false entities, brands, services, or misinformation into its reports.
Stages of the WARP Attack
The WARP attack unfolds in three main stages:
- Reconnaissance: The attacker uses public search engines to identify UGC URLs that consistently appear in search results for specific topics. This step requires no special access, only the ability to perform standard web searches.
- Content Generation: The attacker crafts a brief promotional passage, often with the assistance of language models, to seamlessly blend into the existing content of the targeted page while promoting a fictitious entity.
- Deployment: The attacker posts the crafted text as a comment on the identified Reddit thread. Once search engines index the comment, the manipulated content enters the AI system’s retrieved context whenever the targeted URL is retrieved.
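The reconnaissance stage amounts to counting which UGC URLs recur across searches on a topic, and the same count lets a defender gauge how exposed a topic is. The following is a minimal Python sketch, assuming search results have already been collected as a list of URLs per query. The `UGC_HOSTS` set, the function names, and the input format are illustrative assumptions, not the researchers' tooling.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumption: a hand-picked set of UGC hosts. The article names Reddit and
# Wikipedia only as examples, so a real audit would extend this list.
UGC_HOSTS = {"reddit.com", "wikipedia.org"}


def is_ugc(url: str) -> bool:
    """Return True if the URL's host is a listed UGC host or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in UGC_HOSTS)


def recurring_ugc_urls(
    results_by_query: dict[str, list[str]], min_queries: int = 2
) -> list[tuple[str, int]]:
    """Count, for each UGC URL, how many distinct queries returned it.

    Returns (url, query_count) pairs for URLs seen in at least
    `min_queries` queries, most frequent first.
    """
    counts: Counter[str] = Counter()
    for urls in results_by_query.values():
        for url in set(urls):  # count each URL at most once per query
            if is_ugc(url):
                counts[url] += 1
    return [(u, n) for u, n in counts.most_common() if n >= min_queries]
```

URLs that appear for many queries in a topic cluster are the ones whose content most often reaches the report generator, so they are the natural candidates for closer monitoring.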
Experiments conducted across 176 queries in 11 topic clusters, including areas like cryptocurrency investment advice and local restaurant recommendations, demonstrated the severity of this vulnerability:
- Co-STORM exhibited a 100% conditional citation rate, meaning every time the poisoned URL was retrieved, the fabricated entity was cited in the final report.
- STORM showed conditional citation rates between 72.5% and 80.8%, with mention rates up to 56.9%.
- For closed-source commercial systems, Gemini Deep Research cited UGC at a 12.1% rate, with 102 recurring UGC URLs across the tested topic clusters, indicating significant exposure to this attack vector.
- OpenAI Deep Research had a lower UGC citation rate of approximately 0.4%, as it largely filters out Reddit and similar sources from final citations. However, poisoned UGC could still influence intermediate reasoning steps.
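The conditional citation rate reported above is a ratio restricted to runs in which the poisoned URL was actually retrieved. A minimal Python sketch of that calculation follows. The `Run` record and its field names are hypothetical, introduced only for illustration, and the study's actual evaluation code may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    """Outcome of one report-generation run (hypothetical record layout)."""
    retrieved_poisoned_url: bool   # did retrieval return the poisoned page?
    cited_fabricated_entity: bool  # did the final report cite the fake entity?


def conditional_citation_rate(runs: list[Run]) -> Optional[float]:
    """Fraction of runs citing the fabricated entity, among runs that retrieved
    the poisoned URL. Returns None when no run retrieved it."""
    retrieved = [r for r in runs if r.retrieved_poisoned_url]
    if not retrieved:
        return None
    cited = sum(r.cited_fabricated_entity for r in retrieved)
    return cited / len(retrieved)
```

Under this definition, a rate of 100% means that retrieval alone was sufficient for the fabricated entity to reach the final report in every observed run. It says nothing about how often the poisoned URL is retrieved in the first place.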
Reddit emerged as the most frequently retrieved UGC platform across all tested systems, accounting for 54% to 71% of all UGC URLs retrieved. This prominence makes it a prime target for adversaries aiming to exploit this vulnerability.
Given these findings, developers and users of AI-driven research tools should treat seemingly innocuous user-generated comments as a potential channel for content manipulation. More robust content verification and source validation mechanisms are needed to mitigate such attacks.
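One possible form of source validation is a corroboration check: flag any entity in a report whose supporting citations all come from UGC pages. The Python sketch below illustrates the idea under stated assumptions. The `UGC_HOSTS` set, the function names, and the entity-to-sources mapping are hypothetical, and this is not a mitigation proposed or evaluated by the researchers.

```python
from urllib.parse import urlparse

# Assumption: illustrative UGC host list; extend it for real use.
UGC_HOSTS = {"reddit.com", "wikipedia.org"}


def is_ugc(url: str) -> bool:
    """Return True if the URL's host is a listed UGC host or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in UGC_HOSTS)


def uncorroborated_entities(sources_by_entity: dict[str, list[str]]) -> list[str]:
    """Return entities with no supporting source outside the UGC hosts.

    Entities with an empty source list are flagged as well, since nothing
    corroborates them.
    """
    return sorted(
        entity
        for entity, urls in sources_by_entity.items()
        if not any(not is_ugc(u) for u in urls)
    )
```

A flagged entity is not necessarily fabricated. The check only marks claims that rest entirely on content anyone can edit, so they can be reviewed or verified against independent sources before the report is delivered.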