Gray Bots Surge as Generative AI Scraper Activity Increases
Summary:
A significant uptick in generative AI scraper bot activity was observed between December 2024 and February 2025, signaling a growing operational concern for organizations that manage web applications. Barracuda’s Generative AI Bot Activity Trends report highlights the rise of “gray bots”: AI-powered scrapers that, while not explicitly malicious, can generate sustained, high-volume traffic as they harvest data to train machine learning models. These bots include ClaudeBot, operated by Anthropic, and TikTok’s Bytespider, both of which were responsible for millions of automated requests against various websites during the reporting period. In one case, a single web application recorded 9.7 million bot requests in 30 days, while another received more than 500,000 requests within 24 hours. One application was found to receive a sustained load of 17,000 requests per hour across a full day.
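To put these figures on a common scale, the short Python sketch below converts each one into average requests per hour and per second. It assumes traffic is spread evenly over each window, which the report does not state, and treats the "more than 500,000" figure as exactly 500,000, so that case is a lower bound.

```python
# Convert the request counts cited in the report into comparable hourly and
# per-second averages. These are simple means; the report does not describe
# how traffic was distributed within each window.

HOURS_PER_DAY = 24
SECONDS_PER_HOUR = 3600


def hourly_rate(requests: float, hours: float) -> float:
    """Average requests per hour over a window of `hours` hours."""
    return requests / hours


def per_second(rate_per_hour: float) -> float:
    """Convert a requests-per-hour figure into requests per second."""
    return rate_per_hour / SECONDS_PER_HOUR


# Figures as stated in the report summary (500,000 is a lower bound).
cases = {
    "9.7M requests in 30 days": hourly_rate(9_700_000, 30 * HOURS_PER_DAY),
    "500K+ requests in 24 hours": hourly_rate(500_000, HOURS_PER_DAY),
    "17K requests/hour for a day": 17_000.0,
}

if __name__ == "__main__":
    for label, rate in cases.items():
        print(f"{label}: {rate:,.0f} req/h, {per_second(rate):.1f} req/s")
    # Daily total implied by a sustained 17,000 requests per hour.
    print(f"17,000 req/h x 24 h = {17_000 * HOURS_PER_DAY:,} requests")
```

Run as a script, this gives about 13,472 requests per hour (3.7 per second) for the 30-day case, about 20,833 per hour (5.8 per second) for the 24-hour case, and 408,000 requests per day for the sustained 17,000-per-hour case.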
Unlike traditional web scrapers, which operate in short, burst-like sessions, these generative AI bots run persistent, long-duration scraping campaigns. This consistent activity pattern makes detection and mitigation more difficult because the traffic looks regular and can blend in with legitimate user behavior. The impact of gray bots is multifaceted: they can overload infrastructure, degrade application performance, and significantly inflate bandwidth and compute usage, leading to higher cloud hosting costs. Additionally, by harvesting proprietary or copyrighted data without authorization, they introduce legal and compliance challenges, particularly in industries such as healthcare, financial services, and media, where sensitive data and regulatory frameworks are involved.
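The report does not describe how these campaigns are detected. As a minimal sketch of why persistence is a distinguishing signal, the hypothetical heuristic below buckets one client's request timestamps into hours and flags the client as sustained when it is active in nearly every hour with little variation between hours. The function names and thresholds (at least 90% hourly coverage, coefficient of variation at most 0.5) are illustrative assumptions, not values from the report.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Iterable, List

SECONDS_PER_HOUR = 3600


def hourly_counts(timestamps: Iterable[float], window_hours: int = 24) -> List[int]:
    """Bucket request timestamps (seconds since window start) into per-hour counts."""
    window_end = window_hours * SECONDS_PER_HOUR
    buckets = Counter(int(ts // SECONDS_PER_HOUR) for ts in timestamps if 0 <= ts < window_end)
    return [buckets.get(hour, 0) for hour in range(window_hours)]


def is_sustained(
    timestamps: Iterable[float],
    window_hours: int = 24,
    min_coverage: float = 0.9,  # hypothetical: active in >= 90% of hours
    max_cv: float = 0.5,        # hypothetical: hourly counts vary little
) -> bool:
    """Heuristically flag a client whose traffic is steady rather than bursty."""
    counts = hourly_counts(timestamps, window_hours)
    average = mean(counts)
    if average == 0:
        return False  # no traffic in the window
    coverage = sum(1 for c in counts if c > 0) / window_hours
    # Coefficient of variation: spread of hourly counts relative to their mean.
    cv = pstdev(counts) / average
    return coverage >= min_coverage and cv <= max_cv


if __name__ == "__main__":
    steady = [i * 60.0 for i in range(24 * 60)]  # one request per minute for a day
    burst = [i * 0.2 for i in range(3000)]        # 3,000 requests in ten minutes
    print(is_sustained(steady))  # True
    print(is_sustained(burst))   # False
```

A crawler making one request per minute is flagged, while a ten-minute burst is not. On its own this signal is weak: uptime monitors and busy API clients also look steady, and scrapers that rotate IP addresses split their traffic across many apparent clients.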
Security Officer Comments:
Another operational risk posed by these bots is their effect on analytics platforms: the extra traffic inflates metrics and can mislead business intelligence and marketing decisions. While ClaudeBot at least publishes documentation for administrators who want to block its scraping, Bytespider provides little transparency, which complicates identification and enforcement. Other generative AI bots detected include PerplexityBot and DeepSeekBot, both of which likewise scrape persistently to support model training and optimization.
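Because these crawlers generally announce themselves in the User-Agent header, one low-cost step toward cleaner analytics is to tag or drop self-identified requests before computing metrics. The sketch below assumes a hypothetical log format (a list of dicts with a `user_agent` key) and assumes the product tokens listed; verify the exact strings against each operator's documentation.

```python
from typing import Iterable, Mapping, Optional

# Assumed product tokens for the crawlers named in this brief (case-insensitive
# substring match). Verify the exact strings against each operator's docs.
GRAY_BOT_TOKENS = ("ClaudeBot", "Bytespider", "PerplexityBot", "DeepSeekBot")


def declared_gray_bot(user_agent: Optional[str]) -> Optional[str]:
    """Return the matching token if the User-Agent self-identifies as a listed bot."""
    ua = (user_agent or "").lower()
    for token in GRAY_BOT_TOKENS:
        if token.lower() in ua:
            return token
    return None


def pageviews_excluding_declared_bots(log_entries: Iterable[Mapping[str, str]]) -> int:
    """Count log entries whose User-Agent does not declare a listed gray bot.

    `log_entries` uses a hypothetical schema: each entry has a "user_agent" key.
    """
    return sum(
        1 for entry in log_entries
        if declared_gray_bot(entry.get("user_agent")) is None
    )


if __name__ == "__main__":
    sample = [  # illustrative User-Agent strings, not captured traffic
        {"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
        {"user_agent": "Mozilla/5.0 (compatible; Bytespider)"},
        {"user_agent": "Mozilla/5.0 (compatible; ClaudeBot/1.0)"},
    ]
    print(pageviews_excluding_declared_bots(sample))  # 1
```

This only catches bots that identify themselves honestly. User-Agent strings are self-reported and easy to spoof, which is why limited transparency from an operator such as Bytespider's makes enforcement harder.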
Suggested Corrections:
With gray bots becoming a persistent part of online traffic, organizations must take proactive steps to manage their impact. One common approach is publishing a robots.txt file, which tells crawlers which parts of a site they should not fetch. However, compliance is voluntary: robots.txt is not legally enforceable, and many bots ignore it.
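A robots.txt opt-out for ClaudeBot, Bytespider, and PerplexityBot might look like the file embedded in the sketch below, which uses Python's standard `urllib.robotparser` module to evaluate it the way a compliant crawler would. The product tokens are assumptions to verify against each operator's documentation, and `example.com` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: opt the listed crawlers out of the whole site while
# leaving it open to every other agent. Tokens are assumptions; verify them.
ROBOTS_TXT = """\
User-agent: ClaudeBot
User-agent: Bytespider
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("ClaudeBot", "Bytespider", "PerplexityBot", "Googlebot"):
        # can_fetch() is the check a well-behaved crawler performs before a request.
        print(agent, rp.can_fetch(agent, "https://example.com/articles/1"))
```

This prints `False` for the three listed crawlers and `True` for Googlebot. The decision is made entirely on the crawler's side; nothing in the file stops a bot that never performs this check, which is why robots.txt is a signal rather than a control.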
For more effective protection, companies are turning to AI-powered bot defense systems that leverage machine learning to detect and block scraper bot activity in real time.
As debates over the ethical, legal and commercial implications of AI scraper bots continue, organizations must prioritize security to safeguard their data and operations.
Link(s):
https://www.infosecurity-magazine.com/news/gray-bots-generative-ai-scraper/