The internet faces a new challenge as AI companies deploy ever more web crawlers to collect vast amounts of data for training their models. These scraper bots are straining servers and raising concerns about the unauthorized use of content. Iaso, a developer, experienced this firsthand when her own Git server buckled under the load of AI scrapers.
In response, she created Anubis, an open-source program designed to protect against these bots. Since its launch in January, Anubis has been downloaded nearly 200,000 times and is in use at notable organizations including GNOME, FFmpeg, and UNESCO.
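Under the hood, Anubis gates requests behind a Hashcash-style proof of work: before a browser is let through, it must find a nonce whose SHA-256 hash of a server-issued challenge string meets a difficulty target. The Go sketch below illustrates only the idea; the challenge format, difficulty, and function names are illustrative assumptions, not Anubis's actual implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// solve brute-forces a nonce such that SHA-256(challenge+nonce) starts
// with `difficulty` hex zeros. In Anubis this search runs in the
// visitor's browser as JavaScript; here it runs locally for clarity.
func solve(challenge string, difficulty int) (int, string) {
	prefix := strings.Repeat("0", difficulty)
	for nonce := 0; ; nonce++ {
		sum := sha256.Sum256([]byte(challenge + strconv.Itoa(nonce)))
		hash := hex.EncodeToString(sum[:])
		if strings.HasPrefix(hash, prefix) {
			return nonce, hash
		}
	}
}

// verify is the server's side of the exchange: a single cheap hash.
func verify(challenge string, nonce, difficulty int) bool {
	sum := sha256.Sum256([]byte(challenge + strconv.Itoa(nonce)))
	return strings.HasPrefix(hex.EncodeToString(sum[:]), strings.Repeat("0", difficulty))
}

func main() {
	challenge := "example-challenge-token" // per-visitor value in practice (assumption)
	nonce, hash := solve(challenge, 4)     // 4 hex zeros needs ~65,536 hashes on average
	fmt.Printf("nonce=%d hash=%s verified=%v\n", nonce, hash, verify(challenge, nonce, 4))
}
```

The asymmetry is the point: a legitimate visitor pays the hashing cost once per visit, while a scraper fetching thousands of pages pays it on every one.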
The impact of AI scraping extends beyond server strain. As users increasingly rely on the AI-generated summaries that search engines now serve, visits to the original sites have declined, and with them the traffic and ad revenue that content creators depend on. Matthew Prince, CEO of Cloudflare, has highlighted the growing disparity between the number of pages crawled and the number of visitors referred back.
AI scrapers impact content access
Ten years ago, for every two pages Google crawled, it directed one visitor to a content creator’s site. Today, that ratio stands at 18 pages crawled for every single visitor sent.
The situation is even starker with AI companies such as OpenAI, where the ratio has ballooned to 1,500 pages scraped for every visitor referred. This shift in user behavior, combined with the growing reliance on AI summaries, poses real challenges for content creators: publishers and writers may struggle to maintain readership and revenue as their work is absorbed by AI systems and presented back to users in summarized form.
To address this, the industry must find a balance in which AI complements rather than replaces original content. Systems that direct readers to original sources, ensure the accurate dissemination of information, and compensate creators for their contributions could all form part of the solution. And as the crawling landscape evolves with the merging roles of search engines and AI, webmasters need to understand and manage the presence of these bots on their sites, as in the sketch below.
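In practice, understanding that presence often starts with the server's access log: the better-behaved AI crawlers identify themselves with user-agent strings such as GPTBot (OpenAI) or CCBot (Common Crawl). A minimal tally in Go might look like the sketch below; the substrings listed are assumptions that should be checked against each vendor's current documentation, and the log is assumed to arrive on stdin.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// User-agent substrings of well-known AI crawlers. These are
// assumptions that may go stale; check each vendor's docs.
var aiBots = []string{"GPTBot", "ClaudeBot", "CCBot", "Bytespider", "PerplexityBot"}

func main() {
	counts := make(map[string]int)
	total := 0
	scanner := bufio.NewScanner(os.Stdin) // expects an access log on stdin
	for scanner.Scan() {
		line := scanner.Text()
		total++
		for _, bot := range aiBots {
			if strings.Contains(line, bot) {
				counts[bot]++
			}
		}
	}
	fmt.Printf("%d requests total\n", total)
	for bot, n := range counts {
		fmt.Printf("%-15s %6d (%.1f%% of traffic)\n", bot, n, 100*float64(n)/float64(total))
	}
}
```

Run it as `go run tally.go < access.log`. Of course, less scrupulous scrapers spoof ordinary browser user agents, which is where challenge-based defenses come in.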
Tools like Anubis are a significant step toward protecting smaller sites from being overwhelmed by AI scraping, but more comprehensive solutions are needed to ensure a sustainable future for content creators in the age of AI.