Since the AI race started, even bots known to be well-behaved ignore robots.txt and 429 responses.
Here is an overview of our current efforts and planned enhancements based on the points you raised and our internal developments:
Our recent splash screen improvements and "cool-down" ModSecurity rules have successfully blocked bots like Bytespider and Claudebot based purely on their User-Agent strings. We are expanding these checks to more User-Agents. We are also preparing new ModSecurity rules designed to track requests from a list of known AI bots and possibly penalize them for not complying with robots.txt. This list includes bots such as Perplexity-User, anthropic-ai, Claude-Web, cohere-ai, Applebot, and others. This rule is scheduled for one of the upcoming releases. Initially, it will only track requests and let them pass, so that we can gather data without applying any thresholds to traffic yet.
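To make the "track and pass" idea concrete, here is a rough Python sketch of the logic. It is not the actual ModSecurity rule: the bot names are taken from this post, while the substring matching, the `identify_bot` and `track_request` helpers, and the in-memory counter are assumptions made purely for illustration.

```python
from collections import Counter

# Bot names come from the post above; everything else here is hypothetical.
KNOWN_AI_BOTS = (
    "Bytespider", "Claudebot", "Perplexity-User", "anthropic-ai",
    "Claude-Web", "cohere-ai", "Applebot",
)

# In-memory hit counter per bot (assumed storage, for illustration only).
hits = Counter()


def identify_bot(user_agent):
    """Return the first known bot name found in the User-Agent, else None."""
    ua = user_agent.lower()
    for name in KNOWN_AI_BOTS:
        if name.lower() in ua:  # case-insensitive substring match
            return name
    return None


def track_request(user_agent):
    """Track-only mode: count requests from known bots, always let them pass."""
    bot = identify_bot(user_agent)
    if bot is not None:
        hits[bot] += 1
    return "pass"
```

In this mode nothing is ever blocked; the counters only show how much traffic each bot generates, which is the data needed before choosing any thresholds.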
We are exploring more sophisticated rate-limiting and blocking mechanisms tied to bot identification. The goal is to move beyond simple IP blocks or CAPTCHA challenges triggered by single requests. For specific, less aggressive bots, it will be possible to block them with custom rules; in the meantime, we are analysing incoming data to find suitable thresholds.
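As an illustration of rate limiting tied to bot identity rather than to a single request or IP, here is a minimal sliding-window sketch in Python. The `BotRateLimiter` class, its parameters, and the choice of a sliding window are assumptions for the example; the real thresholds and mechanism are still being worked out.

```python
import time
from collections import defaultdict, deque


class BotRateLimiter:
    """Hypothetical per-bot sliding-window limiter (illustration only)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests      # allowed requests per window
        self.window_seconds = window_seconds  # window length in seconds
        self._seen = defaultdict(deque)       # bot name -> request timestamps

    def check(self, bot_name, now=None):
        """Return 200 if the request is allowed, 429 if the bot is over its limit."""
        now = time.monotonic() if now is None else now
        window = self._seen[bot_name]
        # Forget requests that have fallen out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_requests:
            return 429
        window.append(now)
        return 200
```

For example, `BotRateLimiter(max_requests=2, window_seconds=10.0)` would let a given bot through twice within ten seconds and answer 429 to further requests until the oldest one ages out. Each bot has its own window, so one aggressive crawler does not affect the others.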
We are actively discussing how to best implement an opt-in system to block specific categories of bots, such as AI crawlers, providing more granular control based on specific needs. This aligns with your suggestion and is a priority for future development.