JohnPaton/bottica


🤖 Bottica

An open source bot & crawler traffic verification tool.

Bots & crawlers generate huge amounts of traffic to most websites. Much of this traffic is good, in the sense that it creates value (e.g. Googlebot's crawling is how your site stays in Google's search results). Good bots generally identify themselves with their User-Agent string so that you don't block them, and they respect your robots.txt.

However, plenty of bad bots simply copy good bots' User-Agents to slip under the radar. That's why the biggest good bots offer verification methods beyond the User-Agent, so you can be sure that a request really comes from them.

πŸ›’πŸ› A one-stop shop for bot traffic verification

Maintainers of good bots publish the details of their verification methods on their websites. However, finding these and keeping track of them can be a hassle, especially if you want to verify traffic from several bots.

Bottica Core aims to be a one-stop shop for tracking verification methods, providing a unified, language-agnostic specification for bot verification. This approach was inspired by the ua-parser project, which is a similarly implemented list of regular expressions for parsing User-Agent strings. Bottica attempts to align its bot names with the matching names from ua-parser, so that ua-parser can be used to parse User-Agent strings and the extracted bot name can be used directly to verify the request with Bottica.
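As a rough illustration of how a unified spec and ua-parser names could fit together, here is a minimal Python sketch. The `BotSpec` fields, the `SPECS` registry, the `spec_for` helper, and the bot name "ExampleBot" are all hypothetical; Bottica Core's actual specification format may look quite different.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical shape of one verification spec entry (field names are illustrative).
@dataclass(frozen=True)
class BotSpec:
    name: str                          # intended to match the ua-parser family name
    fcrdns: bool = False               # verify with forward-confirmed reverse DNS
    host_suffixes: Tuple[str, ...] = ()  # allowed reverse-DNS domains, if any
    ip_whitelist: Tuple[str, ...] = ()   # single IPs, IP ranges, or CIDR blocks, if any


# Hypothetical registry keyed by bot name; "ExampleBot" is a made-up bot.
SPECS = {
    spec.name: spec
    for spec in [
        BotSpec(name="ExampleBot", fcrdns=True, host_suffixes=("example.com",)),
    ]
}


def spec_for(ua_family: str) -> Optional[BotSpec]:
    """Look up the verification spec for a ua-parser family name, if one exists."""
    return SPECS.get(ua_family)
```

Keying the registry on the ua-parser family name is what lets the output of a User-Agent parser feed straight into verification; in a real integration the family name would come from ua-parser rather than being hard-coded.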

Bottica can be implemented in any language capable of performing DNS lookups. Currently, only a Python implementation is provided, but PRs adding implementations in other languages would be more than welcome!

✅ Verifying good requests

Broadly speaking, good bots provide verification in one of two ways:

  • 🔄 Forward-confirmed reverse DNS (FCrDNS)
  • 📃 Publishing, in some form, whitelists of the IPs they use

🔄 FCrDNS

A forward-confirmed reverse DNS verification is a two-step process:

  1. A reverse DNS query is performed on the requesting IP to get its reported host name.
  2. A forward DNS query is performed on that host name to get its list of IPs. If the reported host name hasn't been spoofed, the original IP will appear in that list.
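The two steps above can be sketched with nothing but Python's standard library. This is a minimal illustration, not Bottica's actual implementation: the function names `reverse_lookup`, `forward_lookup`, and `fcrdns`, and the choice to return the verified host name, are assumptions. The resolvers are passed in as parameters so the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Optional


def reverse_lookup(ip: str) -> str:
    """Reverse DNS (PTR) query: the host name the IP reports for itself."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> Iterable[str]:
    """Forward DNS (A/AAAA) query: every IP the host name resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def fcrdns(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], Iterable[str]] = forward_lookup,
) -> Optional[str]:
    """Return the verified host name for `ip`, or None if FCrDNS fails."""
    try:
        # Step 1: reverse DNS query on the IP to get its reported host name.
        host = reverse(ip)
        # Step 2: forward DNS query on that host name. Addresses are parsed so
        # that equivalent spellings (e.g. of IPv6 addresses) compare equal.
        addresses = {ipaddress.ip_address(a) for a in forward(host)}
        target = ipaddress.ip_address(ip)
    except (OSError, ValueError):
        # socket.herror and socket.gaierror subclass OSError; a failed lookup
        # or an unparseable address means the IP is not verified.
        return None
    # The original IP must appear among the host's IPs.
    return host if target in addresses else None
```

Note that `socket.gethostbyaddr` and `socket.getaddrinfo` take no timeout argument and do no caching, so a production implementation would likely want to handle both.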

As an additional check, bot owners will generally also publish a set of host names (typically domains) that their bots' IPs must reverse-resolve to. Bottica Core supports FCrDNS both with and without this additional host name verification.
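A minimal sketch of the additional host name check, assuming the bot owner publishes a list of allowed domains. `host_is_allowed` is a hypothetical helper and `example.com` is a placeholder domain. The check should only be applied to a host name that has already passed FCrDNS, because a PTR record on its own is set by whoever controls the IP block and can claim any name.

```python
from typing import Sequence


def host_is_allowed(host: str, allowed_domains: Sequence[str]) -> bool:
    """Check that a forward-confirmed host name belongs to an allowed domain."""
    # DNS names are case-insensitive; also drop a trailing root dot.
    host = host.lower().rstrip(".")
    for domain in allowed_domains:
        domain = domain.lower().rstrip(".")
        # Accept the domain itself or any subdomain of it, but not look-alikes
        # such as "evilexample.com" or "example.com.attacker.net".
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

Matching on a leading dot, rather than a bare `endswith(domain)`, is what stops a host like `crawl.evilexample.com` from passing as a member of `example.com`.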

📃 IP whitelists

Publishing an IP whitelist is a very simple verification method employed by many bot owners. It can also be combined with FCrDNS to provide a double layer of verification.

Bottica Core supports IP whitelists given as lists of individual IPs, IP ranges, and CIDR blocks.
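The three whitelist entry types map naturally onto Python's `ipaddress` module. The sketch below is illustrative only: `parse_entry` and `is_whitelisted` are hypothetical names, and the `first-last` notation for IP ranges is an assumed format, since bot owners publish their whitelists in various forms.

```python
import ipaddress
from typing import Callable, Iterable, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_entry(entry: str) -> Callable[[IPAddress], bool]:
    """Turn one whitelist entry into a predicate that tests an IP address."""
    entry = entry.strip()
    if "-" in entry:
        # IP range "first-last", inclusive at both ends (assumed notation).
        first, last = (ipaddress.ip_address(p.strip()) for p in entry.split("-", 1))
        if first.version != last.version or first > last:
            raise ValueError(f"invalid IP range: {entry!r}")
        # Check the version first: comparing IPv4 with IPv6 raises TypeError.
        return lambda ip: ip.version == first.version and first <= ip <= last
    if "/" in entry:
        # CIDR block; strict=False tolerates host bits set, e.g. "192.0.2.1/24".
        network = ipaddress.ip_network(entry, strict=False)
        # Membership is simply False when the IP versions differ.
        return lambda ip: ip in network
    # Otherwise, a single IP address.
    single = ipaddress.ip_address(entry)
    return lambda ip: ip == single


def is_whitelisted(ip: str, whitelist: Iterable[str]) -> bool:
    """True if `ip` matches any single IP, range, or CIDR block in `whitelist`."""
    address = ipaddress.ip_address(ip)
    return any(parse_entry(entry)(address) for entry in whitelist)
```

For example, `is_whitelisted("203.0.113.200", ["203.0.113.0/24"])` is true. For a large whitelist you would parse the entries once up front rather than on every request.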
