The bottica
specification and list of verifiers: bottica.yaml
.
The bottica
object contains a single entry, bots
. Its value is a list
of bots, which have a name
and one or more verifiers. A json-schema of
bottica
is available at schema.yaml
.
We aim for bot names to be in with sync their User-Agent family as parsed by the ua-parser project (UAP) to maximize interoperability.
In all entries where an IP is required, bottica
supports both IPv4 and IPv6.
Each entry in bottica
can have one or more verifiers. For example,
Pinterestbot
traffic should be both verified by FCrDNS, and it should
come from a specific IP range:
- name: Pinterestbot
ip_ranges:
- min: 54.236.1.1
max: 54.236.1.255
fcrdns_hosts:
- pinterest.com
The supported verifiers are:
- 🔄
fcrdns_hosts
: Forward-confirmed reverse DNS. - 📃
ip_list
: Simple IP whitelist - 🏔
ip_ranges
: Whitelisting of ranges of IPs - 🍎
cidr_list
: Whitelisting of CIDR blocks
fcrdns_hosts
should contain a list of hosts that are allowed to
participate in a FCrDNS lookup:
- A reverse DNS query is performed on an IP to check its reported hostname
- A DNS query is performed on the hostname to get its reported list of IPs. The original IP should appear in the hosts list of IPs if the reported host name hasn't been spoofed.
If the list fcrdns_hosts
list is present but empty, an FCrDNS check will
be performed, with a positive verification as long as the IP matches the
host's reported IP list. If there are hosts listed under fcrdns_hosts
, then the
verification will additionally only succeed if the last part of the hostname
matches one of the entries.
Example
- name: Googlebot
fcrdns_hosts:
- google.com
- googlebot.com
A simple whitelist of allowed IP addresses.
Example
- name: DuckDuckBot
ip_list:
- 23.21.227.69
- 50.16.241.113
- 50.16.241.114
- 50.16.241.117
- 50.16.247.234
- 52.204.97.54
- 52.5.190.19
- 54.197.234.188
- 54.208.100.253
- 54.208.102.37
- 107.21.1.8
A list of IP ranges, one of which must contain the bot's IP. A range is
specified with the min
IP and the max
IP, with both boundaries
being inclusive.
Example
- name: Pinterestbot
ip_ranges:
- min: 54.236.1.1
max: 54.236.1.255
A list of CIDR blocks, one of which must contain the bot's IP address.
Example
- name: AhrefsBot
# https://ahrefs.com/robot
cidr_list:
- 54.36.148.0/24
- 54.36.149.0/24
- 54.36.150.0/24
- 195.154.122.0/24
- 195.154.123.0/24
- 195.154.126.0/24
- 195.154.127.0/24
Since ua-parser
is an external dependency, it is sometimes necessary
to make some additions or tweaks to their parsing, e.g. in cases where
a major bot is missing, or when their parsed family is too specific
(think of one bot going under many different names, which would all
require separate entries in bottica
).
uap_extras.yaml
provides overrides and additions
to the User-Agent parsing from
uap-core. These should only
contain a regex to match, and possibly a family_name
override for
setting the bot's name. If provided, family_name
must match one of
the bot names in bottica
. If not provided, the first matching group
of the regex
must always match one of the bot names in bottica
.