.oO SearXNG Developer Documentation Oo.
searx.botdetection.http_user_agent Namespace Reference

Functions

 regexp_user_agent ()
 
werkzeug.Response|None filter_request (IPv4Network|IPv6Network network, flask.Request request, config.Config cfg)
 

Variables

str USER_AGENT
 
 _regexp = None
 

Detailed Description

Method ``http_user_agent``
--------------------------

The ``http_user_agent`` method classifies a request as coming from a bot if
the User-Agent_ header is unset or matches the regular expression
:py:obj:`USER_AGENT`.

.. _User-Agent:
   https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent
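
The rule above can be sketched in a few lines. This is a minimal illustration, not the module's code: ``ABBREVIATED_USER_AGENT`` and ``looks_like_bot`` are hypothetical names, the pattern is a shortened stand-in for :py:obj:`USER_AGENT`, and a plain ``dict`` stands in for the request headers (so, unlike real HTTP headers, the key lookup here is case-sensitive).

```python
import re

# Hypothetical, shortened stand-in for USER_AGENT; the real pattern
# lists many more clients.
ABBREVIATED_USER_AGENT = r'(unknown|[Cc][Uu][Rr][Ll]|[wW]get|python-requests)'
_pattern = re.compile(ABBREVIATED_USER_AGENT)


def looks_like_bot(headers: dict) -> bool:
    """Return True if the User-Agent is missing or matches the pattern."""
    # A missing header is replaced by the literal 'unknown', which the
    # pattern lists explicitly, so an unset header counts as a bot.
    user_agent = headers.get('User-Agent', 'unknown')
    return _pattern.match(user_agent) is not None


print(looks_like_bot({}))                            # header unset
print(looks_like_bot({'User-Agent': 'curl/8.5.0'}))  # listed client
print(looks_like_bot({'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0'}))
```

In this sketch the first two calls report a bot and the third does not, because the browser string starts with none of the listed alternatives.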

Function Documentation

◆ filter_request()

werkzeug.Response | None searx.botdetection.http_user_agent.filter_request ( IPv4Network | IPv6Network network,
flask.Request request,
config.Config cfg )

Definition at line 57 of file http_user_agent.py.

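    def filter_request(
        network: IPv4Network | IPv6Network,
        request: flask.Request,
        cfg: config.Config,
    ) -> werkzeug.Response | None:

        # A missing User-Agent header is treated as the literal 'unknown'.
        user_agent = request.headers.get('User-Agent', 'unknown')
        if regexp_user_agent().match(user_agent):
            return too_many_requests(network, f"bot detected, HTTP header User-Agent: {user_agent}")
        return None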

References searx.botdetection.http_user_agent.regexp_user_agent().

◆ regexp_user_agent()

searx.botdetection.http_user_agent.regexp_user_agent ( )

Definition at line 50 of file http_user_agent.py.

    def regexp_user_agent():
        global _regexp  # pylint: disable=global-statement
        if not _regexp:
            _regexp = re.compile(USER_AGENT)
        return _regexp

Referenced by searx.botdetection.http_user_agent.filter_request().
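
regexp_user_agent() compiles the pattern on first use and returns the cached object on later calls. As a point of comparison only, the same compile-once behaviour can be sketched without a `global` statement by using `functools.lru_cache`. The names `PATTERN_SOURCE` and `compiled_pattern` are hypothetical and the pattern is shortened for illustration; this is not how the module itself is written.

```python
import functools
import re

# Hypothetical, shortened pattern used only for this sketch.
PATTERN_SOURCE = r'(unknown|[wW]get)'


@functools.lru_cache(maxsize=None)
def compiled_pattern() -> re.Pattern:
    """Compile the pattern on the first call; later calls reuse the cached object."""
    return re.compile(PATTERN_SOURCE)


first = compiled_pattern()
second = compiled_pattern()
print(first is second)  # both names refer to the one cached pattern object
```

Either approach defers the cost of `re.compile` until the first request that needs the pattern.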

Variable Documentation

◆ _regexp

searx.botdetection.http_user_agent._regexp = None
protected

Definition at line 47 of file http_user_agent.py.

◆ USER_AGENT

str searx.botdetection.http_user_agent.USER_AGENT
Initial value:
    USER_AGENT = (
        r'('
        + r'unknown'
        + r'|[Cc][Uu][Rr][Ll]|[wW]get|Scrapy|splash|JavaFX|FeedFetcher|python-requests|Go-http-client|Java|Jakarta|okhttp'
        + r'|HttpClient|Jersey|Python|libwww-perl|Ruby|SynHttpClient|UniversalFeedParser|Googlebot|GoogleImageProxy'
        + r'|bingbot|Baiduspider|yacybot|YandexMobileBot|YandexBot|Yahoo! Slurp|MJ12bot|AhrefsBot|archive.org_bot|msnbot'
        + r'|MJ12bot|SeznamBot|linkdexbot|Netvibes|SMTBot|zgrab|James BOT|Sogou|Abonti|Pixray|Spinn3r|SemrushBot|Exabot'
        + r'|ZmEu|BLEXBot|bitlybot|HeadlessChrome'
        # unmaintained Farside instances
        + r'|'
        + re.escape(r'Mozilla/5.0 (compatible; Farside/0.1.0; +https://farside.link)')
        # other bots and client to block
        + '|.*PetalBot.*'
        + r')'
    )

Definition at line 30 of file http_user_agent.py.
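
The initial value is one parenthesised string concatenation that forms a single alternation group. The sketch below rebuilds a reduced version of it to show two things: what `re.escape` does for the Farside literal, and how `re.match`, which anchors at the start of the string, treats the alternatives. `FARSIDE`, `REDUCED`, and the sample User-Agent strings are illustrative assumptions, not part of the module.

```python
import re

FARSIDE = r'Mozilla/5.0 (compatible; Farside/0.1.0; +https://farside.link)'

# Reduced, hypothetical version of the pattern: same structure, fewer alternatives.
REDUCED = (
    r'('
    + r'unknown'
    + r'|HeadlessChrome'
    + r'|'
    + re.escape(FARSIDE)   # parentheses, dots and '+' are matched literally
    + '|.*PetalBot.*'      # leading '.*' lets this alternative match anywhere
    + r')'
)
pattern = re.compile(REDUCED)

# The parenthesised concatenation yields one string, not a tuple.
print(type(REDUCED).__name__)

# re.match only succeeds if an alternative matches from position 0.
print(bool(pattern.match(FARSIDE)))
print(bool(pattern.match('Mozilla/5.0 (Linux; Android 7.0;) PetalBot')))
print(bool(pattern.match('HeadlessChrome/120.0')))
print(bool(pattern.match('Mozilla/5.0 (X11) HeadlessChrome/120.0')))

# re.search scans the whole string, so the same input is found here.
print(bool(pattern.search('Mozilla/5.0 (X11) HeadlessChrome/120.0')))
```

In this reduced pattern, an alternative without a leading `.*` matches only when the User-Agent begins with it, whereas the `.*PetalBot.*` alternative matches the token anywhere in the header value.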