.oO SearXNG Developer Documentation Oo.
searx.search.checker.impl Namespace Reference

Classes

class  Checker
 
class  CheckerTests
 
class  ResultContainerTests
 
class  TestResults
 

Functions

 get_check_no_html ()
 
 _is_url (url)
 
bool _download_and_check_if_image (str image_url)
 
bool _is_url_image (image_url)
 
typing.Dict[str, typing.Any] _search_query_to_dict (SearchQuery search_query)
 
typing.Tuple[typing.Dict[str, typing.Any], typing.Dict[str, typing.Any]] _search_query_diff (SearchQuery sq1, SearchQuery sq2)
 

Variables

 logger = logger.getChild('searx.search.checker')
 
list HTML_TAGS
 
 _check_no_html = get_check_no_html()
 

Function Documentation

◆ _download_and_check_if_image()

bool searx.search.checker.impl._download_and_check_if_image ( str image_url)
protected
Download a URL and check whether the Content-Type header starts with "image/".
This function should not be called directly; use _is_url_image instead.
Otherwise the functools.lru_cache cache would also store data: URLs, which can be huge.

Definition at line 64 of file impl.py.

def _download_and_check_if_image(image_url: str) -> bool:
    """Download an URL and check if the Content-Type starts with "image/"
    This function should not be called directly: use _is_url_image
    otherwise the cache of functools.lru_cache contains data: URL which might be huge.
    """
    retry = 2

    while retry > 0:
        a = time()
        try:
            # use "image_proxy" (avoid HTTP/2)
            network.set_context_network_name('image_proxy')
            r, stream = network.stream(
                'GET',
                image_url,
                timeout=10.0,
                allow_redirects=True,
                headers={
                    'User-Agent': gen_useragent(),
                    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
                    'Accept-Language': 'en-US;q=0.5,en;q=0.3',
                    'Accept-Encoding': 'gzip, deflate, br',
                    'DNT': '1',
                    'Connection': 'keep-alive',
                    'Upgrade-Insecure-Requests': '1',
                    'Sec-GPC': '1',
                    'Cache-Control': 'max-age=0',
                },
            )
            r.close()
            if r.status_code == 200:
                is_image = r.headers.get('content-type', '').startswith('image/')
            else:
                is_image = False
            del r
            del stream
            return is_image
        except httpx.TimeoutException:
            logger.error('Timeout for %s: %i', image_url, int(time() - a))
            retry -= 1
        except httpx.HTTPError:
            logger.exception('Exception for %s', image_url)
            return False
    return False

Referenced by searx.search.checker.impl._is_url_image().
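The warning about functools.lru_cache can be illustrated with a small sketch. It assumes nothing about SearXNG internals: cached_check and is_image are hypothetical stand-ins, and the "download" is replaced by a string test so that no network access happens. The point is only that lru_cache keeps every argument as a cache key, so a wrapper that answers data: URLs itself keeps those large strings out of the cache.

```python
import functools


@functools.lru_cache(maxsize=8192)
def cached_check(image_url: str) -> bool:
    """Hypothetical stand-in for the real download; performs no network access."""
    return image_url.endswith(('.png', '.jpg'))


def is_image(image_url: str) -> bool:
    """Hypothetical wrapper: data: URLs are answered here and never cached."""
    if image_url.startswith('data:'):
        return image_url.startswith('data:image/')
    return cached_check(image_url)


# A data: URL of roughly one megabyte; caching it would pin the whole string.
huge_data_url = 'data:image/png;base64,' + 'A' * 1_000_000

is_image(huge_data_url)                   # handled by the wrapper, not cached
is_image('https://example.org/logo.png')  # cache miss, stored as a key
is_image('https://example.org/logo.png')  # cache hit

info = cached_check.cache_info()
print(info.hits, info.misses, info.currsize)
```

Only the short https URL ends up as a cache key; the large data: URL never reaches the cached function.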

◆ _is_url()

searx.search.checker.impl._is_url ( url)
protected

Definition at line 53 of file impl.py.

def _is_url(url):
    try:
        result = urlparse(url)
    except ValueError:
        return False
    if result.scheme not in ('http', 'https'):
        return False
    return True


# Line 63 of impl.py: this decorator belongs to _download_and_check_if_image (line 64).
@functools.lru_cache(maxsize=8192)

Referenced by searx.search.checker.impl._is_url_image().
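A minimal sketch of how this scheme check behaves on a few inputs. The function is copied under the hypothetical name is_http_url so the example runs standalone; the sample URLs are illustrative only. Note that only the scheme is inspected, so a URL without a host still passes.

```python
from urllib.parse import urlparse


def is_http_url(url):
    """Standalone copy of the logic in _is_url (the name is hypothetical)."""
    try:
        result = urlparse(url)
    except ValueError:
        # urlparse raises ValueError for malformed input such as an
        # unterminated IPv6 literal.
        return False
    return result.scheme in ('http', 'https')


samples = [
    'https://example.org/a.png',   # accepted: https scheme
    'ftp://example.org/a.png',     # rejected: scheme is ftp
    '//example.org/a.png',         # rejected: protocol-relative, empty scheme
    'data:image/png;base64,AAAA',  # rejected: scheme is data
    'http://[::1',                 # rejected: urlparse raises ValueError
    'https://',                    # accepted: only the scheme is checked
]

for sample in samples:
    print(sample, is_http_url(sample))
```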

◆ _is_url_image()

bool searx.search.checker.impl._is_url_image ( image_url)
protected
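Normalize image_url and report whether it refers to an image. Non-string values are rejected, protocol-relative URLs ("//...") are prefixed with "https:", data: URLs are accepted only if they start with "data:image/", and http(s) URLs are passed to _download_and_check_if_image.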

Definition at line 110 of file impl.py.

def _is_url_image(image_url) -> bool:
    """Normalize image_url"""
    if not isinstance(image_url, str):
        return False

    if image_url.startswith('//'):
        image_url = 'https:' + image_url

    if image_url.startswith('data:'):
        return image_url.startswith('data:image/')

    if not _is_url(image_url):
        return False

    return _download_and_check_if_image(image_url)

References searx.search.checker.impl._download_and_check_if_image(), and searx.search.checker.impl._is_url().

Referenced by searx.search.checker.impl.ResultContainerTests._check_result().
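The decision flow of _is_url_image can be traced without touching the network. The sketch below is an assumption-laden rewrite, not the SearXNG code: is_url_image and fake_download_check are hypothetical names, and the stub always answers True while recording which URLs would have been fetched.

```python
from urllib.parse import urlparse

downloaded = []  # URLs that would have been requested over the network


def fake_download_check(image_url: str) -> bool:
    """Hypothetical stub replacing _download_and_check_if_image."""
    downloaded.append(image_url)
    return True


def is_url_image(image_url) -> bool:
    """Standalone copy of the flow in _is_url_image, using the stub above."""
    if not isinstance(image_url, str):
        return False
    if image_url.startswith('//'):
        # Protocol-relative URL: assume https.
        image_url = 'https:' + image_url
    if image_url.startswith('data:'):
        # Decided from the prefix alone; nothing is downloaded.
        return image_url.startswith('data:image/')
    try:
        scheme = urlparse(image_url).scheme
    except ValueError:
        return False
    if scheme not in ('http', 'https'):
        return False
    return fake_download_check(image_url)


print(is_url_image(None))
print(is_url_image('data:image/gif;base64,R0lG'))
print(is_url_image('data:text/html,hello'))
print(is_url_image('ftp://example.org/cat.png'))
print(is_url_image('//example.org/cat.png'))
print(downloaded)
```

Only the protocol-relative URL reaches the stub, and it arrives already normalized to https.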

◆ _search_query_diff()

typing.Tuple[typing.Dict[str, typing.Any], typing.Dict[str, typing.Any]] searx.search.checker.impl._search_query_diff ( SearchQuery sq1,
SearchQuery sq2 )
protected

Definition at line 137 of file impl.py.

def _search_query_diff(
    sq1: SearchQuery, sq2: SearchQuery
) -> typing.Tuple[typing.Dict[str, typing.Any], typing.Dict[str, typing.Any]]:
    param1 = _search_query_to_dict(sq1)
    param2 = _search_query_to_dict(sq2)
    common = {}
    diff = {}
    for k, value1 in param1.items():
        value2 = param2[k]
        if value1 == value2:
            common[k] = value1
        else:
            diff[k] = (value1, value2)
    return (common, diff)

References searx.search.checker.impl._search_query_to_dict().
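To show the shape of the returned tuple, here is a minimal sketch that mirrors the two helpers. It assumes only that a search query exposes the five attributes listed in _search_query_to_dict; types.SimpleNamespace is used as a hypothetical stand-in for SearchQuery, and the names query_to_dict and query_diff are illustrative.

```python
from types import SimpleNamespace

FIELDS = ('query', 'lang', 'pageno', 'safesearch', 'time_range')


def query_to_dict(search_query):
    """Mirror of _search_query_to_dict for any object with these attributes."""
    return {name: getattr(search_query, name) for name in FIELDS}


def query_diff(sq1, sq2):
    """Mirror of _search_query_diff: split fields into equal and differing."""
    param1 = query_to_dict(sq1)
    param2 = query_to_dict(sq2)
    common = {}
    diff = {}
    for key, value1 in param1.items():
        value2 = param2[key]
        if value1 == value2:
            common[key] = value1
        else:
            diff[key] = (value1, value2)
    return common, diff


# Hypothetical stand-ins for two SearchQuery objects.
sq1 = SimpleNamespace(query='time', lang='en', pageno=1, safesearch=0, time_range=None)
sq2 = SimpleNamespace(query='time', lang='fr', pageno=2, safesearch=0, time_range=None)

common, changed = query_diff(sq1, sq2)
print(common)
print(changed)
```

Fields with equal values land in the first dictionary; differing fields land in the second as (value1, value2) pairs.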

◆ _search_query_to_dict()

typing.Dict[str, typing.Any] searx.search.checker.impl._search_query_to_dict ( SearchQuery search_query)
protected

Definition at line 127 of file impl.py.

def _search_query_to_dict(search_query: SearchQuery) -> typing.Dict[str, typing.Any]:
    return {
        'query': search_query.query,
        'lang': search_query.lang,
        'pageno': search_query.pageno,
        'safesearch': search_query.safesearch,
        'time_range': search_query.time_range,
    }

Referenced by searx.search.checker.impl.ResultContainerTests._record_error(), and searx.search.checker.impl._search_query_diff().

◆ get_check_no_html()

searx.search.checker.impl.get_check_no_html ( )

Definition at line 39 of file impl.py.

def get_check_no_html():
    rep = ['<' + tag + r'[^>]*>' for tag in HTML_TAGS]
    rep += ['</' + tag + '>' for tag in HTML_TAGS]
    pattern = re.compile('|'.join(rep))

    def f(text):
        return pattern.search(text.lower()) is None

    return f
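A minimal sketch of how the returned checker behaves, using a shortened tag list so the pattern stays readable. TAGS and make_check_no_html are hypothetical names; the construction of the pattern follows the listing above. One consequence worth noting: because the opening-tag pattern is the tag name followed by `[^>]*>`, a short name such as 'a' also matches any tag that merely starts with that letter.

```python
import re

TAGS = ['a', 'b', 'script']  # hypothetical, shortened subset of HTML_TAGS


def make_check_no_html(tags):
    """Build a checker the same way get_check_no_html does, for a given list."""
    rep = ['<' + tag + r'[^>]*>' for tag in tags]   # opening tags, with attributes
    rep += ['</' + tag + '>' for tag in tags]       # closing tags
    pattern = re.compile('|'.join(rep))

    def check(text):
        # True means "no HTML tag found"; matching is case-insensitive
        # because the text is lowercased first.
        return pattern.search(text.lower()) is None

    return check


check = make_check_no_html(TAGS)

print(check('plain text'))        # no tag
print(check('<B>bold</B>'))       # tag found after lowercasing
print(check('1 < 2 and 3 > 2'))   # bare angle brackets are not tags
print(check('<article>'))         # matched by the 'a' pattern as a prefix
```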

Variable Documentation

◆ _check_no_html

searx.search.checker.impl._check_no_html = get_check_no_html()

◆ HTML_TAGS

list searx.search.checker.impl.HTML_TAGS
Initial value:
HTML_TAGS = [
    # fmt: off
    'embed', 'iframe', 'object', 'param', 'picture', 'source', 'svg', 'math', 'canvas', 'noscript', 'script',
    'del', 'ins', 'area', 'audio', 'img', 'map', 'track', 'video', 'a', 'abbr', 'b', 'bdi', 'bdo', 'br', 'cite',
    'code', 'data', 'dfn', 'em', 'i', 'kdb', 'mark', 'q', 'rb', 'rp', 'rt', 'rtc', 'ruby', 's', 'samp', 'small',
    'span', 'strong', 'sub', 'sup', 'time', 'u', 'var', 'wbr', 'style', 'blockquote', 'dd', 'div', 'dl', 'dt',
    'figcaption', 'figure', 'hr', 'li', 'ol', 'p', 'pre', 'ul', 'button', 'datalist', 'fieldset', 'form', 'input',
    'label', 'legend', 'meter', 'optgroup', 'option', 'output', 'progress', 'select', 'textarea', 'applet',
    'frame', 'frameset'
    # fmt: on
]

Definition at line 26 of file impl.py.

◆ logger

searx.search.checker.impl.logger = logger.getChild('searx.search.checker')

Definition at line 24 of file impl.py.