.oO SearXNG Developer Documentation Oo.
searx.webutils Namespace Reference

Classes

class  CSVWriter
class  JSONEncoder

Functions

 get_translated_errors (Iterable[UnresponsiveEngine] unresponsive_engines)
None write_csv_response (CSVWriter csv, ResultContainer rc)
str get_json_response (SearchQuery sq, ResultContainer rc)
 get_themes (templates_path)
list[str] get_static_file_list ()
 get_result_templates (templates_path)
 new_hmac (secret_key, url)
 is_hmac_of (secret_key, value, hmac_to_check)
 prettify_url (url, max_length=74)
bool contains_cjko (str s)
str regex_highlight_cjk (str word)
 highlight_content (content, query)
str searxng_l10n_timespan (datetime dt)
List[Tuple[str, Iterable[Engine]]] group_engines_in_tab (Iterable[Engine] engines)

Variables

 VALID_LANGUAGE_CODE = re.compile(r'^[a-z]{2,3}(-[a-zA-Z]{2})?$')
 logger = logger.getChild('webutils')
 timeout_text = gettext('timeout')
 parsing_error_text = gettext('parsing error')
 http_protocol_error_text = gettext('HTTP protocol error')
 network_error_text = gettext('network error')
 ssl_cert_error_text = gettext("SSL error: certificate validation has failed")
dict exception_classname_to_text
str NO_SUBGROUPING = 'without further subgrouping'

Function Documentation

◆ contains_cjko()

bool searx.webutils.contains_cjko ( str s)
This function checks whether a string contains Chinese, Japanese,
or Korean characters. It uses a regular expression whose character class
is built from a set of Unicode ranges.

Args:
    s (str): string to be checked.

Returns:
    bool: True if the input s contains such characters, False otherwise.

Definition at line 226 of file webutils.py.

def contains_cjko(s: str) -> bool:
    """This function check whether or not a string contains Chinese, Japanese,
    or Korean characters. It employs regex and uses the u escape sequence to
    match any character in a set of Unicode ranges.

    Args:
        s (str): string to be checked.

    Returns:
        bool: True if the input s contains the characters and False otherwise.
    """
    unicode_ranges = (
        '\u4e00-\u9fff'  # Chinese characters
        '\u3040-\u309f'  # Japanese hiragana
        '\u30a0-\u30ff'  # Japanese katakana
        '\u4e00-\u9faf'  # Japanese kanji
        '\uac00-\ud7af'  # Korean hangul syllables
        '\u1100-\u11ff'  # Korean hangul jamo
    )
    return bool(re.search(fr'[{unicode_ranges}]', s))

Referenced by regex_highlight_cjk().
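
The check is self-contained (a single re.search() over one character class), so it can be reproduced outside of SearXNG. The following is a minimal standalone sketch of the same technique; the name contains_cjko_sketch is illustrative and not part of searx.webutils.

import re

CJK_RANGES = (
    '\u4e00-\u9fff'  # CJK unified ideographs (Chinese, Japanese kanji)
    '\u3040-\u309f'  # Japanese hiragana
    '\u30a0-\u30ff'  # Japanese katakana
    '\uac00-\ud7af'  # Korean hangul syllables
    '\u1100-\u11ff'  # Korean hangul jamo
)

def contains_cjko_sketch(s: str) -> bool:
    # one character class built from the Unicode ranges above
    return bool(re.search(fr'[{CJK_RANGES}]', s))

print(contains_cjko_sketch('hello'))       # False
print(contains_cjko_sketch('こんにちは'))   # True  (hiragana)
print(contains_cjko_sketch('안녕하세요'))   # True  (hangul syllables)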

◆ get_json_response()

str searx.webutils.get_json_response ( SearchQuery sq,
ResultContainer rc )
Returns the JSON string of the results to a query (``application/json``)

Definition at line 160 of file webutils.py.

def get_json_response(sq: SearchQuery, rc: ResultContainer) -> str:
    """Returns the JSON string of the results to a query (``application/json``)"""
    data = {
        'query': sq.query,
        'number_of_results': rc.number_of_results,
        'results': [_.as_dict() for _ in rc.get_ordered_results()],
        'answers': [_.as_dict() for _ in rc.answers],
        'corrections': list(rc.corrections),
        'infoboxes': rc.infoboxes,
        'suggestions': list(rc.suggestions),
        'unresponsive_engines': get_translated_errors(rc.unresponsive_engines),
    }
    response = json.dumps(data, cls=JSONEncoder)
    return response

References get_translated_errors().
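
Producing a real response requires a SearchQuery and a populated ResultContainer; the snippet below only illustrates the shape of the JSON document the function serializes. All values are hypothetical, only the keys mirror the data dict above.

import json

example = {
    'query': 'searxng',
    'number_of_results': 42,
    'results': [{'title': 'SearXNG', 'url': 'https://docs.searxng.org/', 'content': '...'}],
    'answers': [],
    'corrections': [],
    'infoboxes': [],
    'suggestions': ['searx'],
    'unresponsive_engines': [['some engine', 'timeout']],
}
print(json.dumps(example, indent=2))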

◆ get_result_templates()

searx.webutils.get_result_templates ( templates_path)

Definition at line 199 of file webutils.py.

def get_result_templates(templates_path):
    result_templates = set()
    templates_path_length = len(templates_path) + 1
    for directory, _, files in os.walk(templates_path):
        if directory.endswith('result_templates'):
            for filename in files:
                f = os.path.join(directory[templates_path_length:], filename)
                result_templates.add(f)
    return result_templates

◆ get_static_file_list()

list[str] searx.webutils.get_static_file_list ( )

Definition at line 181 of file webutils.py.

def get_static_file_list() -> list[str]:
    file_list = []
    static_path = pathlib.Path(str(get_setting("ui.static_path")))

    def _walk(path: pathlib.Path):
        for f in path.iterdir():
            if f.name.startswith('.'):
                # ignore hidden file
                continue
            if f.is_file():
                file_list.append(str(f.relative_to(static_path)))
            if f.is_dir():
                _walk(f)

    _walk(static_path)
    return file_list

References searx.get_setting().

◆ get_themes()

searx.webutils.get_themes ( templates_path)
Returns available themes list.

Definition at line 176 of file webutils.py.

def get_themes(templates_path):
    """Returns available themes list."""
    return os.listdir(templates_path)

◆ get_translated_errors()

searx.webutils.get_translated_errors ( Iterable[UnresponsiveEngine] unresponsive_engines)

Definition at line 70 of file webutils.py.

def get_translated_errors(unresponsive_engines: Iterable[UnresponsiveEngine]):
    translated_errors = []

    for unresponsive_engine in unresponsive_engines:
        error_user_text = exception_classname_to_text.get(unresponsive_engine.error_type)
        if not error_user_text:
            error_user_text = exception_classname_to_text[None]
        error_msg = gettext(error_user_text)
        if unresponsive_engine.suspended:
            error_msg = gettext('Suspended') + ': ' + error_msg
        translated_errors.append((unresponsive_engine.engine, error_msg))

    return sorted(translated_errors, key=lambda e: e[0])

Referenced by get_json_response().
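
A standalone sketch of the same lookup-and-sort logic, with the Flask-Babel gettext layer left out and UnresponsiveEngine modeled as a namedtuple; the stub names and the reduced mapping are illustrative only.

from collections import namedtuple

UnresponsiveEngineStub = namedtuple('UnresponsiveEngineStub', ['engine', 'error_type', 'suspended'])
classname_to_text = {None: 'unexpected crash', 'timeout': 'timeout'}  # reduced stand-in mapping

def translate_errors_sketch(unresponsive_engines):
    translated = []
    for ue in unresponsive_engines:
        msg = classname_to_text.get(ue.error_type) or classname_to_text[None]
        if ue.suspended:
            msg = 'Suspended' + ': ' + msg
        translated.append((ue.engine, msg))
    return sorted(translated, key=lambda e: e[0])

print(translate_errors_sketch([
    UnresponsiveEngineStub('qwant', 'timeout', False),
    UnresponsiveEngineStub('brave', 'ValueError', True),
]))
# -> [('brave', 'Suspended: unexpected crash'), ('qwant', 'timeout')]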

◆ group_engines_in_tab()

List[Tuple[str, Iterable[Engine]]] searx.webutils.group_engines_in_tab ( Iterable[Engine] engines)
Groups an Iterable of engines by their first non-tab category (first subgroup).

Definition at line 314 of file webutils.py.

def group_engines_in_tab(engines: Iterable[Engine]) -> List[Tuple[str, Iterable[Engine]]]:
    """Groups an Iterable of engines by their first non tab category (first subgroup)"""

    def get_subgroup(eng):
        non_tab_categories = [c for c in eng.categories if c not in tabs + [DEFAULT_CATEGORY]]
        return non_tab_categories[0] if len(non_tab_categories) > 0 else NO_SUBGROUPING

    def group_sort_key(group):
        return (group[0] == NO_SUBGROUPING, group[0].lower())

    def engine_sort_key(engine):
        return (engine.about.get('language', ''), engine.name)

    tabs = list(get_setting('categories_as_tabs').keys())
    subgroups = itertools.groupby(sorted(engines, key=get_subgroup), get_subgroup)
    sorted_groups = sorted(((name, list(engines)) for name, engines in subgroups), key=group_sort_key)

    ret_val = []
    for groupname, _engines in sorted_groups:
        group_bang = '!' + groupname.replace(' ', '_') if groupname != NO_SUBGROUPING else ''
        ret_val.append((groupname, group_bang, sorted(_engines, key=engine_sort_key)))

    return ret_val

References searx.get_setting().
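
The grouping itself is a plain sort-then-groupby over the first non-tab category. Below is a standalone sketch of that step with a stand-in Engine dataclass and a hard-coded tab list instead of get_setting('categories_as_tabs'); the subsequent group/engine sorting and bang generation are omitted, and all names and categories are illustrative.

import itertools
from dataclasses import dataclass, field

NO_SUBGROUPING = 'without further subgrouping'
DEFAULT_CATEGORY = 'other'
tabs = ['general', 'images']  # stand-in for get_setting('categories_as_tabs')

@dataclass
class EngineStub:
    name: str
    categories: list
    about: dict = field(default_factory=dict)

def get_subgroup(eng):
    non_tab = [c for c in eng.categories if c not in tabs + [DEFAULT_CATEGORY]]
    return non_tab[0] if non_tab else NO_SUBGROUPING

engines = [
    EngineStub('wikipedia', ['general', 'wikimedia']),
    EngineStub('wikidata', ['general', 'wikimedia']),
    EngineStub('duckduckgo', ['general']),
]
for name, group in itertools.groupby(sorted(engines, key=get_subgroup), get_subgroup):
    print(name, [e.name for e in group])
# -> wikimedia ['wikipedia', 'wikidata']
# -> without further subgrouping ['duckduckgo']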

◆ highlight_content()

searx.webutils.highlight_content ( content,
query )

Definition at line 268 of file webutils.py.

def highlight_content(content, query):

    if not content:
        return None

    # ignoring html contents
    if content.find('<') != -1:
        return content

    querysplit = query.split()
    queries = []
    for qs in querysplit:
        qs = qs.replace("'", "").replace('"', '').replace(" ", "")
        if len(qs) > 0:
            queries.extend(re.findall(regex_highlight_cjk(qs), content, flags=re.I | re.U))
    if len(queries) > 0:
        regex = re.compile("|".join(map(regex_highlight_cjk, queries)))
        return regex.sub(lambda match: f'<span class="highlight">{match.group(0)}</span>'.replace('\\', r'\\'), content)
    return content

References regex_highlight_cjk().
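
A standalone illustration of the substitution step for a plain ASCII query term, using the same \b(word)(?!\w) pattern shape that regex_highlight_cjk() produces; the backslash-escaping detail of the real replacement is omitted and the content string is hypothetical.

import re

content = 'SearXNG is a free metasearch engine.'
pattern = re.compile(r'\b(metasearch)(?!\w)', flags=re.I | re.U)
print(pattern.sub(lambda m: f'<span class="highlight">{m.group(0)}</span>', content))
# -> SearXNG is a free <span class="highlight">metasearch</span> engine.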

◆ is_hmac_of()

searx.webutils.is_hmac_of ( secret_key,
value,
hmac_to_check )

Definition at line 214 of file webutils.py.

def is_hmac_of(secret_key, value, hmac_to_check):
    hmac_of_value = new_hmac(secret_key, value)
    return len(hmac_of_value) == len(hmac_to_check) and hmac.compare_digest(hmac_of_value, hmac_to_check)

References new_hmac().

◆ new_hmac()

searx.webutils.new_hmac ( secret_key,
url )

Definition at line 210 of file webutils.py.

def new_hmac(secret_key, url):
    return hmac.new(secret_key.encode(), url, hashlib.sha256).hexdigest()

Referenced by is_hmac_of().
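
new_hmac() and is_hmac_of() only wrap the standard hmac/hashlib modules, so the pair can be reproduced standalone. Note that the hashed value is passed as bytes, since hmac.new() requires a bytes message; the secret and URLs below are hypothetical.

import hashlib
import hmac

def new_hmac_sketch(secret_key: str, url: bytes) -> str:
    return hmac.new(secret_key.encode(), url, hashlib.sha256).hexdigest()

def is_hmac_of_sketch(secret_key: str, value: bytes, hmac_to_check: str) -> bool:
    expected = new_hmac_sketch(secret_key, value)
    # constant-time comparison, as in the original
    return len(expected) == len(hmac_to_check) and hmac.compare_digest(expected, hmac_to_check)

token = new_hmac_sketch('server-secret', b'https://example.org/image.png')
print(is_hmac_of_sketch('server-secret', b'https://example.org/image.png', token))  # True
print(is_hmac_of_sketch('server-secret', b'https://example.org/other.png', token))  # False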

◆ prettify_url()

searx.webutils.prettify_url ( url,
max_length = 74 )

Definition at line 219 of file webutils.py.

def prettify_url(url, max_length=74):
    if len(url) > max_length:
        chunk_len = int(max_length / 2 + 1)
        return '{0}[...]{1}'.format(url[:chunk_len], url[-chunk_len:])
    return url
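
A standalone sketch with hypothetical URLs, showing the truncation rule: anything longer than max_length keeps the first and last max_length/2 + 1 characters, joined by '[...]'.

def prettify_url_sketch(url, max_length=74):
    if len(url) > max_length:
        chunk_len = int(max_length / 2 + 1)
        return '{0}[...]{1}'.format(url[:chunk_len], url[-chunk_len:])
    return url

print(prettify_url_sketch('https://example.org/short'))
# -> https://example.org/short  (unchanged, below max_length)
print(prettify_url_sketch('https://example.org/' + 'a' * 100, max_length=40))
# -> first 21 and last 21 characters joined by '[...]'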

◆ regex_highlight_cjk()

str searx.webutils.regex_highlight_cjk ( str word)
Generate the regex pattern used to match a given word, depending on whether
the word contains CJK (Chinese, Japanese, Korean) characters. If the word
contains CJK characters, the pattern matches the word anywhere in the text,
since CJK text does not separate words with whitespace; otherwise the pattern
only matches the word as a standalone word, delimited by word boundaries.

Args:
    word (str): the word to build the regex pattern for.

Returns:
    str: the regex pattern for the word.

Definition at line 248 of file webutils.py.

def regex_highlight_cjk(word: str) -> str:
    """Generate the regex pattern used to match a given word, depending on
    whether the word contains CJK (Chinese, Japanese, Korean) characters.
    If the word contains CJK characters, the pattern matches the word anywhere
    in the text, since CJK text does not separate words with whitespace;
    otherwise the pattern only matches the word as a standalone word,
    delimited by word boundaries.

    Args:
        word (str): the word to build the regex pattern for.

    Returns:
        str: the regex pattern for the word.
    """
    rword = re.escape(word)
    if contains_cjko(rword):
        return fr'({rword})'
    return fr'\b({rword})(?!\w)'

References contains_cjko().

Referenced by highlight_content().
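
A standalone sketch of the pattern selection, reusing the CJK range check shown for contains_cjko(); the helper name is illustrative. The two print calls show the two pattern shapes.

import re

CJK_RANGES = '\u4e00-\u9fff\u3040-\u309f\u30a0-\u30ff\uac00-\ud7af\u1100-\u11ff'

def regex_highlight_cjk_sketch(word: str) -> str:
    rword = re.escape(word)
    if re.search(fr'[{CJK_RANGES}]', rword):
        return fr'({rword})'          # CJK word: match it anywhere in the text
    return fr'\b({rword})(?!\w)'      # non-CJK word: match it only as a standalone word

print(regex_highlight_cjk_sketch('searx'))  # \b(searx)(?!\w)
print(regex_highlight_cjk_sketch('検索'))    # (検索)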

◆ searxng_l10n_timespan()

str searx.webutils.searxng_l10n_timespan ( datetime dt)
Returns a human-readable and translated string indicating how long ago
a date was in the past / the time span of the date to the present.

On January 1st, midnight, the returned string only indicates how many years
ago the date was.

Definition at line 289 of file webutils.py.

def searxng_l10n_timespan(dt: datetime) -> str:  # pylint: disable=invalid-name
    """Returns a human-readable and translated string indicating how long ago
    a date was in the past / the time span of the date to the present.

    On January 1st, midnight, the returned string only indicates how many years
    ago the date was.
    """
    # TODO, check if timezone is calculated right  # pylint: disable=fixme
    d = dt.date()
    t = dt.time()
    if d.month == 1 and d.day == 1 and t.hour == 0 and t.minute == 0 and t.second == 0:
        return str(d.year)
    if dt.replace(tzinfo=None) >= datetime.now() - timedelta(days=1):
        timedifference = datetime.now() - dt.replace(tzinfo=None)
        minutes = int((timedifference.seconds / 60) % 60)
        hours = int(timedifference.seconds / 60 / 60)
        if hours == 0:
            return gettext('{minutes} minute(s) ago').format(minutes=minutes)
        return gettext('{hours} hour(s), {minutes} minute(s) ago').format(hours=hours, minutes=minutes)
    return format_date(dt)
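
A standalone sketch of the three branches without the Flask-Babel layer: gettext() is dropped and format_date() is replaced by a plain ISO date, just to make the decision logic runnable in isolation.

from datetime import datetime, timedelta

def timespan_sketch(dt: datetime) -> str:
    d, t = dt.date(), dt.time()
    if (d.month, d.day, t.hour, t.minute, t.second) == (1, 1, 0, 0, 0):
        return str(d.year)  # January 1st, midnight: year only
    if dt.replace(tzinfo=None) >= datetime.now() - timedelta(days=1):
        diff = datetime.now() - dt.replace(tzinfo=None)
        minutes = int((diff.seconds / 60) % 60)
        hours = int(diff.seconds / 60 / 60)
        if hours == 0:
            return f'{minutes} minute(s) ago'
        return f'{hours} hour(s), {minutes} minute(s) ago'
    return dt.date().isoformat()  # stand-in for the localized format_date()

print(timespan_sketch(datetime.now() - timedelta(minutes=30)))  # -> 30 minute(s) ago
print(timespan_sketch(datetime.now() - timedelta(days=10)))     # -> an ISO date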

◆ write_csv_response()

None searx.webutils.write_csv_response ( CSVWriter csv,
ResultContainer rc )
Write rows of the results to a query (``application/csv``) into a CSV
table (:py:obj:`CSVWriter`). The first line of the table contains the column
names. The column "type" specifies the row type; the following types are
included in the table:

- result
- answer
- suggestion
- correction

Definition at line 113 of file webutils.py.

def write_csv_response(csv: CSVWriter, rc: ResultContainer) -> None:  # pylint: disable=redefined-outer-name
    """Write rows of the results to a query (``application/csv``) into a CSV
    table (:py:obj:`CSVWriter`).  First line in the table contain the column
    names.  The column "type" specifies the type, the following types are
    included in the table:

    - result
    - answer
    - suggestion
    - correction

    """

    keys = ('title', 'url', 'content', 'host', 'engine', 'score', 'type')
    csv.writerow(keys)

    for res in rc.get_ordered_results():
        row = res.as_dict()
        row['host'] = row['parsed_url'].netloc
        row['type'] = 'result'
        csv.writerow([row.get(key, '') for key in keys])

    for a in rc.answers:
        row = a.as_dict()
        row['host'] = row['parsed_url'].netloc
        csv.writerow([row.get(key, '') for key in keys])

    for a in rc.suggestions:
        row = {'title': a, 'type': 'suggestion'}
        csv.writerow([row.get(key, '') for key in keys])

    for a in rc.corrections:
        row = {'title': a, 'type': 'correction'}
        csv.writerow([row.get(key, '') for key in keys])
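
Producing a real table needs a ResultContainer and the CSVWriter wrapper; the snippet below only illustrates the resulting column layout, using the standard csv module as a stand-in and hypothetical rows.

import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
keys = ('title', 'url', 'content', 'host', 'engine', 'score', 'type')
writer.writerow(keys)
writer.writerow(('SearXNG', 'https://docs.searxng.org/', 'Developer documentation ...',
                 'docs.searxng.org', 'duckduckgo', 1.0, 'result'))
writer.writerow(('searx engines', '', '', '', '', '', 'suggestion'))
print(buf.getvalue())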

Variable Documentation

◆ exception_classname_to_text

dict searx.webutils.exception_classname_to_text
Initial value:
= {
    None: gettext('unexpected crash'),
    'timeout': timeout_text,
    'asyncio.TimeoutError': timeout_text,
    'httpx.TimeoutException': timeout_text,
    'httpx.ConnectTimeout': timeout_text,
    'httpx.ReadTimeout': timeout_text,
    'httpx.WriteTimeout': timeout_text,
    'httpx.HTTPStatusError': gettext('HTTP error'),
    'httpx.ConnectError': gettext("HTTP connection error"),
    'httpx.RemoteProtocolError': http_protocol_error_text,
    'httpx.LocalProtocolError': http_protocol_error_text,
    'httpx.ProtocolError': http_protocol_error_text,
    'httpx.ReadError': network_error_text,
    'httpx.WriteError': network_error_text,
    'httpx.ProxyError': gettext("proxy error"),
    'searx.exceptions.SearxEngineCaptchaException': gettext("CAPTCHA"),
    'searx.exceptions.SearxEngineTooManyRequestsException': gettext("too many requests"),
    'searx.exceptions.SearxEngineAccessDeniedException': gettext("access denied"),
    'searx.exceptions.SearxEngineAPIException': gettext("server API error"),
    'searx.exceptions.SearxEngineXPathException': parsing_error_text,
    'KeyError': parsing_error_text,
    'json.decoder.JSONDecodeError': parsing_error_text,
    'lxml.etree.ParserError': parsing_error_text,
    'ssl.SSLCertVerificationError': ssl_cert_error_text,  # for Python > 3.7
    'ssl.CertificateError': ssl_cert_error_text,  # for Python 3.7
}

Definition at line 41 of file webutils.py.

◆ http_protocol_error_text

searx.webutils.http_protocol_error_text = gettext('HTTP protocol error')

Definition at line 38 of file webutils.py.

◆ logger

searx.webutils.logger = logger.getChild('webutils')

Definition at line 34 of file webutils.py.

◆ network_error_text

searx.webutils.network_error_text = gettext('network error')

Definition at line 39 of file webutils.py.

◆ NO_SUBGROUPING

str searx.webutils.NO_SUBGROUPING = 'without further subgrouping'

Definition at line 311 of file webutils.py.

◆ parsing_error_text

searx.webutils.parsing_error_text = gettext('parsing error')

Definition at line 37 of file webutils.py.

◆ ssl_cert_error_text

searx.webutils.ssl_cert_error_text = gettext("SSL error: certificate validation has failed")

Definition at line 40 of file webutils.py.

◆ timeout_text

searx.webutils.timeout_text = gettext('timeout')

Definition at line 36 of file webutils.py.

◆ VALID_LANGUAGE_CODE

searx.webutils.VALID_LANGUAGE_CODE = re.compile(r'^[a-z]{2,3}(-[a-zA-Z]{2})?$')

Definition at line 32 of file webutils.py.