.oO SearXNG Developer Documentation Oo.
searx.engines.brave Namespace Reference

Functions

 request (query, params)
 
 _extract_published_date (published_date_raw)
 
 parse_data_string (resp)
 
EngineResults response (resp)
 
EngineResults _parse_search (resp)
 
EngineResults _parse_news (resp)
 
EngineResults _parse_images (json_resp)
 
EngineResults _parse_videos (json_resp)
 
 fetch_traits (EngineTraits engine_traits)
 

Variables

logging.Logger logger
 
dict about
 
str base_url = "https://search.brave.com/"
 
list categories = []
 
str brave_category = 'search'
 
 Goggles = Any
 
bool brave_spellcheck = False
 
bool send_accept_language_header = True
 
bool paging = False
 
int max_page = 10
 
bool safesearch = True
 
dict safesearch_map = {2: 'strict', 1: 'moderate', 0: 'off'}
 
bool time_range_support = False
 
dict time_range_map
 

Detailed Description

Brave supports the categories listed in :py:obj:`brave_category` (general web
search, news, videos, images and goggles).  The support of :py:obj:`paging` and
:py:obj:`time range <time_range_support>` is limited (see remarks).

Configured ``brave`` engines:

.. code:: yaml

  - name: brave
    engine: brave
    ...
    brave_category: search
    time_range_support: true
    paging: true

  - name: brave.images
    engine: brave
    ...
    brave_category: images

  - name: brave.videos
    engine: brave
    ...
    brave_category: videos

  - name: brave.news
    engine: brave
    ...
    brave_category: news

  - name: brave.goggles
    engine: brave
    time_range_support: true
    paging: true
    ...
    brave_category: goggles


.. _brave regions:

Brave regions
=============

Brave uses two-letter country tags for its regions (like ``ca``), while SearXNG
deals with locales.  To get a mapping, all *official de facto* languages of a
Brave region are mapped to regions in SearXNG (see :py:obj:`babel
<babel.languages.get_official_languages>`):

.. code:: python

    "regions": {
      ..
      "en-CA": "ca",
      "fr-CA": "ca",
      ..
     }

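The mapping above can be sketched with the standard library alone.  The
``OFFICIAL_LANGUAGES`` table and the ``build_region_map()`` helper are
hypothetical: the table is hand-written in place of
``babel.languages.get_official_languages(country, de_facto=True)``, and the real
:py:obj:`fetch_traits` additionally validates every tag with babel and renames
``no`` to ``nb``.

```python
# Hypothetical, hand-written stand-in for babel.languages.get_official_languages(
# country, de_facto=True); the real data comes from babel / CLDR.
OFFICIAL_LANGUAGES = {
    "ca": ("en", "fr"),
    "be": ("nl", "fr", "de"),
}


def build_region_map(country_tags, official_languages):
    """Map SearXNG region tags (e.g. 'fr-CA') to Brave's two-letter country tags."""
    regions = {}
    for country in country_tags:
        for lang in official_languages.get(country, ()):
            sxng_tag = f"{lang}-{country.upper()}"
            # the first country claiming a tag wins, like the conflict check in fetch_traits()
            regions.setdefault(sxng_tag, country)
    return regions


regions = build_region_map(["ca", "be"], OFFICIAL_LANGUAGES)
print(regions)  # {'en-CA': 'ca', 'fr-CA': 'ca', 'nl-BE': 'be', 'fr-BE': 'be', 'de-BE': 'be'}
```

Several SearXNG locales collapse onto one Brave region, which is why both
``en-CA`` and ``fr-CA`` end up sending ``ca``.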

.. note::

   Brave's index supports only a small set of common languages (aka regions).
   Search results for languages like Chinese or Arabic are of low quality.


.. _brave googles:

Brave Goggles
=============

.. _list of Goggles: https://search.brave.com/goggles/discover
.. _Goggles Whitepaper: https://brave.com/static-assets/files/goggles.pdf
.. _Goggles Quickstart: https://github.com/brave/goggles-quickstart

Goggles allow you to choose, alter, or extend the ranking of Brave Search
results (`Goggles Whitepaper`_).  Goggles are openly developed by the community
of Brave Search users.

Select from the `list of Goggles`_ people have published, or create your own
(`Goggles Quickstart`_).


.. _brave languages:

Brave languages
===============

Brave's language support is limited to the UI (menus, regional notations,
etc.).  Brave's index seems to support only a locale (region), not languages.
The choice of available UI languages is very small, and it is not clear to me
what changes in the UI when switching from en-us to en-ca or en-gb.

In the :py:obj:`EngineTraits object <searx.enginelib.traits.EngineTraits>` the
UI languages are stored in a custom field named ``ui_lang``:

.. code:: python

    "custom": {
      "ui_lang": {
        "ca": "ca",
        "de-DE": "de-de",
        "en-CA": "en-ca",
        "en-GB": "en-gb",
        "en-US": "en-us",
        "es": "es",
        "fr-CA": "fr-ca",
        "fr-FR": "fr-fr",
        "ja-JP": "ja-jp",
        "pt-BR": "pt-br",
        "sq-AL": "sq-al"
      }
    },

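At request time a SearXNG locale has to be resolved against this ``ui_lang``
table, with ``en-us`` as the fallback.  The helper below is a simplified,
hypothetical lookup written for illustration; it does not reproduce the exact
rules of ``searx.locales.get_engine_locale()``.

```python
# Subset of the ui_lang table shown above.
UI_LANG = {
    "ca": "ca", "de-DE": "de-de", "en-CA": "en-ca", "en-GB": "en-gb",
    "en-US": "en-us", "es": "es", "fr-CA": "fr-ca", "fr-FR": "fr-fr",
}


def pick_ui_lang(searxng_locale, ui_lang, default="en-us"):
    """Hypothetical, simplified lookup of Brave's UI language for a SearXNG locale."""
    if searxng_locale in ui_lang:  # exact tag, e.g. 'fr-CA'
        return ui_lang[searxng_locale]
    lang = searxng_locale.split("-")[0]
    if lang in ui_lang:  # language-only entry, e.g. 'es-MX' -> 'es'
        return ui_lang[lang]
    for tag, value in ui_lang.items():  # first region of the same language
        if tag.split("-")[0] == lang:
            return value
    return default  # unsupported language, e.g. 'zh-CN'
```
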
Implementations
===============

Function Documentation

◆ _extract_published_date()

searx.engines.brave._extract_published_date ( published_date_raw)
protected

Definition at line 246 of file brave.py.

.. code:: python

    from dateutil import parser  # module-level import in brave.py

    def _extract_published_date(published_date_raw):
        if published_date_raw is None:
            return None

        try:
            return parser.parse(published_date_raw)
        except parser.ParserError:
            return None

Referenced by _parse_search(), and _parse_videos().
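The contract of this helper (return a ``datetime`` or ``None``, never raise on
bad input) can be sketched with the standard library.  This is a swapped-in
technique: ``datetime.fromisoformat`` plus a short, hypothetical list of
``strptime`` formats instead of dateutil's far more lenient ``parser.parse()``.

```python
from datetime import datetime
from typing import Optional

# Hypothetical format list; dateutil's parser.parse() recognises far more layouts.
FORMATS = ("%b %d, %Y", "%d %b %Y")


def extract_published_date(raw: Optional[str]) -> Optional[datetime]:
    """Return a datetime for raw, or None when raw is missing or unparsable."""
    if raw is None:
        return None
    raw = raw.strip()
    try:
        return datetime.fromisoformat(raw)  # e.g. '2024-03-03T10:00:00'
    except ValueError:
        pass
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw, fmt)  # e.g. 'Mar 3, 2024'
        except ValueError:
            continue
    return None  # e.g. relative ages such as '2 days ago'
```
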


◆ _parse_images()

EngineResults searx.engines.brave._parse_images ( json_resp)
protected

Definition at line 402 of file brave.py.

.. code:: python

    def _parse_images(json_resp) -> EngineResults:
        result_list = EngineResults()

        for result in json_resp["results"]:
            item = {
                'url': result['url'],
                'title': result['title'],
                'content': result['description'],
                'template': 'images.html',
                'resolution': result['properties']['format'],
                'source': result['source'],
                'img_src': result['properties']['url'],
                'thumbnail_src': result['thumbnail']['src'],
            }
            result_list.append(item)

        return result_list

Referenced by response().
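The listing indexes every key directly, so a single incomplete entry raises
``KeyError`` and aborts the whole page.  A defensive variant could map one
entry at a time and skip entries without an image URL.  The key names come from
the listing; which keys Brave may actually omit is not known, so the fallbacks
and the ``image_item()`` helper are assumptions, and the sample entry is made up.

```python
from typing import Any, Optional


def image_item(result: dict) -> Optional[dict]:
    """Map one json_resp["results"] entry to an images.html item, or None if unusable."""
    props = result.get("properties") or {}
    thumb = result.get("thumbnail") or {}
    img_src = props.get("url")
    if not img_src:
        return None  # nothing to display without the full-size image URL
    item: dict[str, Any] = {
        "url": result.get("url", img_src),
        "title": result.get("title", ""),
        "content": result.get("description", ""),
        "template": "images.html",
        "resolution": props.get("format", ""),
        "source": result.get("source", ""),
        "img_src": img_src,
        "thumbnail_src": thumb.get("src", img_src),  # fall back to the full image
    }
    return item


# made-up sample entry shaped like the keys used in _parse_images()
sample = {
    "url": "https://example.org/page", "title": "A cat", "description": "",
    "source": "example.org", "thumbnail": {"src": "https://example.org/cat-thumb.jpg"},
    "properties": {"url": "https://example.org/cat.jpg", "format": "800 x 600"},
}
```
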


◆ _parse_news()

EngineResults searx.engines.brave._parse_news ( resp)
protected

Definition at line 372 of file brave.py.

.. code:: python

    def _parse_news(resp) -> EngineResults:

        result_list = EngineResults()
        dom = html.fromstring(resp.text)

        for result in eval_xpath_list(dom, '//div[contains(@class, "results")]//div[@data-type="news"]'):

            url = eval_xpath_getindex(result, './/a[contains(@class, "result-header")]/@href', 0, default=None)
            if url is None:
                continue

            title = extract_text(eval_xpath_list(result, './/span[contains(@class, "snippet-title")]'))
            content = extract_text(eval_xpath_list(result, './/p[contains(@class, "desc")]'))
            thumbnail = eval_xpath_getindex(result, './/div[contains(@class, "image-wrapper")]//img/@src', 0, default='')

            item = {
                "url": url,
                "title": title,
                "content": content,
                "thumbnail": thumbnail,
            }

            result_list.append(item)

        return result_list

Referenced by response().
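The XPath selection above can be mirrored with the standard library for
experiments.  This sketch swaps lxml for ``xml.etree.ElementTree`` and therefore
assumes well-formed markup, which Brave's real HTML is not; ``text_of()``,
``find_first()`` and ``parse_news_sketch()`` are hypothetical helpers, and the
sample markup is made up.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def text_of(el: Optional[ET.Element]) -> str:
    """Rough stand-in for extract_text(): all text nodes, whitespace collapsed."""
    if el is None:
        return ""
    return " ".join("".join(el.itertext()).split())


def find_first(el: ET.Element, tag: str, class_part: str) -> Optional[ET.Element]:
    """First <tag> at or below el whose class contains class_part (like XPath contains())."""
    for node in el.iter(tag):
        if class_part in node.get("class", ""):
            return node
    return None


def parse_news_sketch(markup: str) -> list:
    root = ET.fromstring(markup.strip())
    items, seen = [], set()
    for container in root.iter("div"):
        if "results" not in container.get("class", ""):
            continue
        for result in container.iter("div"):
            if result.get("data-type") != "news" or result in seen:
                continue
            seen.add(result)  # nested "results" containers must not duplicate items
            link = find_first(result, "a", "result-header")
            url = link.get("href") if link is not None else None
            if url is None:
                continue  # same rule as the listing: no link, no result
            wrapper = find_first(result, "div", "image-wrapper")
            img = next(wrapper.iter("img"), None) if wrapper is not None else None
            items.append({
                "url": url,
                "title": text_of(find_first(result, "span", "snippet-title")),
                "content": text_of(find_first(result, "p", "desc")),
                "thumbnail": img.get("src", "") if img is not None else "",
            })
    return items


SAMPLE = """
<html><body><div class="results">
  <div data-type="news" class="snippet">
    <a class="result-header" href="https://news.example/a">
      <span class="snippet-title">First  story</span>
    </a>
    <p class="desc">Short summary.</p>
    <div class="image-wrapper"><img src="https://imgs.example/a.jpg"/></div>
  </div>
  <div data-type="news" class="snippet"><p class="desc">no link, skipped</p></div>
</div></body></html>
"""
```
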


◆ _parse_search()

EngineResults searx.engines.brave._parse_search ( resp)
protected

Definition at line 303 of file brave.py.

.. code:: python

    def _parse_search(resp) -> EngineResults:
        result_list = EngineResults()

        dom = html.fromstring(resp.text)

        # I doubt that Brave is still providing the "answer" class / I haven't seen
        # answers in brave for a long time.
        answer_tag = eval_xpath_getindex(dom, '//div[@class="answer"]', 0, default=None)
        # test against None: an lxml element without children is falsy
        if answer_tag is not None:
            url = eval_xpath_getindex(dom, '//div[@id="featured_snippet"]/a[@class="result-header"]/@href', 0, default=None)
            answer = extract_text(answer_tag)
            if answer is not None:
                result_list.add(result_list.types.Answer(answer=answer, url=url))

        # xpath_results = '//div[contains(@class, "snippet fdb") and @data-type="web"]'
        xpath_results = '//div[contains(@class, "snippet ")]'

        for result in eval_xpath_list(dom, xpath_results):

            url = eval_xpath_getindex(result, './/a[contains(@class, "h")]/@href', 0, default=None)
            title_tag = eval_xpath_getindex(
                result, './/a[contains(@class, "h")]//div[contains(@class, "title")]', 0, default=None
            )
            if url is None or title_tag is None or not urlparse(url).netloc:  # partial url likely means it's an ad
                continue

            content: str = extract_text(
                eval_xpath_getindex(result, './/div[contains(@class, "snippet-description")]', 0, default='')
            )  # type: ignore
            pub_date_raw = eval_xpath(result, 'substring-before(.//div[contains(@class, "snippet-description")], "-")')
            pub_date = _extract_published_date(pub_date_raw)
            if pub_date and content.startswith(pub_date_raw):
                # cut the date prefix (str.lstrip() would strip a set of characters, not a prefix)
                content = content[len(pub_date_raw) :].strip("- \n\t")

            thumbnail = eval_xpath_getindex(result, './/img[contains(@class, "thumb")]/@src', 0, default='')

            item = {
                'url': url,
                'title': extract_text(title_tag),
                'content': content,
                'publishedDate': pub_date,
                'thumbnail': thumbnail,
            }

            video_tag = eval_xpath_getindex(
                result, './/div[contains(@class, "video-snippet") and @data-macro="video"]', 0, default=None
            )
            if video_tag is not None:

                # In my tests a video tag in the WEB search was most often not a
                # video, except the ones from youtube ..

                iframe_src = get_embeded_stream_url(url)
                if iframe_src:
                    item['iframe_src'] = iframe_src
                    item['template'] = 'videos.html'
                    item['thumbnail'] = eval_xpath_getindex(video_tag, './/img/@src', 0, default='')
                    pub_date_raw = extract_text(
                        eval_xpath(video_tag, './/div[contains(@class, "snippet-attributes")]/div/text()')
                    )
                    item['publishedDate'] = _extract_published_date(pub_date_raw)
                else:
                    item['thumbnail'] = eval_xpath_getindex(video_tag, './/img/@src', 0, default='')

            result_list.append(item)

        return result_list

References _extract_published_date().

Referenced by response().
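The date handling in the snippet loop relies on XPath ``substring-before(...,
"-")``: the text before the first ``-`` is tried as a date and, if it parses, is
cut from the description.  A standard-library sketch of that step follows;
``split_date_prefix()`` and its single ``strptime`` format are hypothetical
stand-ins for the dateutil-based :py:obj:`_extract_published_date`.

```python
from datetime import datetime
from typing import Optional, Tuple


def split_date_prefix(description: str, fmt: str = "%b %d, %Y") -> Tuple[Optional[datetime], str]:
    """Split 'Mar 3, 2024 - text' into (datetime, 'text'); leave other strings alone."""
    # substring-before(s, "-") yields "" when s contains no "-"; partition() mirrors that
    raw, sep, _ = description.partition("-")
    if not sep or not raw.strip():
        return None, description
    try:
        pub_date = datetime.strptime(raw.strip(), fmt)
    except ValueError:
        return None, description  # text before "-" is not a date, e.g. 'A well-known fact'
    # remove the exact prefix, then the separator and surrounding whitespace
    return pub_date, description[len(raw):].strip("- \n\t")
```

Slicing by ``len(raw)`` removes exactly the matched prefix, whereas
``str.lstrip(raw)`` would treat ``raw`` as a set of characters to strip.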


◆ _parse_videos()

EngineResults searx.engines.brave._parse_videos ( json_resp)
protected

Definition at line 421 of file brave.py.

.. code:: python

    def _parse_videos(json_resp) -> EngineResults:
        result_list = EngineResults()

        for result in json_resp["results"]:

            url = result['url']
            item = {
                'url': url,
                'title': result['title'],
                'content': result['description'],
                'template': 'videos.html',
                'length': result['video']['duration'],
                'duration': result['video']['duration'],
                'publishedDate': _extract_published_date(result['age']),
            }

            if result['thumbnail'] is not None:
                item['thumbnail'] = result['thumbnail']['src']

            iframe_src = get_embeded_stream_url(url)
            if iframe_src:
                item['iframe_src'] = iframe_src

            result_list.append(item)

        return result_list

References _extract_published_date().

Referenced by response().
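The optional parts of a video item (thumbnail and embedded player) can be
sketched as follows.  ``youtube_embed_url()`` is a hypothetical stand-in for
``searx.utils.get_embeded_stream_url()`` that only recognises YouTube links; the
real helper covers more hosts and may build different embed URLs.
``video_item()`` is likewise illustrative and omits the published date.

```python
from typing import Any, Optional
from urllib.parse import parse_qs, urlparse


def youtube_embed_url(url: str) -> Optional[str]:
    """Hypothetical stand-in for get_embeded_stream_url(), handling YouTube links only."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com") and parsed.path == "/watch":
        video_id = parse_qs(parsed.query).get("v", [None])[0]
    elif host == "youtu.be":
        video_id = parsed.path.lstrip("/") or None
    else:
        return None
    return f"https://www.youtube-nocookie.com/embed/{video_id}" if video_id else None


def video_item(result: dict) -> dict:
    """Build a videos.html item; thumbnail and iframe_src are optional, as in the listing."""
    url = result["url"]
    item: dict[str, Any] = {
        "url": url,
        "title": result["title"],
        "content": result["description"],
        "template": "videos.html",
        "length": result["video"]["duration"],
    }
    if result.get("thumbnail") is not None:
        item["thumbnail"] = result["thumbnail"]["src"]
    iframe_src = youtube_embed_url(url)
    if iframe_src:
        item["iframe_src"] = iframe_src
    return item
```
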


◆ fetch_traits()

searx.engines.brave.fetch_traits ( EngineTraits engine_traits)
Fetch :ref:`languages <brave languages>` and :ref:`regions <brave
regions>` from Brave.

Definition at line 449 of file brave.py.

.. code:: python

    def fetch_traits(engine_traits: EngineTraits):
        """Fetch :ref:`languages <brave languages>` and :ref:`regions <brave
        regions>` from Brave."""

        # pylint: disable=import-outside-toplevel, too-many-branches

        import babel.languages
        from searx.locales import region_tag, language_tag
        from searx.network import get  # see https://github.com/searxng/searxng/issues/762

        engine_traits.custom["ui_lang"] = {}

        headers = {
            'Accept-Encoding': 'gzip, deflate',
        }
        lang_map = {'no': 'nb'}  # norway

        # languages (UI)

        resp = get('https://search.brave.com/settings', headers=headers)

        if not resp.ok:  # type: ignore
            print("ERROR: response from Brave is not OK.")
        dom = html.fromstring(resp.text)  # type: ignore

        for option in dom.xpath('//section//option[@value="en-us"]/../option'):

            ui_lang = option.get('value')
            try:
                locale = babel.Locale.parse(ui_lang, sep='-')
                # tags with a territory become region tags, the others language tags
                sxng_tag = region_tag(locale) if locale.territory else language_tag(locale)

            except babel.UnknownLocaleError:
                print("ERROR: can't determine babel locale of Brave's (UI) language %s" % ui_lang)
                continue

            conflict = engine_traits.custom["ui_lang"].get(sxng_tag)
            if conflict:
                if conflict != ui_lang:
                    print("CONFLICT: babel %s --> %s, %s" % (sxng_tag, conflict, ui_lang))
                continue
            engine_traits.custom["ui_lang"][sxng_tag] = ui_lang

        # search regions of brave

        resp = get('https://cdn.search.brave.com/serp/v2/_app/immutable/chunks/parameters.734c106a.js', headers=headers)

        if not resp.ok:  # type: ignore
            print("ERROR: response from Brave is not OK.")

        country_js = resp.text[resp.text.index("options:{all") + len('options:') :]  # type: ignore
        country_js = country_js[: country_js.index("},k={default")]
        country_tags = js_variable_to_python(country_js)

        for k, v in country_tags.items():
            if k == 'all':
                engine_traits.all_locale = 'all'
                continue
            country_tag = v['value']

            # add official languages of the country ..
            for lang_tag in babel.languages.get_official_languages(country_tag, de_facto=True):
                lang_tag = lang_map.get(lang_tag, lang_tag)
                sxng_tag = region_tag(babel.Locale.parse('%s_%s' % (lang_tag, country_tag.upper())))
                # print("%-20s: %s <-- %s" % (v['label'], country_tag, sxng_tag))

                conflict = engine_traits.regions.get(sxng_tag)
                if conflict:
                    if conflict != country_tag:
                        print("CONFLICT: babel %s --> %s, %s" % (sxng_tag, conflict, country_tag))
                    continue
                engine_traits.regions[sxng_tag] = country_tag
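
The ``ui_lang`` loop keeps the first Brave value for each SearXNG tag and only
reports a conflict when a later option maps to the same tag with a different
value.  The sketch below reproduces that bookkeeping without babel; the
hypothetical ``to_sxng_tag()`` simply upper-cases the territory and, unlike
babel, does not reject unknown locales.

```python
def to_sxng_tag(ui_lang: str) -> str:
    """'en-us' -> 'en-US', 'es' -> 'es' (hypothetical, no babel validation)."""
    lang, _, territory = ui_lang.partition("-")
    return f"{lang.lower()}-{territory.upper()}" if territory else lang.lower()


def build_ui_lang_map(option_values):
    """Return ({sxng_tag: brave_value}, [conflict messages]) for Brave's <option> values."""
    mapping = {}
    conflicts = []
    for ui_lang in option_values:
        sxng_tag = to_sxng_tag(ui_lang)
        existing = mapping.get(sxng_tag)
        if existing is not None:
            if existing != ui_lang:  # same tag, different Brave value
                conflicts.append(f"{sxng_tag} --> {existing}, {ui_lang}")
            continue  # the first value wins; exact duplicates are ignored silently
        mapping[sxng_tag] = ui_lang
    return mapping, conflicts
```
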

◆ parse_data_string()

searx.engines.brave.parse_data_string ( resp)

Definition at line 256 of file brave.py.

.. code:: python

    def parse_data_string(resp):
        # kit.start(app, element, {
        #    node_ids: [0, 19],
        #    data: [{"type":"data","data" .... ["q","goggles_id"],"route":1,"url":1}}]
        #          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        kit_start = resp.text.index("kit.start(app,")
        start = resp.text[kit_start:].index('data: [{"type":"data"')
        start = kit_start + start + len('data: ')

        lev = 0
        end = start
        inner = False
        for c in resp.text[start:]:
            if inner and lev == 0:
                break
            end += 1
            if c == "[":
                lev += 1
                inner = True
                continue
            if c == "]":
                lev -= 1

        json_data = js_variable_to_python(resp.text[start:end])
        return json_data

Referenced by response().
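The bracket counter in the listing does not skip brackets inside string
literals, so a value such as ``"a]b"`` ends the array early.  A variant that
tracks double-quoted strings (including backslash escapes) could look like the
sketch below; ``extract_balanced_array()`` is a hypothetical helper, and it
assumes JS strings are double-quoted, as in the ``kit.start`` payload shown above.

```python
import json


def extract_balanced_array(text: str, marker: str) -> str:
    """Return the bracket-balanced array literal that follows the first `marker`."""
    start = text.index("[", text.index(marker) + len(marker))
    depth = 0
    in_string = False
    escaped = False
    for pos in range(start, len(text)):
        ch = text[pos]
        if in_string:
            if escaped:
                escaped = False  # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue  # brackets inside strings do not count
        if ch == '"':
            in_string = True
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return text[start : pos + 1]
    raise ValueError("unbalanced brackets after marker")


page = 'kit.start(app, element, { node_ids: [0, 19], data: [{"type":"data","q":"a]b"}, [1, 2]], form: null });'
array_literal = extract_balanced_array(page, "data: ")
data = json.loads(array_literal)  # valid JSON here; the real payload needs js_variable_to_python()
```
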


◆ request()

searx.engines.brave.request ( query,
params )

Definition at line 202 of file brave.py.

.. code:: python

    def request(query, params):

        # Don't accept br encoding / see https://github.com/searxng/searxng/pull/1787
        params['headers']['Accept-Encoding'] = 'gzip, deflate'

        args = {
            'q': query,
            'source': 'web',
        }
        if brave_spellcheck:
            args['spellcheck'] = '1'

        if brave_category in ('search', 'goggles'):
            if params.get('pageno', 1) - 1:
                args['offset'] = params.get('pageno', 1) - 1
            if time_range_map.get(params['time_range']):
                args['tf'] = time_range_map.get(params['time_range'])

        if brave_category == 'goggles':
            args['goggles_id'] = Goggles

        params["url"] = f"{base_url}{brave_category}?{urlencode(args)}"

        # set properties in the cookies

        params['cookies']['safesearch'] = safesearch_map.get(params['safesearch'], 'off')
        # the useLocation is IP based, we use cookie 'country' for the region
        params['cookies']['useLocation'] = '0'
        params['cookies']['summarizer'] = '0'

        engine_region = traits.get_region(params['searxng_locale'], 'all')
        params['cookies']['country'] = engine_region.split('-')[-1].lower()  # type: ignore

        ui_lang = locales.get_engine_locale(params['searxng_locale'], traits.custom["ui_lang"], 'en-us')
        params['cookies']['ui_lang'] = ui_lang

        logger.debug("cookies %s", params['cookies'])

        params['headers']['Sec-Fetch-Dest'] = "document"
        params['headers']['Sec-Fetch-Mode'] = "navigate"
        params['headers']['Sec-Fetch-Site'] = "same-origin"
        params['headers']['Sec-Fetch-User'] = "?1"
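
The URL and cookie logic of ``request()`` can be exercised in isolation with a
pure function.  ``build_brave_request()`` is a hypothetical, simplified sketch:
it takes the module settings as arguments instead of reading globals, skips the
``traits`` and ``ui_lang`` lookups, and only adds an offset for ``pageno > 1``.

```python
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://search.brave.com/"
TIME_RANGE_MAP = {"day": "pd", "week": "pw", "month": "pm", "year": "py"}
SAFESEARCH_MAP = {2: "strict", 1: "moderate", 0: "off"}


def build_brave_request(query, category="search", pageno=1, time_range=None,
                        safesearch=0, goggles=None, region="all", spellcheck=False):
    """Return (url, cookies) roughly the way request() fills params (simplified sketch)."""
    args = {"q": query, "source": "web"}
    if spellcheck:
        args["spellcheck"] = "1"
    if category in ("search", "goggles"):  # only these categories page and filter by time
        if pageno > 1:
            args["offset"] = pageno - 1  # page 1 -> no offset, page 2 -> offset=1, ...
        tf = TIME_RANGE_MAP.get(time_range)
        if tf:
            args["tf"] = tf
    if category == "goggles":
        args["goggles_id"] = goggles
    url = f"{BASE_URL}{category}?{urlencode(args)}"
    cookies = {
        "safesearch": SAFESEARCH_MAP.get(safesearch, "off"),
        "useLocation": "0",  # the region comes from the 'country' cookie, not the IP
        "summarizer": "0",
        "country": region.split("-")[-1].lower(),
    }
    return url, cookies


url, cookies = build_brave_request("searxng", pageno=3, time_range="week", safesearch=1)
query_args = parse_qs(urlparse(url).query)
```
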

◆ response()

EngineResults searx.engines.brave.response ( resp)

Definition at line 283 of file brave.py.

.. code:: python

    def response(resp) -> EngineResults:

        if brave_category in ('search', 'goggles'):
            return _parse_search(resp)

        # compare with ==: ('news') is a plain string, so `in` would do a substring test
        if brave_category == 'news':
            return _parse_news(resp)

        json_data = parse_data_string(resp)
        # json_data is a list and at the second position (0,1) in this list we find the "response" data we need ..
        json_resp = json_data[1]['data']['body']['response']

        if brave_category == 'images':
            return _parse_images(json_resp)
        if brave_category == 'videos':
            return _parse_videos(json_resp)

        raise ValueError(f"Unsupported brave category: {brave_category}")

References _parse_images(), _parse_news(), _parse_search(), _parse_videos(), and parse_data_string().
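The lookup ``json_data[1]['data']['body']['response']`` fails with a bare
``KeyError``/``IndexError`` when Brave changes its page structure.  A small
path-walking helper could report which step broke; ``get_path()`` is a
hypothetical illustration, and the ``json_data`` below is a made-up minimal
shape, not real Brave output.

```python
from typing import Any


def get_path(data: Any, *keys: Any) -> Any:
    """Follow keys/indexes into nested lists and dicts, naming the step that fails."""
    current = data
    for depth, key in enumerate(keys):
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError) as exc:
            trail = "".join(f"[{k!r}]" for k in keys[: depth + 1])
            raise LookupError(f"unexpected page structure at {trail}") from exc
    return current


# made-up minimal shape of the parsed kit.start() data
json_data = [{"type": "data"}, {"data": {"body": {"response": {"results": []}}}}]
json_resp = get_path(json_data, 1, "data", "body", "response")
```
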


Variable Documentation

◆ about

dict searx.engines.brave.about
Initial value:

.. code:: python

    about = {
        "website": 'https://search.brave.com/',
        "wikidata_id": 'Q22906900',
        "official_api_documentation": None,
        "use_official_api": False,
        "require_api_key": False,
        "results": 'HTML',
    }

Definition at line 149 of file brave.py.

◆ base_url

str searx.engines.brave.base_url = "https://search.brave.com/"

Definition at line 158 of file brave.py.

◆ brave_category

str searx.engines.brave.brave_category = 'search'

Definition at line 160 of file brave.py.

◆ brave_spellcheck

bool searx.engines.brave.brave_spellcheck = False

Definition at line 171 of file brave.py.

◆ categories

list searx.engines.brave.categories = []

Definition at line 159 of file brave.py.

◆ Goggles

searx.engines.brave.Goggles = Any

Definition at line 161 of file brave.py.

◆ logger

logging.Logger searx.engines.brave.logger

Definition at line 145 of file brave.py.

◆ max_page

int searx.engines.brave.max_page = 10

Definition at line 182 of file brave.py.

◆ paging

bool searx.engines.brave.paging = False

Definition at line 179 of file brave.py.

◆ safesearch

bool searx.engines.brave.safesearch = True

Definition at line 187 of file brave.py.

◆ safesearch_map

dict searx.engines.brave.safesearch_map = {2: 'strict', 1: 'moderate', 0: 'off'}

Definition at line 188 of file brave.py.

◆ send_accept_language_header

bool searx.engines.brave.send_accept_language_header = True

Definition at line 178 of file brave.py.

◆ time_range_map

dict searx.engines.brave.time_range_map
Initial value:

.. code:: python

    time_range_map = {
        'day': 'pd',
        'week': 'pw',
        'month': 'pm',
        'year': 'py',
    }

Definition at line 194 of file brave.py.

◆ time_range_support

bool searx.engines.brave.time_range_support = False

Definition at line 190 of file brave.py.