.oO SearXNG Developer Documentation Oo.
searx.engines.brave Namespace Reference

Functions

None request (str query, dict[str, t.Any] params)
 _extract_published_date (published_date_raw)
EngineResults response (SXNG_Response resp)
EngineResults _parse_search (resp)
EngineResults _parse_news (resp)
EngineResults _parse_images (json_resp)
EngineResults _parse_videos (json_resp)
 fetch_traits (EngineTraits engine_traits)

Variables

dict about
str base_url = "https://search.brave.com/"
list categories = []
brave_category = 'search'
str Goggles = ""
bool brave_spellcheck = False
bool send_accept_language_header = True
bool paging = False
int max_page = 10
bool safesearch = True
dict safesearch_map = {2: 'strict', 1: 'moderate', 0: 'off'}
bool time_range_support = False
dict time_range_map

Detailed Description

Brave supports the categories listed in :py:obj:`brave_category` (General,
news, videos, images).  The support of :py:obj:`paging` and :py:obj:`time range
<time_range_support>` is limited (see remarks).

Configured ``brave`` engines:

.. code:: yaml

  - name: brave
    engine: brave
    ...
    brave_category: search
    time_range_support: true
    paging: true

  - name: brave.images
    engine: brave
    ...
    brave_category: images

  - name: brave.videos
    engine: brave
    ...
    brave_category: videos

  - name: brave.news
    engine: brave
    ...
    brave_category: news

  - name: brave.goggles
    engine: brave
    time_range_support: true
    paging: true
    ...
    brave_category: goggles


.. _brave regions:

Brave regions
=============

Brave uses two-letter tags for the regions like ``ca`` while SearXNG deals with
locales.  To get a mapping, all *official de-facto* languages of the Brave
region are mapped to regions in SearXNG (see :py:obj:`babel
<babel.languages.get_official_languages>`):

.. code:: python

    "regions": {
      ..
      "en-CA": "ca",
      "fr-CA": "ca",
      ..
     }
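
The mapping described above can be sketched without babel. This is a minimal illustration only: the ``OFFICIAL_LANGUAGES`` table and the ``build_region_map`` helper are hypothetical stand-ins (the engine asks babel for the de-facto official languages), and the conflict handling simply keeps the first mapping seen.

```python
# Hypothetical sample data: de-facto official languages per Brave country tag.
# The engine obtains this information from babel, not from a literal table.
OFFICIAL_LANGUAGES = {
    "ca": ["en", "fr"],
    "de": ["de"],
    "at": ["de"],
}


def build_region_map(official_languages):
    """Map SearXNG region tags (e.g. 'fr-CA') to Brave country tags (e.g. 'ca')."""
    regions = {}
    for country_tag, languages in official_languages.items():
        for lang in languages:
            sxng_tag = f"{lang}-{country_tag.upper()}"
            previous = regions.get(sxng_tag)
            if previous is not None and previous != country_tag:
                # Two Brave tags claim the same SearXNG tag: keep the first one.
                print(f"CONFLICT: {sxng_tag} --> {previous}, {country_tag}")
                continue
            regions[sxng_tag] = country_tag
    return regions


regions = build_region_map(OFFICIAL_LANGUAGES)
print(regions["en-CA"], regions["fr-CA"])
```

Both Canadian locales end up on the same Brave tag, which is the behaviour the ``regions`` excerpt shows.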


.. note::

   The language (aka region) support of Brave's index is limited to very basic
   languages.  The search results for languages like Chinese or Arabic are of
   low quality.


.. _brave googles:

Brave Goggles
=============

.. _list of Goggles: https://search.brave.com/goggles/discover
.. _Goggles Whitepaper: https://brave.com/static-assets/files/goggles.pdf
.. _Goggles Quickstart: https://github.com/brave/goggles-quickstart

Goggles allow you to choose, alter, or extend the ranking of Brave Search
results (`Goggles Whitepaper`_).  Goggles are openly developed by the community
of Brave Search users.

Select from the `list of Goggles`_ people have published, or create your own
(`Goggles Quickstart`_).


.. _brave languages:

Brave languages
===============

Brave's language support is limited to the UI (menus, locale-specific notations,
etc).  Brave's index seems to support only a region; it does not seem to
support language selection.  The choice of available UI languages is very
small (and it's not clear to me what changes in the UI when switching
from en-us to en-ca or en-gb).

In the :py:obj:`EngineTraits object <searx.enginelib.traits.EngineTraits>` the
UI languages are stored in a custom field named ``ui_lang``:

.. code:: python

    "custom": {
      "ui_lang": {
        "ca": "ca",
        "de-DE": "de-de",
        "en-CA": "en-ca",
        "en-GB": "en-gb",
        "en-US": "en-us",
        "es": "es",
        "fr-CA": "fr-ca",
        "fr-FR": "fr-fr",
        "ja-JP": "ja-jp",
        "pt-BR": "pt-br",
        "sq-AL": "sq-al"
      }
    },
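
How a SearXNG locale might be resolved against ``ui_lang`` can be sketched as follows. This is a simplified illustration, not the engine's lookup: the engine calls ``locales.get_engine_locale``, whose matching rules are more elaborate, and ``pick_ui_lang`` is a hypothetical helper that only tries an exact match, then the bare language, then a default.

```python
UI_LANG = {
    "ca": "ca",
    "de-DE": "de-de",
    "en-CA": "en-ca",
    "en-GB": "en-gb",
    "en-US": "en-us",
    "es": "es",
    "fr-CA": "fr-ca",
    "fr-FR": "fr-fr",
    "ja-JP": "ja-jp",
    "pt-BR": "pt-br",
    "sq-AL": "sq-al",
}


def pick_ui_lang(searxng_locale, ui_lang_map, default="en-us"):
    """Hypothetical helper: exact match, then language-only match, then default."""
    if searxng_locale in ui_lang_map:
        return ui_lang_map[searxng_locale]
    language = searxng_locale.split("-")[0]
    if language in ui_lang_map:
        return ui_lang_map[language]
    return default


print(pick_ui_lang("fr-CA", UI_LANG))  # exact match
print(pick_ui_lang("es-MX", UI_LANG))  # falls back to the bare language
print(pick_ui_lang("zh-TW", UI_LANG))  # no match, default is used
```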

Implementations
===============

Function Documentation

◆ _extract_published_date()

searx.engines.brave._extract_published_date ( published_date_raw)
protected

Definition at line 243 of file brave.py.

def _extract_published_date(published_date_raw):
    if published_date_raw is None:
        return None

    try:
        return parser.parse(published_date_raw)
    except parser.ParserError:
        return None

Referenced by _parse_search(), and _parse_videos().


◆ _parse_images()

EngineResults searx.engines.brave._parse_images ( json_resp)
protected

Definition at line 380 of file brave.py.

def _parse_images(json_resp) -> EngineResults:
    result_list = EngineResults()

    for result in json_resp["results"]:
        item = {
            'url': result['url'],
            'title': result['title'],
            'content': result['description'],
            'template': 'images.html',
            'resolution': result['properties']['format'],
            'source': result['source'],
            'img_src': result['properties']['url'],
            'thumbnail_src': result['thumbnail']['src'],
        }
        result_list.append(item)

    return result_list

Referenced by response().


◆ _parse_news()

EngineResults searx.engines.brave._parse_news ( resp)
protected

Definition at line 350 of file brave.py.

def _parse_news(resp) -> EngineResults:

    result_list = EngineResults()
    dom = html.fromstring(resp.text)

    for result in eval_xpath_list(dom, '//div[contains(@class, "results")]//div[@data-type="news"]'):

        url = eval_xpath_getindex(result, './/a[contains(@class, "result-header")]/@href', 0, default=None)
        if url is None:
            continue

        title = extract_text(eval_xpath_list(result, './/span[contains(@class, "snippet-title")]'))
        content = extract_text(eval_xpath_list(result, './/p[contains(@class, "desc")]'))
        thumbnail = eval_xpath_getindex(result, './/div[contains(@class, "image-wrapper")]//img/@src', 0, default='')

        item = {
            "url": url,
            "title": title,
            "content": content,
            "thumbnail": thumbnail,
        }

        result_list.append(item)

    return result_list

Referenced by response().


◆ _parse_search()

EngineResults searx.engines.brave._parse_search ( resp)
protected

Definition at line 281 of file brave.py.

def _parse_search(resp) -> EngineResults:
    result_list = EngineResults()

    dom = html.fromstring(resp.text)

    # I doubt that Brave is still providing the "answer" class / I haven't seen
    # answers in brave for a long time.
    answer_tag = eval_xpath_getindex(dom, '//div[@class="answer"]', 0, default=None)
    # Compare with None: an lxml element without child elements is falsy.
    if answer_tag is not None:
        url = eval_xpath_getindex(dom, '//div[@id="featured_snippet"]/a[@class="result-header"]/@href', 0, default=None)
        answer = extract_text(answer_tag)
        if answer is not None:
            result_list.add(result_list.types.Answer(answer=answer, url=url))

    # xpath_results = '//div[contains(@class, "snippet fdb") and @data-type="web"]'
    xpath_results = '//div[contains(@class, "snippet ")]'

    for result in eval_xpath_list(dom, xpath_results):

        url = eval_xpath_getindex(result, './/a[contains(@class, "h")]/@href', 0, default=None)
        title_tag = eval_xpath_getindex(
            result, './/a[contains(@class, "h")]//div[contains(@class, "title")]', 0, default=None
        )
        if url is None or title_tag is None or not urlparse(url).netloc:  # partial url likely means it's an ad
            continue

        content: str = extract_text(
            eval_xpath_getindex(result, './/div[contains(@class, "snippet-description")]', 0, default='')
        )  # type: ignore
        pub_date_raw = eval_xpath(result, 'substring-before(.//div[contains(@class, "snippet-description")], "-")')
        pub_date = _extract_published_date(pub_date_raw)
        if pub_date and content.startswith(pub_date_raw):
            # Slice off the prefix; str.lstrip() would treat it as a character set.
            content = content[len(pub_date_raw) :].strip("- \n\t")

        thumbnail = eval_xpath_getindex(result, './/img[contains(@class, "thumb")]/@src', 0, default='')

        item = {
            'url': url,
            'title': extract_text(title_tag),
            'content': content,
            'publishedDate': pub_date,
            'thumbnail': thumbnail,
        }

        video_tag = eval_xpath_getindex(
            result, './/div[contains(@class, "video-snippet") and @data-macro="video"]', 0, default=None
        )
        if video_tag is not None:

            # In my tests a video tag in the WEB search was most often not a
            # video, except the ones from youtube ..

            iframe_src = get_embeded_stream_url(url)
            if iframe_src:
                item['iframe_src'] = iframe_src
                item['template'] = 'videos.html'
                item['thumbnail'] = eval_xpath_getindex(video_tag, './/img/@src', 0, default='')
                pub_date_raw = extract_text(
                    eval_xpath(video_tag, './/div[contains(@class, "snippet-attributes")]/div/text()')
                )
                item['publishedDate'] = _extract_published_date(pub_date_raw)
            else:
                item['thumbnail'] = eval_xpath_getindex(video_tag, './/img/@src', 0, default='')

        result_list.append(item)

    return result_list

References _extract_published_date().

Referenced by response().
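
The ad filter in _parse_search() skips results whose URL is missing or has no host part. A minimal sketch of that check using only the standard library; looks_like_ad is a hypothetical helper name, not part of the engine, and the sample URLs are made up for illustration:

```python
from urllib.parse import urlparse


def looks_like_ad(url):
    """True when the URL is missing or has no host part (a partial URL)."""
    return url is None or not urlparse(url).netloc


print(looks_like_ad("https://example.org/page"))  # absolute URL, has a netloc
print(looks_like_ad("/a/redirect?placement=1"))   # relative URL, empty netloc
print(looks_like_ad(None))                        # no link found at all
```

A relative path parses with an empty netloc, which is why a partial URL is treated as a likely ad and skipped.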


◆ _parse_videos()

EngineResults searx.engines.brave._parse_videos ( json_resp)
protected

Definition at line 399 of file brave.py.

def _parse_videos(json_resp) -> EngineResults:
    result_list = EngineResults()

    for result in json_resp["results"]:

        url = result['url']
        item = {
            'url': url,
            'title': result['title'],
            'content': result['description'],
            'template': 'videos.html',
            'length': result['video']['duration'],
            'duration': result['video']['duration'],
            'publishedDate': _extract_published_date(result['age']),
        }

        if result['thumbnail'] is not None:
            item['thumbnail'] = result['thumbnail']['src']

        iframe_src = get_embeded_stream_url(url)
        if iframe_src:
            item['iframe_src'] = iframe_src

        result_list.append(item)

    return result_list

References _extract_published_date().

Referenced by response().


◆ fetch_traits()

searx.engines.brave.fetch_traits ( EngineTraits engine_traits)
Fetch :ref:`languages <brave languages>` and :ref:`regions <brave
regions>` from Brave.

Definition at line 427 of file brave.py.

def fetch_traits(engine_traits: EngineTraits):
    """Fetch :ref:`languages <brave languages>` and :ref:`regions <brave
    regions>` from Brave."""

    # pylint: disable=import-outside-toplevel, too-many-branches

    import babel.languages
    from searx.locales import region_tag, language_tag
    from searx.network import get  # see https://github.com/searxng/searxng/issues/762

    engine_traits.custom["ui_lang"] = {}

    headers = {
        'Accept-Encoding': 'gzip, deflate',
    }
    lang_map = {'no': 'nb'}  # norway

    # languages (UI)

    resp = get('https://search.brave.com/settings', headers=headers)

    if not resp.ok:  # type: ignore
        print("ERROR: response from Brave is not OK.")
    dom = html.fromstring(resp.text)  # type: ignore

    for option in dom.xpath('//section//option[@value="en-us"]/../option'):

        ui_lang = option.get('value')
        try:
            l = babel.Locale.parse(ui_lang, sep='-')
            if l.territory:
                sxng_tag = region_tag(babel.Locale.parse(ui_lang, sep='-'))
            else:
                sxng_tag = language_tag(babel.Locale.parse(ui_lang, sep='-'))

        except babel.UnknownLocaleError:
            print("ERROR: can't determine babel locale of Brave's (UI) language %s" % ui_lang)
            continue

        conflict = engine_traits.custom["ui_lang"].get(sxng_tag)
        if conflict:
            if conflict != ui_lang:
                print("CONFLICT: babel %s --> %s, %s" % (sxng_tag, conflict, ui_lang))
            continue
        engine_traits.custom["ui_lang"][sxng_tag] = ui_lang

    # search regions of brave

    resp = get('https://cdn.search.brave.com/serp/v2/_app/immutable/chunks/parameters.734c106a.js', headers=headers)

    if not resp.ok:  # type: ignore
        print("ERROR: response from Brave is not OK.")

    country_js = resp.text[resp.text.index("options:{all") + len('options:') :]  # type: ignore
    country_js = country_js[: country_js.index("},k={default")]
    country_tags = js_variable_to_python(country_js)

    for k, v in country_tags.items():
        if k == 'all':
            engine_traits.all_locale = 'all'
            continue
        country_tag = v['value']

        # add official languages of the country ..
        for lang_tag in babel.languages.get_official_languages(country_tag, de_facto=True):
            lang_tag = lang_map.get(lang_tag, lang_tag)
            sxng_tag = region_tag(babel.Locale.parse('%s_%s' % (lang_tag, country_tag.upper())))
            # print("%-20s: %s <-- %s" % (v['label'], country_tag, sxng_tag))

            conflict = engine_traits.regions.get(sxng_tag)
            if conflict:
                if conflict != country_tag:
                    print("CONFLICT: babel %s --> %s, %s" % (sxng_tag, conflict, country_tag))
                    continue
            engine_traits.regions[sxng_tag] = country_tag

◆ request()

None searx.engines.brave.request ( str query,
dict[str, t.Any] params )

Definition at line 199 of file brave.py.

def request(query: str, params: dict[str, t.Any]) -> None:

    # Don't accept br encoding / see https://github.com/searxng/searxng/pull/1787
    params['headers']['Accept-Encoding'] = 'gzip, deflate'

    args: dict[str, t.Any] = {
        'q': query,
        'source': 'web',
    }
    if brave_spellcheck:
        args['spellcheck'] = '1'

    if brave_category in ('search', 'goggles'):
        if params.get('pageno', 1) - 1:
            args['offset'] = params.get('pageno', 1) - 1
        if time_range_map.get(params['time_range']):
            args['tf'] = time_range_map.get(params['time_range'])

    if brave_category == 'goggles':
        args['goggles_id'] = Goggles

    params["url"] = f"{base_url}{brave_category}?{urlencode(args)}"

    # set properties in the cookies

    params['cookies']['safesearch'] = safesearch_map.get(params['safesearch'], 'off')
    # the useLocation is IP based, we use cookie 'country' for the region
    params['cookies']['useLocation'] = '0'
    params['cookies']['summarizer'] = '0'

    engine_region = traits.get_region(params['searxng_locale'], 'all')
    params['cookies']['country'] = engine_region.split('-')[-1].lower()  # type: ignore

    ui_lang = locales.get_engine_locale(params['searxng_locale'], traits.custom["ui_lang"], 'en-us')
    params['cookies']['ui_lang'] = ui_lang

    logger.debug("cookies %s", params['cookies'])

    params['headers']['Sec-Fetch-Dest'] = "document"
    params['headers']['Sec-Fetch-Mode'] = "navigate"
    params['headers']['Sec-Fetch-Site'] = "same-origin"
    params['headers']['Sec-Fetch-User'] = "?1"
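
The URL that request() assembles can be reproduced in isolation. The following is a sketch under stated assumptions: build_url is a hypothetical standalone helper that takes the engine's module-level settings as arguments, and it leaves out the spellcheck flag, the cookies and the headers.

```python
from urllib.parse import urlencode

BASE_URL = "https://search.brave.com/"
TIME_RANGE_MAP = {"day": "pd", "week": "pw", "month": "pm", "year": "py"}


def build_url(query, category="search", pageno=1, time_range=None, goggles_id=""):
    """Hypothetical helper mirroring the query-string logic of request()."""
    args = {"q": query, "source": "web"}
    # Paging and time range are only sent for the web search and for Goggles.
    if category in ("search", "goggles"):
        if pageno > 1:
            args["offset"] = pageno - 1  # Brave counts pages from zero
        if TIME_RANGE_MAP.get(time_range):
            args["tf"] = TIME_RANGE_MAP[time_range]
    if category == "goggles":
        args["goggles_id"] = goggles_id
    return f"{BASE_URL}{category}?{urlencode(args)}"


print(build_url("searxng", pageno=3, time_range="week"))
print(build_url("cats", category="images", pageno=2, time_range="day"))
```

The second call shows that the images category ignores both the page number and the time range, which matches the limited paging and time range support noted in the description.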

◆ response()

EngineResults searx.engines.brave.response ( SXNG_Response resp)

Definition at line 253 of file brave.py.

def response(resp: SXNG_Response) -> EngineResults:

    if brave_category in ('search', 'goggles'):
        return _parse_search(resp)

    # ('news') is a plain string, so "in" would do a substring test; compare directly.
    if brave_category == 'news':
        return _parse_news(resp)

    # Example script source containing the data:
    #
    # kit.start(app, element, {
    #    node_ids: [0, 19],
    #    data: [{type:"data",data: .... ["q","goggles_id"],route:1,url:1}}]
    #          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    js_object = "[{" + extr(resp.text, "data: [{", "}}],") + "}}]"
    json_data = js_variable_to_python(js_object)

    # json_data is a list and at the second position (0,1) in this list we find the "response" data we need ..
    json_resp = json_data[1]['data']['body']['response']

    if brave_category == 'images':
        return _parse_images(json_resp)
    if brave_category == 'videos':
        return _parse_videos(json_resp)

    raise ValueError(f"Unsupported brave category: {brave_category}")

References _parse_images(), _parse_news(), _parse_search(), and _parse_videos().
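
For images and videos, response() cuts the JavaScript array out of the page by its surrounding markers and then restores the delimiters that the markers consumed. A minimal sketch of that step: the extr function below is a simplified stand-in written for this example (the engine uses its own helper, whose exact behaviour is assumed here), and the page string is a made-up miniature of the script source.

```python
def extr(text, begin, end, default=""):
    """Stand-in helper: return the text between the first begin/end markers."""
    try:
        first = text.index(begin) + len(begin)
        return text[first : text.index(end, first)]
    except ValueError:
        return default


# Made-up miniature of the inline script; real pages carry far more data.
page = (
    'kit.start(app, element, { node_ids: [0, 19], '
    'data: [{type:"data",data:null},{type:"data",data:{body:{}}}], form: null });'
)

# The markers swallow the leading "[{" and the trailing "}}]", so add them back.
js_object = "[{" + extr(page, "data: [{", "}}],") + "}}]"
print(js_object)
```

The result is a JavaScript literal, not JSON (the keys are unquoted), which is why the engine hands it to js_variable_to_python rather than to a JSON parser.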


Variable Documentation

◆ about

dict searx.engines.brave.about
Initial value:
= {
    "website": 'https://search.brave.com/',
    "wikidata_id": 'Q22906900',
    "official_api_documentation": None,
    "use_official_api": False,
    "require_api_key": False,
    "results": 'HTML',
}

Definition at line 144 of file brave.py.

◆ base_url

str searx.engines.brave.base_url = "https://search.brave.com/"

Definition at line 153 of file brave.py.

◆ brave_category

t searx.engines.brave.brave_category = 'search'

Definition at line 155 of file brave.py.

◆ brave_spellcheck

bool searx.engines.brave.brave_spellcheck = False

Definition at line 168 of file brave.py.

◆ categories

list searx.engines.brave.categories = []

Definition at line 154 of file brave.py.

◆ Goggles

str searx.engines.brave.Goggles = ""

Definition at line 165 of file brave.py.

◆ max_page

int searx.engines.brave.max_page = 10

Definition at line 179 of file brave.py.

◆ paging

bool searx.engines.brave.paging = False

Definition at line 176 of file brave.py.

◆ safesearch

bool searx.engines.brave.safesearch = True

Definition at line 184 of file brave.py.

◆ safesearch_map

dict searx.engines.brave.safesearch_map = {2: 'strict', 1: 'moderate', 0: 'off'}

Definition at line 185 of file brave.py.
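
How this map ends up in the safesearch cookie can be shown in a few lines. A small sketch, assuming SearXNG passes the safe-search level as an integer; safesearch_cookie is a hypothetical helper name, and unknown levels fall back to 'off' as in request():

```python
SAFESEARCH_MAP = {2: "strict", 1: "moderate", 0: "off"}


def safesearch_cookie(level):
    """Translate a SearXNG safe-search level into Brave's cookie value."""
    return SAFESEARCH_MAP.get(level, "off")


for level in (0, 1, 2, 7):
    print(level, safesearch_cookie(level))
```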

◆ send_accept_language_header

bool searx.engines.brave.send_accept_language_header = True

Definition at line 175 of file brave.py.

◆ time_range_map

dict searx.engines.brave.time_range_map
Initial value:
= {
    'day': 'pd',
    'week': 'pw',
    'month': 'pm',
    'year': 'py',
}

Definition at line 191 of file brave.py.

◆ time_range_support

bool searx.engines.brave.time_range_support = False

Definition at line 187 of file brave.py.