.oO SearXNG Developer Documentation Oo.
searx.engines.brave Namespace Reference

Functions

 request (query, params)
 
 _extract_published_date (published_date_raw)
 
 response (resp)
 
 _parse_search (resp)
 
 _get_iframe_src (url)
 
 _parse_news (json_resp)
 
 _parse_images (json_resp)
 
 _parse_videos (json_resp)
 
 fetch_traits (EngineTraits engine_traits)
 

Variables

logging.Logger logger
 
EngineTraits traits
 
dict about
 
str base_url = "https://search.brave.com/"
 
list categories = []
 
str brave_category = 'search'
 
 Goggles = Any
 
bool brave_spellcheck = False
 
bool send_accept_language_header = True
 
bool paging = False
 
int max_page = 10
 
bool safesearch = True
 
dict safesearch_map = {2: 'strict', 1: 'moderate', 0: 'off'}
 
bool time_range_support = False
 
dict time_range_map
 

Detailed Description

Brave supports the categories listed in :py:obj:`brave_category` (General,
news, videos, images).  The support of :py:obj:`paging` and :py:obj:`time range
<time_range_support>` is limited (see remarks).

Configured ``brave`` engines:

.. code:: yaml

  - name: brave
    engine: brave
    ...
    brave_category: search
    time_range_support: true
    paging: true

  - name: brave.images
    engine: brave
    ...
    brave_category: images

  - name: brave.videos
    engine: brave
    ...
    brave_category: videos

  - name: brave.news
    engine: brave
    ...
    brave_category: news

  - name: brave.goggles
    engine: brave
    time_range_support: true
    paging: true
    ...
    brave_category: goggles


.. _brave regions:

Brave regions
=============

Brave uses two-letter tags for the regions like ``ca`` while SearXNG deals with
locales.  To get a mapping, all *official de-facto* languages of the Brave
region are mapped to regions in SearXNG (see :py:obj:`babel
<babel.languages.get_official_languages>`):

.. code:: python

    "regions": {
      ..
      "en-CA": "ca",
      "fr-CA": "ca",
      ..
     }


.. note::

   The language (aka region) support of Brave's index is limited to very basic
   languages.  The search results for languages like Chinese or Arabic are of
   low quality.
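
The expansion of one Brave region tag into several SearXNG region tags can be
sketched as follows.  This is only a minimal sketch: ``OFFICIAL_LANGUAGES`` is a
hypothetical hand-written stand-in for the data that
``babel.languages.get_official_languages`` provides, and ``build_region_map`` is
an illustrative name that does not exist in the engine.

```python
# Hypothetical stand-in for babel's official-language data (assumption:
# two countries only, written by hand for illustration).
OFFICIAL_LANGUAGES = {
    "ca": ["en", "fr"],
    "de": ["de"],
}


def build_region_map(country_tags):
    """Map SearXNG region tags (e.g. 'en-CA') to Brave region tags (e.g. 'ca')."""
    regions = {}
    for country_tag in country_tags:
        for lang_tag in OFFICIAL_LANGUAGES.get(country_tag, []):
            sxng_tag = f"{lang_tag}-{country_tag.upper()}"
            # keep the first mapping if the tag has already been assigned
            regions.setdefault(sxng_tag, country_tag)
    return regions


region_map = build_region_map(["ca", "de"])
print(region_map)
```

Both ``en-CA`` and ``fr-CA`` end up pointing at the same Brave region ``ca``,
which is the shape of the ``regions`` table shown above.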


.. _brave googles:

Brave Goggles
=============

.. _list of Goggles: https://search.brave.com/goggles/discover
.. _Goggles Whitepaper: https://brave.com/static-assets/files/goggles.pdf
.. _Goggles Quickstart: https://github.com/brave/goggles-quickstart

Goggles allow you to choose, alter, or extend the ranking of Brave Search
results (`Goggles Whitepaper`_).  Goggles are openly developed by the community
of Brave Search users.

Select from the `list of Goggles`_ people have published, or create your own
(`Goggles Quickstart`_).


.. _brave languages:

Brave languages
===============

Brave's language support is limited to the UI (menus, area-local notations,
etc.).  Brave's index only seems to support a locale; it does not seem to
support any languages.  The choice of available UI languages is very
small (and it's not clear to me what changes in the UI when switching
from en-us to en-ca or en-gb).

In the :py:obj:`EngineTraits object <searx.enginelib.traits.EngineTraits>` the
UI languages are stored in a custom field named ``ui_lang``:

.. code:: python

    "custom": {
      "ui_lang": {
        "ca": "ca",
        "de-DE": "de-de",
        "en-CA": "en-ca",
        "en-GB": "en-gb",
        "en-US": "en-us",
        "es": "es",
        "fr-CA": "fr-ca",
        "fr-FR": "fr-fr",
        "ja-JP": "ja-jp",
        "pt-BR": "pt-br",
        "sq-AL": "sq-al"
      }
    },
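
How a SearXNG locale could be resolved against such a table is sketched below.
The engine itself delegates this to ``locales.get_engine_locale``; the
``pick_ui_lang`` helper here is hypothetical and uses a deliberately simplified
fallback (exact match, then same language, then default), which is an assumption
and not a description of SearXNG's matching rules.

```python
# Subset of the ``ui_lang`` table shown above.
UI_LANG = {
    "ca": "ca",
    "de-DE": "de-de",
    "en-GB": "en-gb",
    "en-US": "en-us",
    "fr-FR": "fr-fr",
}


def pick_ui_lang(searxng_locale, ui_lang_map, default="en-us"):
    """Simplified lookup: exact match first, then any entry of the same language."""
    if searxng_locale in ui_lang_map:
        return ui_lang_map[searxng_locale]
    language = searxng_locale.split("-")[0]
    for sxng_tag, brave_tag in ui_lang_map.items():
        if sxng_tag.split("-")[0] == language:
            return brave_tag
    return default


print(pick_ui_lang("de-DE", UI_LANG))  # exact match
print(pick_ui_lang("fr-BE", UI_LANG))  # falls back to another French entry
print(pick_ui_lang("zh-CN", UI_LANG))  # no match, default is used
```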

Implementations
===============

Function Documentation

◆ _extract_published_date()

searx.engines.brave._extract_published_date ( published_date_raw)
protected

Definition at line 240 of file brave.py.

def _extract_published_date(published_date_raw):
    if published_date_raw is None:
        return None

    try:
        return parser.parse(published_date_raw)
    except parser.ParserError:
        return None

Referenced by searx.engines.brave._parse_news(), searx.engines.brave._parse_search(), and searx.engines.brave._parse_videos().


◆ _get_iframe_src()

searx.engines.brave._get_iframe_src ( url)
protected

Definition at line 334 of file brave.py.

def _get_iframe_src(url):
    parsed_url = urlparse(url)
    if parsed_url.path == '/watch' and parsed_url.query:
        video_id = parse_qs(parsed_url.query).get('v', [])  # type: ignore
        if video_id:
            return 'https://www.youtube-nocookie.com/embed/' + video_id[0]  # type: ignore
    return None

Referenced by searx.engines.brave._parse_search(), and searx.engines.brave._parse_videos().
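
To illustrate what the helper returns, here is a standalone copy of its logic
that only needs the standard library.  The name ``get_iframe_src`` and the
sample URLs are made up for this sketch.

```python
from urllib.parse import parse_qs, urlparse


def get_iframe_src(url):
    """Standalone copy of the helper's logic, for illustration only."""
    parsed_url = urlparse(url)
    if parsed_url.path == "/watch" and parsed_url.query:
        video_id = parse_qs(parsed_url.query).get("v", [])
        if video_id:
            return "https://www.youtube-nocookie.com/embed/" + video_id[0]
    return None


print(get_iframe_src("https://www.youtube.com/watch?v=abc123&t=10"))
print(get_iframe_src("https://example.org/video/42"))
```

Only the path ``/watch`` and the ``v`` query parameter are checked, the host
name is not; any other URL yields ``None`` and the result gets no embedded
player.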


◆ _parse_images()

searx.engines.brave._parse_images ( json_resp)
protected

Definition at line 360 of file brave.py.

def _parse_images(json_resp):
    result_list = []

    for result in json_resp["results"]:
        item = {
            'url': result['url'],
            'title': result['title'],
            'content': result['description'],
            'template': 'images.html',
            'resolution': result['properties']['format'],
            'source': result['source'],
            'img_src': result['properties']['url'],
            'thumbnail_src': result['thumbnail']['src'],
        }
        result_list.append(item)

    return result_list

Referenced by searx.engines.brave.response().


◆ _parse_news()

searx.engines.brave._parse_news ( json_resp)
protected

Definition at line 343 of file brave.py.

def _parse_news(json_resp):
    result_list = []

    for result in json_resp["results"]:
        item = {
            'url': result['url'],
            'title': result['title'],
            'content': result['description'],
            'publishedDate': _extract_published_date(result['age']),
        }
        if result['thumbnail'] is not None:
            item['img_src'] = result['thumbnail']['src']
        result_list.append(item)

    return result_list

References searx.engines.brave._extract_published_date().

Referenced by searx.engines.brave.response().


◆ _parse_search()

searx.engines.brave._parse_search ( resp)
protected

Definition at line 275 of file brave.py.

def _parse_search(resp):

    result_list = []
    dom = html.fromstring(resp.text)

    answer_tag = eval_xpath_getindex(dom, '//div[@class="answer"]', 0, default=None)
    if answer_tag:
        url = eval_xpath_getindex(dom, '//div[@id="featured_snippet"]/a[@class="result-header"]/@href', 0, default=None)
        result_list.append({'answer': extract_text(answer_tag), 'url': url})

    # xpath_results = '//div[contains(@class, "snippet fdb") and @data-type="web"]'
    xpath_results = '//div[contains(@class, "snippet ")]'

    for result in eval_xpath_list(dom, xpath_results):

        url = eval_xpath_getindex(result, './/a[contains(@class, "h")]/@href', 0, default=None)
        title_tag = eval_xpath_getindex(
            result, './/a[contains(@class, "h")]//div[contains(@class, "title")]', 0, default=None
        )
        if url is None or title_tag is None or not urlparse(url).netloc:  # partial url likely means it's an ad
            continue

        content_tag = eval_xpath_getindex(result, './/div[contains(@class, "snippet-description")]', 0, default='')
        pub_date_raw = eval_xpath(result, 'substring-before(.//div[contains(@class, "snippet-description")], "-")')
        img_src = eval_xpath_getindex(result, './/img[contains(@class, "thumb")]/@src', 0, default='')

        item = {
            'url': url,
            'title': extract_text(title_tag),
            'content': extract_text(content_tag),
            'publishedDate': _extract_published_date(pub_date_raw),
            'img_src': img_src,
        }

        video_tag = eval_xpath_getindex(
            result, './/div[contains(@class, "video-snippet") and @data-macro="video"]', 0, default=None
        )
        if video_tag is not None:

            # In my tests a video tag in the WEB search was most often not a
            # video, except the ones from youtube ..

            iframe_src = _get_iframe_src(url)
            if iframe_src:
                item['iframe_src'] = iframe_src
                item['template'] = 'videos.html'
                item['thumbnail'] = eval_xpath_getindex(video_tag, './/img/@src', 0, default='')
                pub_date_raw = extract_text(
                    eval_xpath(video_tag, './/div[contains(@class, "snippet-attributes")]/div/text()')
                )
                item['publishedDate'] = _extract_published_date(pub_date_raw)
            else:
                item['img_src'] = eval_xpath_getindex(video_tag, './/img/@src', 0, default='')

        result_list.append(item)

    return result_list

References searx.engines.brave._extract_published_date(), and searx.engines.brave._get_iframe_src().

Referenced by searx.engines.brave.response().
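
The parser skips results whose link has no host part, on the assumption stated
in its comment that a partial URL is likely an ad.  A small standard-library
sketch of that check follows; ``has_host`` and the sample links are
hypothetical.

```python
from urllib.parse import urlparse


def has_host(url):
    """Return True if the URL is present and carries a host (netloc) part."""
    return url is not None and bool(urlparse(url).netloc)


# Hypothetical links as they might be extracted from a result page.
links = ["https://example.org/page", "/a/redirect?id=1", None]
kept = [link for link in links if has_host(link)]
print(kept)
```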


◆ _parse_videos()

searx.engines.brave._parse_videos ( json_resp)
protected

Definition at line 379 of file brave.py.

def _parse_videos(json_resp):
    result_list = []

    for result in json_resp["results"]:

        url = result['url']
        item = {
            'url': url,
            'title': result['title'],
            'content': result['description'],
            'template': 'videos.html',
            'length': result['video']['duration'],
            'duration': result['video']['duration'],
            'publishedDate': _extract_published_date(result['age']),
        }

        if result['thumbnail'] is not None:
            item['thumbnail'] = result['thumbnail']['src']

        iframe_src = _get_iframe_src(url)
        if iframe_src:
            item['iframe_src'] = iframe_src

        result_list.append(item)

    return result_list

References searx.engines.brave._extract_published_date(), and searx.engines.brave._get_iframe_src().

Referenced by searx.engines.brave.response().


◆ fetch_traits()

searx.engines.brave.fetch_traits ( EngineTraits engine_traits)
Fetch :ref:`languages <brave languages>` and :ref:`regions <brave
regions>` from Brave.

Definition at line 407 of file brave.py.

def fetch_traits(engine_traits: EngineTraits):
    """Fetch :ref:`languages <brave languages>` and :ref:`regions <brave
    regions>` from Brave."""

    # pylint: disable=import-outside-toplevel, too-many-branches

    import babel.languages
    from searx.locales import region_tag, language_tag
    from searx.network import get  # see https://github.com/searxng/searxng/issues/762

    engine_traits.custom["ui_lang"] = {}

    headers = {
        'Accept-Encoding': 'gzip, deflate',
    }
    lang_map = {'no': 'nb'}  # norway

    # languages (UI)

    resp = get('https://search.brave.com/settings', headers=headers)

    if not resp.ok:  # type: ignore
        print("ERROR: response from Brave is not OK.")
    dom = html.fromstring(resp.text)  # type: ignore

    for option in dom.xpath('//div[@id="language-select"]//option'):

        ui_lang = option.get('value')
        try:
            if '-' in ui_lang:
                sxng_tag = region_tag(babel.Locale.parse(ui_lang, sep='-'))
            else:
                sxng_tag = language_tag(babel.Locale.parse(ui_lang))

        except babel.UnknownLocaleError:
            print("ERROR: can't determine babel locale of Brave's (UI) language %s" % ui_lang)
            continue

        conflict = engine_traits.custom["ui_lang"].get(sxng_tag)
        if conflict:
            if conflict != ui_lang:
                print("CONFLICT: babel %s --> %s, %s" % (sxng_tag, conflict, ui_lang))
            continue
        engine_traits.custom["ui_lang"][sxng_tag] = ui_lang

    # search regions of brave

    resp = get('https://cdn.search.brave.com/serp/v2/_app/immutable/chunks/parameters.734c106a.js', headers=headers)

    if not resp.ok:  # type: ignore
        print("ERROR: response from Brave is not OK.")

    country_js = resp.text[resp.text.index("options:{all") + len('options:') :]
    country_js = country_js[: country_js.index("},k={default")]
    country_tags = js_variable_to_python(country_js)

    for k, v in country_tags.items():
        if k == 'all':
            engine_traits.all_locale = 'all'
            continue
        country_tag = v['value']

        # add official languages of the country ..
        for lang_tag in babel.languages.get_official_languages(country_tag, de_facto=True):
            lang_tag = lang_map.get(lang_tag, lang_tag)
            sxng_tag = region_tag(babel.Locale.parse('%s_%s' % (lang_tag, country_tag.upper())))
            # print("%-20s: %s <-- %s" % (v['label'], country_tag, sxng_tag))

            conflict = engine_traits.regions.get(sxng_tag)
            if conflict:
                if conflict != country_tag:
                    print("CONFLICT: babel %s --> %s, %s" % (sxng_tag, conflict, country_tag))
                continue
            engine_traits.regions[sxng_tag] = country_tag
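
The region list is cut out of a JavaScript chunk with two string markers.  The
sketch below replays that slicing on a hypothetical, heavily shortened chunk.
The sample text is invented, and the regex plus ``json.loads`` step is only a
naive stand-in for ``js_variable_to_python`` that works for this sample, not a
general JavaScript parser.

```python
import json
import re

# Hypothetical, shortened stand-in for the downloaded JavaScript chunk.
js_text = 'const c={options:{all:{value:"all"},ca:{value:"ca"}}},k={default:"all"};'

# Start right after 'options:' so the slice begins with the opening brace ...
country_js = js_text[js_text.index("options:{all") + len("options:"):]
# ... and stop before the brace that closes the enclosing object.
country_js = country_js[: country_js.index("},k={default")]
print(country_js)

# Naive conversion for this sample only: quote the bare object keys.
as_json = re.sub(r'([{,])(\w+):', r'\1"\2":', country_js)
country_tags = json.loads(as_json)
print(country_tags["ca"]["value"])
```

Because the markers are tied to the minified source, a new build of Brave's
frontend can change them, in which case ``str.index`` raises ``ValueError``.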

◆ request()

searx.engines.brave.request ( query, params )

Definition at line 202 of file brave.py.

def request(query, params):

    # Don't accept br encoding / see https://github.com/searxng/searxng/pull/1787
    params['headers']['Accept-Encoding'] = 'gzip, deflate'

    args = {
        'q': query,
    }
    if brave_spellcheck:
        args['spellcheck'] = '1'

    if brave_category in ('search', 'goggles'):
        if params.get('pageno', 1) - 1:
            args['offset'] = params.get('pageno', 1) - 1
        if time_range_map.get(params['time_range']):
            args['tf'] = time_range_map.get(params['time_range'])

    if brave_category == 'goggles':
        args['goggles_id'] = Goggles

    params["url"] = f"{base_url}{brave_category}?{urlencode(args)}"

    # set properties in the cookies

    params['cookies']['safesearch'] = safesearch_map.get(params['safesearch'], 'off')
    # the useLocation is IP based, we use cookie 'country' for the region
    params['cookies']['useLocation'] = '0'
    params['cookies']['summarizer'] = '0'

    engine_region = traits.get_region(params['searxng_locale'], 'all')
    params['cookies']['country'] = engine_region.split('-')[-1].lower()  # type: ignore

    ui_lang = locales.get_engine_locale(params['searxng_locale'], traits.custom["ui_lang"], 'en-us')
    params['cookies']['ui_lang'] = ui_lang

    logger.debug("cookies %s", params['cookies'])
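
The query-string part of ``request()`` can be tried in isolation.  The sketch
below copies the paging and time-range logic into a hypothetical
``build_search_url`` function; spellcheck, Goggles and all cookie handling are
left out, and the constants repeat the module values documented on this page.

```python
from urllib.parse import urlencode

BASE_URL = "https://search.brave.com/"
TIME_RANGE_MAP = {"day": "pd", "week": "pw", "month": "pm", "year": "py"}


def build_search_url(query, pageno=1, time_range=None, category="search"):
    """Build the URL as ``request()`` does, without spellcheck and Goggles."""
    args = {"q": query}
    if category in ("search", "goggles"):
        if pageno - 1:
            # the first page carries no offset, page 2 is offset 1
            args["offset"] = pageno - 1
        if TIME_RANGE_MAP.get(time_range):
            args["tf"] = TIME_RANGE_MAP[time_range]
    return f"{BASE_URL}{category}?{urlencode(args)}"


print(build_search_url("searxng engine", pageno=3, time_range="week"))
print(build_search_url("cats", pageno=2, category="images"))
```

Paging and time range only reach the URL for the ``search`` and ``goggles``
categories; for the other categories both arguments are ignored.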

◆ response()

searx.engines.brave.response ( resp)

Definition at line 250 of file brave.py.

def response(resp):

    if brave_category in ('search', 'goggles'):
        return _parse_search(resp)

    datastr = ""
    for line in resp.text.split("\n"):
        if "const data = " in line:
            datastr = line.replace("const data = ", "").strip()[:-1]
            break

    json_data = js_variable_to_python(datastr)
    json_resp = json_data[1]['data']['body']['response']

    if brave_category == 'news':
        return _parse_news(json_resp['news'])

    if brave_category == 'images':
        return _parse_images(json_resp)
    if brave_category == 'videos':
        return _parse_videos(json_resp)

    raise ValueError(f"Unsupported brave category: {brave_category}")

References searx.engines.brave._parse_images(), searx.engines.brave._parse_news(), searx.engines.brave._parse_search(), and searx.engines.brave._parse_videos().
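
For the news, images and videos categories the results are read from a
``const data = ...;`` line embedded in the page.  The sketch below replays that
extraction on a hypothetical, heavily shortened page body.  It assumes the
payload happens to be valid JSON so that ``json.loads`` can stand in for
``js_variable_to_python``; real pages may need the latter.

```python
import json

# Hypothetical page body; assumption: the embedded payload is valid JSON.
page_text = "\n".join([
    "<script>",
    'const data = [null, {"data": {"body": {"response": {"results": [{"title": "A"}]}}}}];',
    "</script>",
])

datastr = ""
for line in page_text.split("\n"):
    if "const data = " in line:
        # drop the assignment prefix and the trailing semicolon
        datastr = line.replace("const data = ", "").strip()[:-1]
        break

json_data = json.loads(datastr)  # the engine calls js_variable_to_python here
json_resp = json_data[1]["data"]["body"]["response"]
print(json_resp["results"][0]["title"])
```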


Variable Documentation

◆ about

dict searx.engines.brave.about
Initial value:
= {
    "website": 'https://search.brave.com/',
    "wikidata_id": 'Q22906900',
    "official_api_documentation": None,
    "use_official_api": False,
    "require_api_key": False,
    "results": 'HTML',
}

Definition at line 149 of file brave.py.

◆ base_url

str searx.engines.brave.base_url = "https://search.brave.com/"

Definition at line 158 of file brave.py.

◆ brave_category

str searx.engines.brave.brave_category = 'search'

Definition at line 160 of file brave.py.

◆ brave_spellcheck

bool searx.engines.brave.brave_spellcheck = False

Definition at line 171 of file brave.py.

◆ categories

list searx.engines.brave.categories = []

Definition at line 159 of file brave.py.

◆ Goggles

searx.engines.brave.Goggles = Any

Definition at line 161 of file brave.py.

◆ logger

logging.Logger searx.engines.brave.logger

Definition at line 145 of file brave.py.

◆ max_page

int searx.engines.brave.max_page = 10

Definition at line 182 of file brave.py.

◆ paging

bool searx.engines.brave.paging = False

Definition at line 179 of file brave.py.

◆ safesearch

bool searx.engines.brave.safesearch = True

Definition at line 187 of file brave.py.

◆ safesearch_map

dict searx.engines.brave.safesearch_map = {2: 'strict', 1: 'moderate', 0: 'off'}

Definition at line 188 of file brave.py.
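
How this map feeds into the cookies set by ``request()`` can be sketched as
follows.  ``build_cookies`` is a hypothetical helper; it leaves out the
``ui_lang`` cookie and takes the already resolved engine region as a plain
string instead of calling ``traits.get_region``.

```python
SAFESEARCH_MAP = {2: "strict", 1: "moderate", 0: "off"}


def build_cookies(safesearch, engine_region):
    """Collect the cookie values set by ``request()`` (``ui_lang`` left out)."""
    return {
        # unknown safesearch levels fall back to 'off'
        "safesearch": SAFESEARCH_MAP.get(safesearch, "off"),
        # location is IP based on Brave's side, so it is switched off ...
        "useLocation": "0",
        "summarizer": "0",
        # ... and the region is passed in the 'country' cookie instead
        "country": engine_region.split("-")[-1].lower(),
    }


print(build_cookies(1, "ca"))
print(build_cookies(5, "all"))
```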

◆ send_accept_language_header

bool searx.engines.brave.send_accept_language_header = True

Definition at line 178 of file brave.py.

◆ time_range_map

dict searx.engines.brave.time_range_map
Initial value:
= {
    'day': 'pd',
    'week': 'pw',
    'month': 'pm',
    'year': 'py',
}

Definition at line 194 of file brave.py.

◆ time_range_support

bool searx.engines.brave.time_range_support = False

Definition at line 190 of file brave.py.

◆ traits

EngineTraits searx.engines.brave.traits

Definition at line 147 of file brave.py.