.oO SearXNG Developer Documentation Oo.
searx.engines.google_news Namespace Reference

Functions

 request (query, params)
 response (resp)
 fetch_traits (EngineTraits engine_traits)

Variables

dict about
list categories = ['news']
bool paging = False
bool time_range_support = False
bool safesearch = True
list ceid_list
list _skip_values
dict _ceid_locale_map = {'NO:no': 'nb-NO'}

Detailed Description

This is the implementation of the Google News engine.

Google News handles regions differently from Google WEB.

- the ``ceid`` argument has to be set (:py:obj:`ceid_list`)
- the hl_ argument has to be set correctly (and differently from Google WEB)
- the gl_ argument is mandatory

If any of these arguments is not set correctly, the request is redirected to
the CONSENT dialog::

  https://consent.google.com/m?continue=

The Google News API ignores some parameters from the common :ref:`google API`:

- num_ : the number of search results is ignored; there is no paging, and all
  results for a query term are in the first response.
- safe_ : is ignored; Google News results are always *SafeSearch*

.. _hl: https://developers.google.com/custom-search/docs/xml_results#hlsp
.. _gl: https://developers.google.com/custom-search/docs/xml_results#glsp
.. _num: https://developers.google.com/custom-search/docs/xml_results#numsp
.. _safe: https://developers.google.com/custom-search/docs/xml_results#safesp
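
The role of the three region arguments can be illustrated with a minimal sketch. The helper name ``build_news_url`` and the example values are assumptions made for illustration and are not part of the engine; only the argument names ``hl``, ``gl`` and ``ceid`` and the ``news.google.com`` host are taken from this page.

```python
from urllib.parse import urlencode


def build_news_url(query, hl, gl, ceid):
    """Hypothetical helper: assemble a Google News search URL."""
    # hl and gl are ordinary query arguments and can be urlencoded
    args = urlencode({'q': query, 'hl': hl, 'gl': gl})
    # ceid contains a ':' character, so it is appended verbatim
    return 'https://news.google.com/search?' + args + '&ceid=' + ceid


print(build_news_url('open source', hl='en-US', gl='US', ceid='US:en'))
# https://news.google.com/search?q=open+source&hl=en-US&gl=US&ceid=US:en
```

All three arguments are passed explicitly here to make it visible that none of them is optional; the engine itself derives ``hl`` and ``gl`` from the selected ``ceid`` value.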

Function Documentation

◆ fetch_traits()

searx.engines.google_news.fetch_traits ( EngineTraits engine_traits)

Definition at line 273 of file google_news.py.

def fetch_traits(engine_traits: EngineTraits):
    _fetch_traits(engine_traits, add_domains=False)

    engine_traits.custom['ceid'] = {}

    for ceid in ceid_list:
        if ceid in _skip_values:
            continue

        region, lang = ceid.split(':')
        x = lang.split('-')
        if len(x) > 1:
            if x[1] not in ['Hant', 'Hans']:
                lang = x[0]

        sxng_locale = _ceid_locale_map.get(ceid, lang + '-' + region)
        try:
            locale = babel.Locale.parse(sxng_locale, sep='-')
        except babel.UnknownLocaleError:
            print("ERROR: %s -> %s is unknown by babel" % (ceid, sxng_locale))
            continue

        engine_traits.custom['ceid'][locales.region_tag(locale)] = ceid
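
The string handling that turns a ``ceid`` value into a locale candidate can be tried in isolation. The following is a minimal sketch, not the engine's code: the function name ``ceid_to_locale_tag`` is hypothetical, the sample ``ceid`` values are illustrative assumptions (the contents of ``ceid_list`` are not shown on this page), and the validation step via ``babel.Locale.parse`` is deliberately left out.

```python
# Same single override as _ceid_locale_map on this page
CEID_LOCALE_MAP = {'NO:no': 'nb-NO'}


def ceid_to_locale_tag(ceid):
    """Hypothetical helper: derive the locale string that is handed to babel."""
    region, lang = ceid.split(':')
    parts = lang.split('-')
    # Drop a language suffix unless it is a script tag (Hant / Hans)
    if len(parts) > 1 and parts[1] not in ('Hant', 'Hans'):
        lang = parts[0]
    # An explicit override wins over the generic "<lang>-<region>" form
    return CEID_LOCALE_MAP.get(ceid, lang + '-' + region)


for sample in ('US:en', 'NO:no', 'TW:zh-Hant', 'US:es-419'):
    print(sample, '->', ceid_to_locale_tag(sample))
```

Script suffixes are kept because they distinguish otherwise identical language and region pairs, while other suffixes are dropped before the locale string is built.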

◆ request()

searx.engines.google_news.request ( query, params)
Google-News search request

Definition at line 70 of file google_news.py.

def request(query, params):
    """Google-News search request"""

    sxng_locale = params.get('searxng_locale', 'en-US')
    ceid = locales.get_engine_locale(sxng_locale, traits.custom['ceid'], default='US:en')
    google_info = get_google_info(params, traits)
    google_info['subdomain'] = 'news.google.com'  # google news has only one domain

    ceid_region, ceid_lang = ceid.split(':')
    ceid_lang, ceid_suffix = (
        ceid_lang.split('-')
        + [
            None,
        ]
    )[:2]

    google_info['params']['hl'] = ceid_lang

    if ceid_suffix and ceid_suffix not in ['Hans', 'Hant']:

        if ceid_region.lower() == ceid_lang:
            google_info['params']['hl'] = ceid_lang + '-' + ceid_region
        else:
            google_info['params']['hl'] = ceid_lang + '-' + ceid_suffix

    elif ceid_region.lower() != ceid_lang:

        if ceid_region in ['AT', 'BE', 'CH', 'IL', 'SA', 'IN', 'BD', 'PT']:
            google_info['params']['hl'] = ceid_lang
        else:
            google_info['params']['hl'] = ceid_lang + '-' + ceid_region

    google_info['params']['lr'] = 'lang_' + ceid_lang.split('-')[0]
    google_info['params']['gl'] = ceid_region

    query_url = (
        'https://'
        + google_info['subdomain']
        + "/search?"
        + urlencode(
            {
                'q': query,
                **google_info['params'],
            }
        )
        # ceid includes a ':' character which must not be urlencoded
        + ('&ceid=%s' % ceid)
    )

    params['url'] = query_url
    params['cookies'] = google_info['cookies']
    params['headers'].update(google_info['headers'])
    return params
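
The branching that derives ``hl`` from the ``ceid`` value is easier to follow when isolated from the request plumbing. The sketch below mirrors the conditions of the listing above in a standalone function; the name ``hl_from_ceid`` is hypothetical and the sample ``ceid`` values are illustrative assumptions, not entries confirmed to be in ``ceid_list``.

```python
# Regions for which hl stays the bare language code (list taken from request())
BARE_LANG_REGIONS = ('AT', 'BE', 'CH', 'IL', 'SA', 'IN', 'BD', 'PT')


def hl_from_ceid(ceid):
    """Hypothetical helper: compute the hl argument for a ceid value."""
    region, lang = ceid.split(':')
    # Pad with None so a ceid without a suffix still unpacks into two names
    lang, suffix = (lang.split('-') + [None])[:2]

    hl = lang
    if suffix and suffix not in ('Hans', 'Hant'):
        # Non-script suffix: use the region if it matches the language, else the suffix
        hl = lang + '-' + (region if region.lower() == lang else suffix)
    elif region.lower() != lang:
        # No usable suffix: append the region except for the listed regions
        if region not in BARE_LANG_REGIONS:
            hl = lang + '-' + region
    return hl


for sample in ('US:en', 'DE:de', 'AT:de', 'US:es-419', 'TW:zh-Hant'):
    print(sample, '->', hl_from_ceid(sample))
```

With these inputs the function yields ``en-US``, ``de``, ``de``, ``es-419`` and ``zh-TW`` respectively, which shows the three outcomes: bare language, language plus region, and language plus suffix.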

◆ response()

searx.engines.google_news.response ( resp)
Get response from google's search request

Definition at line 125 of file google_news.py.

def response(resp):
    """Get response from google's search request"""
    results = []
    detect_google_sorry(resp)

    # convert the text to dom
    dom = html.fromstring(resp.text)

    for result in eval_xpath_list(dom, '//div[@class="xrnccd"]'):

        # The first <a> tag in the <article> contains the link to the article
        # The href attribute of the <a> tag is a google internal link, we have
        # to decode

        href = eval_xpath_getindex(result, './article/a/@href', 0)
        href = href.split('?')[0]
        href = href.split('/')[-1]
        href = base64.urlsafe_b64decode(href + '====')
        href = href[href.index(b'http') :].split(b'\xd2')[0]
        href = href.decode()

        title = extract_text(eval_xpath(result, './article/h3[1]'))

        # The pub_date is mostly a string like 'yesterday', not a real
        # timezone date or time. Therefore we can't use publishedDate.
        pub_date = extract_text(eval_xpath(result, './article//time'))
        pub_origin = extract_text(eval_xpath(result, './article//a[@data-n-tid]'))

        content = ' / '.join([x for x in [pub_origin, pub_date] if x])

        # The image URL is located in a preceding sibling <img> tag, e.g.:
        # "https://lh3.googleusercontent.com/DjhQh7DMszk.....z=-p-h100-w100"
        # These URL are long but not personalized (double checked via tor).

        thumbnail = extract_text(result.xpath('preceding-sibling::a/figure/img/@src'))

        results.append(
            {
                'url': href,
                'title': title,
                'content': content,
                'thumbnail': thumbnail,
            }
        )

    # return results
    return results
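
The decoding of the Google-internal ``href`` can be exercised without a live response. The sketch below repeats the four decoding steps from the listing above in a hypothetical helper, ``decode_article_href``. The token is synthetic: its byte layout (a short binary prefix, the URL, then a ``\xd2`` delimiter) is an assumption chosen to match what the decoder looks for and is not a documented Google format.

```python
import base64


def decode_article_href(href):
    """Hypothetical helper: extract the article URL from an internal link."""
    # Keep only the last path segment, without the query string
    token = href.split('?')[0].split('/')[-1]
    # Surplus '=' padding is tolerated, so the token length does not matter
    raw = base64.urlsafe_b64decode(token + '====')
    # The URL starts at the first b'http' and ends before the b'\xd2' delimiter
    return raw[raw.index(b'http'):].split(b'\xd2')[0].decode()


# Synthetic payload for illustration only (assumed layout, see above)
payload = b'\x08\x13\x22' + b'https://example.org/story' + b'\xd2\x01\x00'
token = base64.urlsafe_b64encode(payload).decode().rstrip('=')
href = './articles/' + token + '?hl=en-US'

print(decode_article_href(href))
# https://example.org/story
```

Appending ``'===='`` avoids having to compute the exact amount of missing padding, since the tokens in the links are not padded.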

Variable Documentation

◆ _ceid_locale_map

dict searx.engines.google_news._ceid_locale_map = {'NO:no': 'nb-NO'}
protected

Definition at line 270 of file google_news.py.

◆ _skip_values

list searx.engines.google_news._skip_values
protected
Initial value:
= [
    'ET:en',  # english (ethiopia)
    'ID:en',  # english (indonesia)
    'LV:en',  # english (latvia)
]

Definition at line 264 of file google_news.py.

◆ about

dict searx.engines.google_news.about
Initial value:
= {
    "website": 'https://news.google.com',
    "wikidata_id": 'Q12020',
    "official_api_documentation": 'https://developers.google.com/custom-search',
    "use_official_api": False,
    "require_api_key": False,
    "results": 'HTML',
}

Definition at line 48 of file google_news.py.

◆ categories

list searx.engines.google_news.categories = ['news']

Definition at line 58 of file google_news.py.

◆ ceid_list

list searx.engines.google_news.ceid_list

Definition at line 174 of file google_news.py.

◆ paging

bool searx.engines.google_news.paging = False

Definition at line 59 of file google_news.py.

◆ safesearch

bool searx.engines.google_news.safesearch = True

Definition at line 66 of file google_news.py.

◆ time_range_support

bool searx.engines.google_news.time_range_support = False

Definition at line 60 of file google_news.py.