.oO SearXNG Developer Documentation Oo.
searx.engines.presearch Namespace Reference

Functions

 init (_)
 
 _get_request_id (query, params)
 
 request (query, params)
 
 _strip_leading_strings (text)
 
 _fix_title (title, url)
 
 parse_search_query (json_results)
 
 response (resp)
 

Variables

dict about
 
bool paging = True
 
bool safesearch = True
 
bool time_range_support = True
 
bool send_accept_language_header = True
 
list categories = ["general", "web"]
 
str search_type = "search"
 
str base_url = "https://presearch.com"
 
dict safesearch_map = {0: 'false', 1: 'true', 2: 'true'}
 

Detailed Description

Presearch supports the search types listed in :py:obj:`search_type` (search,
images, videos, news).

Configured ``presearch`` engines:

.. code:: yaml

  - name: presearch
    engine: presearch
    search_type: search
    categories: [general, web]

  - name: presearch images
    ...
    search_type: images
    categories: [images, web]

  - name: presearch videos
    ...
    search_type: videos
    categories: [general, web]

  - name: presearch news
    ...
    search_type: news
    categories: [news, web]

.. hint::

   By default Presearch's video category is intentionally placed into::

       categories: [general, web]


Search type ``video``
=====================

The results in the video category are most often links to pages that contain a
video rather than to video streams; for instance, many links from Presearch's
video category point to content hosted on Facebook (aka Meta) or Twitter (aka
X).  Since these are not real links to video streams, SearXNG can't use the
video template for them, and hits that can't use this template are out of
place in the videos category.


Languages & Regions
===================

In Presearch there are languages for the UI and regions for narrowing down the
search.  If the region is set to "auto" in Presearch's web UI and the cookie
``use_local_search_results=false`` is sent, then the defaults for both (the
language and the region) are taken from the ``Accept-Language`` header.

Since the region is already "auto" by default, we only need to set the
``use_local_search_results`` cookie and send the ``Accept-Language`` header.
These values have to be set in both requests sent to Presearch: in the first
request, which fetches the request ID, and in the final request, which fetches
the result list (see ``send_accept_language_header``).
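The cookie and header handling described above can be sketched as follows. This is a minimal standalone illustration, not the engine's code; the cookie keys mirror the ones the engine sends, while the function name and the default language/territory values are assumptions for the example:

```python
# Minimal sketch of the cookie / Accept-Language handling described above.
# Hypothetical helper; language/territory values are illustrative.

def build_presearch_headers(language="en", territory="CA", safesearch="true"):
    headers = {
        "Cookie": (
            "b=1;"
            " presearch_session=;"
            # disable Presearch's IP-based localization so the
            # Accept-Language header takes effect
            " use_local_search_results=false;"
            f" use_safe_search={safesearch}"
        ),
    }
    if territory:
        # e.g. "en-CA,en;q=0.9,*;q=0.5"
        headers["Accept-Language"] = f"{language}-{territory},{language};q=0.9,*;q=0.5"
    return headers

headers = build_presearch_headers()
print(headers["Accept-Language"])  # en-CA,en;q=0.9,*;q=0.5
```

The same header dict has to accompany both requests; otherwise Presearch may resolve a different region for the request-ID fetch than for the result fetch.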

The time format returned by Presearch varies depending on the language set.
Many of these formats could be handled with the ``dateutil`` parser, but it
does not support relative formats such as "N time ago", "vor N time" (German),
or "Hace N time" (Spanish).  Because of this, the date string is simply joined
with the rest of the metadata.
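The joining described above can be sketched like this (a hypothetical helper; the engine inlines the same steps in ``response()`` for the news category):

```python
# Sketch: relative dates such as "2 days ago" or "vor 2 Tagen" are not
# parsed into datetimes; the raw string is joined into metadata instead.

def join_metadata(source, time_text):
    metadata = [source]
    time_text = time_text.strip()
    if time_text:  # Presearch sometimes returns an empty/bogus time
        metadata.append(time_text)
    return " / ".join(metadata)

print(join_metadata("BBC News", "2 days ago"))  # BBC News / 2 days ago
print(join_metadata("BBC News", ""))            # BBC News
```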


Implementations
===============

Function Documentation

◆ _fix_title()

searx.engines.presearch._fix_title ( title,
url )
protected
Titles from Presearch show the domain prepended to the title without spacing
and contain HTML markup.  This function fixes both issues, transforming
"translate.google.co.in<em>Google</em> Translate" into "Google Translate".

Definition at line 164 of file presearch.py.

164def _fix_title(title, url):
165 """
166 Titles from Presearch shows domain + title without spacing, and HTML
167 This function removes these 2 issues.
168 Transforming "translate.google.co.in<em>Google</em> Translate" into "Google Translate"
169 """
170 parsed_url = urlparse(url)
171 domain = parsed_url.netloc
172 title = html_to_text(title)
173 # Fixes issue where domain would show up in the title
174 # translate.google.co.inGoogle Translate -> Google Translate
175 if (
176 title.startswith(domain)
177 and len(title) > len(domain)
178 and not title.startswith(domain + "/")
179 and not title.startswith(domain + " ")
180 ):
181 title = title.removeprefix(domain)
182 return title
183
184
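A standalone sketch of the same logic; ``html_to_text`` is approximated here with a crude tag-stripping regex, purely for illustration:

```python
import re
from urllib.parse import urlparse

def fix_title(title, url):
    domain = urlparse(url).netloc
    title = re.sub(r"<[^>]+>", "", title)  # crude stand-in for html_to_text()
    # strip a domain prefix fused onto the title without a separator,
    # e.g. "translate.google.co.inGoogle Translate"
    if (
        title.startswith(domain)
        and len(title) > len(domain)
        and not title.startswith(domain + "/")
        and not title.startswith(domain + " ")
    ):
        title = title.removeprefix(domain)
    return title

print(fix_title(
    "translate.google.co.in<em>Google</em> Translate",
    "https://translate.google.co.in/some/path",
))  # Google Translate
```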

Referenced by parse_search_query().


◆ _get_request_id()

searx.engines.presearch._get_request_id ( query,
params )
protected

Definition at line 104 of file presearch.py.

104def _get_request_id(query, params):
105
106 args = {
107 "q": query,
108 "page": params["pageno"],
109 }
110
111 if params["time_range"]:
112 args["time"] = params["time_range"]
113
114 url = f"{base_url}/{search_type}?{urlencode(args)}"
115
116 headers = {
117 'User-Agent': gen_useragent(),
118 'Cookie': (
119 f"b=1;"
120 f" presearch_session=;"
121 f" use_local_search_results=false;"
122 f" use_safe_search={safesearch_map[params['safesearch']]}"
123 ),
124 }
125 if params['searxng_locale'] != 'all':
126 l = locales.get_locale(params['searxng_locale'])
127
128 # Presearch narrows down the search by region. In SearXNG when the user
129 # does not set a region (e.g. 'en-CA' / canada) we cannot hand over a region.
130
131 # We could possibly use searx.locales.get_official_locales to determine
132 # in which regions this language is an official one, but then we still
133 # wouldn't know which region should be given more weight / Presearch
134 # performs an IP-based geolocation of the user, we don't want that in
135 # SearXNG ;-)
136
137 if l.territory:
138 headers['Accept-Language'] = f"{l.language}-{l.territory},{l.language};" "q=0.9,*;" "q=0.5"
139
140 resp_text = get(url, headers=headers).text # type: ignore
141
142 for line in resp_text.split("\n"):
143 if "window.searchId = " in line:
144 return line.split("= ")[1][:-1].replace('"', "")
145
146 return None
147
148
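The request-ID extraction at the end of the function can be exercised in isolation. The HTML line format assumed here (``window.searchId = "<id>";``) is inferred from the parsing code above; the helper name is hypothetical:

```python
# Sketch: pull the search id out of Presearch's HTML response, the same
# way the last loop of _get_request_id() does.

def extract_search_id(resp_text):
    for line in resp_text.split("\n"):
        if "window.searchId = " in line:
            # 'window.searchId = "abc123";' -> 'abc123'
            return line.split("= ")[1][:-1].replace('"', "")
    return None

html = 'var x = 1;\nwindow.searchId = "abc123";\n'
print(extract_search_id(html))  # abc123
```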

Referenced by request().


◆ _strip_leading_strings()

searx.engines.presearch._strip_leading_strings ( text)
protected

Definition at line 157 of file presearch.py.

157def _strip_leading_strings(text):
158 for x in ['wikipedia', 'google']:
159 if text.lower().endswith(x):
160 text = text[: -len(x)]
161 return text.strip()
162
163
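A standalone sketch of the helper; note that despite its name it strips a *trailing* "wikipedia" or "google" marker (it tests ``endswith`` and cuts from the end):

```python
# Sketch of _strip_leading_strings(): removes a trailing source marker
# ("wikipedia" / "google") plus surrounding whitespace.

def strip_leading_strings(text):
    for x in ["wikipedia", "google"]:
        if text.lower().endswith(x):
            text = text[: -len(x)]
    return text.strip()

print(strip_leading_strings("Eiffel Tower Wikipedia"))  # Eiffel Tower
```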

Referenced by parse_search_query().


◆ init()

searx.engines.presearch.init ( _)

Definition at line 99 of file presearch.py.

99def init(_):
100 if search_type not in ['search', 'images', 'videos', 'news']:
101 raise ValueError(f'presearch search_type: {search_type}')
102
103

◆ parse_search_query()

searx.engines.presearch.parse_search_query ( json_results)

Definition at line 185 of file presearch.py.

185def parse_search_query(json_results):
186 results = []
187
188 for item in json_results.get('specialSections', {}).get('topStoriesCompact', {}).get('data', []):
189 result = {
190 'url': item['link'],
191 'title': _fix_title(item['title'], item['link']),
192 'thumbnail': item['image'],
193 'content': '',
194 'metadata': item.get('source'),
195 }
196 results.append(result)
197
198 for item in json_results.get('standardResults', []):
199 result = {
200 'url': item['link'],
201 'title': _fix_title(item['title'], item['link']),
202 'content': html_to_text(item['description']),
203 }
204 results.append(result)
205
206 info = json_results.get('infoSection', {}).get('data')
207 if info:
208 attributes = []
209 for item in info.get('about', []):
210
211 text = html_to_text(item)
212 if ':' in text:
213 # split text into key / value
214 label, value = text.split(':', 1)
215 else:
216 # In other languages (tested with zh-TW) a colon is represented
217 # by a different symbol --> then we split at the first space.
218 label, value = text.split(' ', 1)
219 label = label[:-1]
220
221 value = _strip_leading_strings(value)
222 attributes.append({'label': label, 'value': value})
223 content = []
224 for item in [info.get('subtitle'), info.get('description')]:
225 if not item:
226 continue
227 item = _strip_leading_strings(html_to_text(item))
228 if item:
229 content.append(item)
230
231 results.append(
232 {
233 'infobox': info['title'],
234 'id': info['title'],
235 'img_src': info.get('image'),
236 'content': ' | '.join(content),
237 'attributes': attributes,
238 }
239 )
240 return results
241
242
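The label/value splitting inside the ``about`` loop can be sketched in isolation (hypothetical helper name; the fallback path is for languages such as zh-TW, where the colon is a different symbol):

```python
# Sketch: split an infobox "about" entry into label / value the same way
# parse_search_query() does.

def split_attribute(text):
    if ":" in text:
        label, value = text.split(":", 1)
    else:
        # non-ASCII colon: split at the first space and drop the
        # colon-like symbol from the end of the label
        label, value = text.split(" ", 1)
        label = label[:-1]
    return label.strip(), value.strip()

print(split_attribute("Height: 324 m"))  # ('Height', '324 m')
```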

References _fix_title(), and _strip_leading_strings().

Referenced by response().


◆ request()

searx.engines.presearch.request ( query,
params )

Definition at line 149 of file presearch.py.

149def request(query, params):
150 request_id = _get_request_id(query, params)
151 params["headers"]["Accept"] = "application/json"
152 params["url"] = f"{base_url}/results?id={request_id}"
153
154 return params
155
156

References _get_request_id().


◆ response()

searx.engines.presearch.response ( resp)

Definition at line 243 of file presearch.py.

243def response(resp):
244 results = []
245 json_resp = resp.json()
246
247 if search_type == 'search':
248 results = parse_search_query(json_resp.get('results'))
249
250 elif search_type == 'images':
251 for item in json_resp.get('images', []):
252 results.append(
253 {
254 'template': 'images.html',
255 'title': html_to_text(item['title']),
256 'url': item.get('link'),
257 'img_src': item.get('image'),
258 'thumbnail_src': item.get('thumbnail'),
259 }
260 )
261
262 elif search_type == 'videos':
263 # The results in the video category are most often links to pages that contain
264 # a video and not to a video stream --> SearXNG can't use the video template.
265
266 for item in json_resp.get('videos', []):
267 duration = item.get('duration')
268 if duration:
269 duration = parse_duration_string(duration)
270
271 results.append(
272 {
273 'title': html_to_text(item['title']),
274 'url': item.get('link'),
275 'content': item.get('description', ''),
276 'thumbnail': item.get('image'),
277 'length': duration,
278 }
279 )
280
281 elif search_type == 'news':
282 for item in json_resp.get('news', []):
283 source = item.get('source')
284 # Bug on their end, time sometimes returns "</a>"
285 time = html_to_text(item.get('time')).strip()
286 metadata = [source]
287 if time != "":
288 metadata.append(time)
289
290 results.append(
291 {
292 'title': html_to_text(item['title']),
293 'url': item.get('link'),
294 'content': html_to_text(item.get('description', '')),
295 'metadata': ' / '.join(metadata),
296 'thumbnail': item.get('image'),
297 }
298 )
299
300 return results

References parse_search_query().


Variable Documentation

◆ about

dict searx.engines.presearch.about
Initial value:
1= {
2 "website": "https://presearch.io",
3 "wikidata_id": "Q7240905",
4 "official_api_documentation": "https://docs.presearch.io/nodes/api",
5 "use_official_api": False,
6 "require_api_key": False,
7 "results": "JSON",
8}

Definition at line 78 of file presearch.py.

◆ base_url

str searx.engines.presearch.base_url = "https://presearch.com"

Definition at line 95 of file presearch.py.

◆ categories

list searx.engines.presearch.categories = ["general", "web"]

Definition at line 90 of file presearch.py.

◆ paging

bool searx.engines.presearch.paging = True

Definition at line 86 of file presearch.py.

◆ safesearch

bool searx.engines.presearch.safesearch = True

Definition at line 87 of file presearch.py.

◆ safesearch_map

dict searx.engines.presearch.safesearch_map = {0: 'false', 1: 'true', 2: 'true'}

Definition at line 96 of file presearch.py.

◆ search_type

str searx.engines.presearch.search_type = "search"

Definition at line 92 of file presearch.py.

◆ send_accept_language_header

bool searx.engines.presearch.send_accept_language_header = True

Definition at line 89 of file presearch.py.

◆ time_range_support

bool searx.engines.presearch.time_range_support = True

Definition at line 88 of file presearch.py.