.oO SearXNG Developer Documentation Oo.
searx.engines.presearch Namespace Reference

Functions

 init (_)
 
 _get_request_id (query, params)
 
 request (query, params)
 
 _strip_leading_strings (text)
 
 parse_search_query (json_results)
 
 response (resp)
 

Variables

dict about
 
bool paging = True
 
bool safesearch = True
 
bool time_range_support = True
 
bool send_accept_language_header = True
 
list categories = ["general", "web"]
 
str search_type = "search"
 
str base_url = "https://presearch.com"
 
dict safesearch_map = {0: 'false', 1: 'true', 2: 'true'}
 

Detailed Description

Presearch supports the search types listed in :py:obj:`search_type` (general,
images, videos, news).

Configured ``presearch`` engines:

.. code:: yaml

  - name: presearch
    engine: presearch
    search_type: search
    categories: [general, web]

  - name: presearch images
    ...
    search_type: images
    categories: [images, web]

  - name: presearch videos
    ...
    search_type: videos
    categories: [general, web]

  - name: presearch news
    ...
    search_type: news
    categories: [news, web]

.. hint::

   By default Presearch's video category is intentionally placed into::

       categories: [general, web]


Search type ``videos``
======================

The results in the video category are most often links to pages that contain a
video; for instance, many links from Presearch's video category point to content
on Facebook (aka Meta) or Twitter (aka X).  Since these are not real links to
video streams, SearXNG can't use the video template for them, and if SearXNG
can't use this template, then the user doesn't want to see these hits in the
videos category.


Languages & Regions
===================

In Presearch there are languages for the UI and regions for narrowing down the
search.  If we set "auto" for the region in Presearch's web UI and set the cookie
``use_local_search_results=false``, then the defaults for both (the language and
the region) are taken from the ``Accept-Language`` header.

Since the region is already "auto" by default, we only need to set the
``use_local_search_results`` cookie and send the ``Accept-Language`` header.  We
have to set these values in both requests we send to Presearch; in the first
request to get the request-ID from Presearch and in the final request to get the
result list (see ``send_accept_language_header``).
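
The header setup described above can be sketched as follows.  This is a minimal
illustration, not the engine's code: ``build_headers`` is a hypothetical helper,
and the language and territory are passed in as plain strings instead of being
read from a locale object.

```python
# Hypothetical helper illustrating the cookie and Accept-Language setup.
SAFESEARCH_MAP = {0: 'false', 1: 'true', 2: 'true'}


def build_headers(language, territory, safesearch):
    """Return the headers to send with both requests to Presearch."""
    headers = {
        'Cookie': (
            "b=1;"
            " presearch_session=;"
            " use_local_search_results=false;"
            f" use_safe_search={SAFESEARCH_MAP[safesearch]}"
        ),
    }
    # Without a territory there is no region to hand over, so the
    # Accept-Language header is left unset.
    if territory:
        headers['Accept-Language'] = f"{language}-{territory},{language};q=0.9,*;q=0.5"
    return headers


headers = build_headers('en', 'CA', 0)
print(headers['Accept-Language'])
```

With a locale that has no territory (for example plain ``en``), the sketch sends
only the cookie.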


Implementations
===============

Function Documentation

◆ _get_request_id()

searx.engines.presearch._get_request_id ( query,
params )
protected

Definition at line 98 of file presearch.py.

def _get_request_id(query, params):

    args = {
        "q": query,
        "page": params["pageno"],
    }

    if params["time_range"]:
        args["time"] = params["time_range"]

    url = f"{base_url}/{search_type}?{urlencode(args)}"

    headers = {
        'User-Agent': gen_useragent(),
        'Cookie': (
            f"b=1;"
            f" presearch_session=;"
            f" use_local_search_results=false;"
            f" use_safe_search={safesearch_map[params['safesearch']]}"
        ),
    }
    if params['searxng_locale'] != 'all':
        l = locales.get_locale(params['searxng_locale'])

        # Presearch narrows down the search by region.  In SearXNG when the user
        # does not set a region (e.g. 'en-CA' / canada) we cannot hand over a region.

        # We could possibly use searx.locales.get_official_locales to determine
        # in which regions this language is an official one, but then we still
        # wouldn't know which region should be given more weight / Presearch
        # performs an IP-based geolocation of the user, we don't want that in
        # SearXNG ;-)

        if l.territory:
            headers['Accept-Language'] = f"{l.language}-{l.territory},{l.language};" "q=0.9,*;" "q=0.5"

    resp_text = get(url, headers=headers).text  # type: ignore

    for line in resp_text.split("\n"):
        if "window.searchId = " in line:
            return line.split("= ")[1][:-1].replace('"', "")

    return None

Referenced by searx.engines.presearch.request().
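
To illustrate the last step of this function, here is a minimal sketch of the
request-ID extraction on its own.  ``extract_search_id`` is a hypothetical name,
and the sample page is invented for illustration; the real Presearch markup may
differ.

```python
def extract_search_id(html_text):
    """Return the value assigned to window.searchId, or None if absent."""
    for line in html_text.split("\n"):
        if "window.searchId = " in line:
            # Take the text after "= ", drop the trailing ";" and the quotes.
            return line.split("= ")[1][:-1].replace('"', "")
    return None


# Invented sample page, only for demonstration.
sample_page = '<script>\nwindow.searchId = "abc123";\n</script>'
print(extract_search_id(sample_page))
```

The slice ``[:-1]`` assumes the assignment line ends with a single ``;`` and no
trailing whitespace.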


◆ _strip_leading_strings()

searx.engines.presearch._strip_leading_strings ( text)
protected

Definition at line 151 of file presearch.py.

def _strip_leading_strings(text):
    # Despite the name, this removes a *trailing* source name
    # ('wikipedia' or 'google', case-insensitive) from the text.
    for x in ['wikipedia', 'google']:
        if text.lower().endswith(x):
            text = text[: -len(x)]
    return text.strip()

Referenced by searx.engines.presearch.parse_search_query().


◆ init()

searx.engines.presearch.init ( _)

Definition at line 93 of file presearch.py.

def init(_):
    if search_type not in ['search', 'images', 'videos', 'news']:
        raise ValueError(f'presearch search_type: {search_type}')

◆ parse_search_query()

searx.engines.presearch.parse_search_query ( json_results)

Definition at line 158 of file presearch.py.

def parse_search_query(json_results):
    results = []

    for item in json_results.get('specialSections', {}).get('topStoriesCompact', {}).get('data', []):
        result = {
            'url': item['link'],
            'title': item['title'],
            'thumbnail': item['image'],
            'content': '',
            'metadata': item.get('source'),
        }
        results.append(result)

    for item in json_results.get('standardResults', []):
        result = {
            'url': item['link'],
            'title': item['title'],
            'content': html_to_text(item['description']),
        }
        results.append(result)

    info = json_results.get('infoSection', {}).get('data')
    if info:
        attributes = []
        for item in info.get('about', []):

            text = html_to_text(item)
            if ':' in text:
                # split text into key / value
                label, value = text.split(':', 1)
            else:
                # In other languages (tested with zh-TW) a colon is represented
                # by a different symbol --> then we split at the first space.
                label, value = text.split(' ', 1)
                label = label[:-1]

            value = _strip_leading_strings(value)
            attributes.append({'label': label, 'value': value})
        content = []
        for item in [info.get('subtitle'), info.get('description')]:
            if not item:
                continue
            item = _strip_leading_strings(html_to_text(item))
            if item:
                content.append(item)

        results.append(
            {
                'infobox': info['title'],
                'id': info['title'],
                'img_src': info.get('image'),
                'content': ' | '.join(content),
                'attributes': attributes,
            }
        )
    return results

References searx.engines.presearch._strip_leading_strings().

Referenced by searx.engines.presearch.response().
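
The label/value split used for the infobox attributes can be tried in isolation.
The sketch below uses a hypothetical helper ``split_attribute`` and invented
sample strings; unlike the engine, it only trims whitespace from the value and
does not call ``_strip_leading_strings`` or ``html_to_text``.

```python
def split_attribute(text):
    """Split an 'about' entry into (label, value)."""
    if ':' in text:
        # ASCII colon: split into key / value at the first colon.
        label, value = text.split(':', 1)
    else:
        # No ASCII colon (e.g. a full-width colon): split at the first
        # space and drop the last character of the label, the separator.
        label, value = text.split(' ', 1)
        label = label[:-1]
    return label, value.strip()


ascii_pair = split_attribute("Born: 1912")
fullwidth_pair = split_attribute("出生\uff1a 1912")
print(ascii_pair, fullwidth_pair)
```

As in the listing above, an entry that contains neither a colon nor a space
makes the tuple unpacking raise a ``ValueError``.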


◆ request()

searx.engines.presearch.request ( query,
params )

Definition at line 143 of file presearch.py.

def request(query, params):
    request_id = _get_request_id(query, params)
    params["headers"]["Accept"] = "application/json"
    params["url"] = f"{base_url}/results?id={request_id}"

    return params

References searx.engines.presearch._get_request_id().
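
The second step of the two-request flow can be sketched without network access.
``build_results_request`` is a hypothetical stand-in for ``request()`` that
takes the request ID as an argument, and the ID value is invented.

```python
BASE_URL = "https://presearch.com"


def build_results_request(request_id, params):
    """Fill in the params for the final request that returns JSON results."""
    params["headers"]["Accept"] = "application/json"
    params["url"] = f"{BASE_URL}/results?id={request_id}"
    return params


params = build_results_request("abc123", {"headers": {}})
print(params["url"])

# If no request ID was found in the first response, the ID is None and the
# f-string renders it literally as "None".
missing = build_results_request(None, {"headers": {}})
print(missing["url"])
```

The second call shows why a caller may want to check for a missing ID before
sending the final request; the listing above does not do so.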


◆ response()

searx.engines.presearch.response ( resp)

Definition at line 216 of file presearch.py.

def response(resp):
    results = []
    json_resp = resp.json()

    if search_type == 'search':
        results = parse_search_query(json_resp.get('results'))

    elif search_type == 'images':
        for item in json_resp.get('images', []):
            results.append(
                {
                    'template': 'images.html',
                    'title': item['title'],
                    'url': item.get('link'),
                    'img_src': item.get('image'),
                    'thumbnail_src': item.get('thumbnail'),
                }
            )

    elif search_type == 'videos':
        # The results in the video category are most often links to pages that contain
        # a video and not to a video stream --> SearXNG can't use the video template.

        for item in json_resp.get('videos', []):
            metadata = [x for x in [item.get('description'), item.get('duration')] if x]
            results.append(
                {
                    'title': item['title'],
                    'url': item.get('link'),
                    'content': '',
                    'metadata': ' / '.join(metadata),
                    'thumbnail': item.get('image'),
                }
            )

    elif search_type == 'news':
        for item in json_resp.get('news', []):
            metadata = [x for x in [item.get('source'), item.get('time')] if x]
            results.append(
                {
                    'title': item['title'],
                    'url': item.get('link'),
                    'content': item.get('description', ''),
                    'metadata': ' / '.join(metadata),
                    'thumbnail': item.get('image'),
                }
            )

    return results

References searx.engines.presearch.parse_search_query().
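
As an illustration of the ``news`` branch, the sketch below maps a single item
to a result dictionary.  ``news_item_to_result`` is a hypothetical helper and
the sample item is invented; the field names follow the listing above.

```python
def news_item_to_result(item):
    """Map one Presearch news item to a SearXNG-style result dict."""
    # Keep only the metadata parts that are present and non-empty.
    metadata = [x for x in [item.get('source'), item.get('time')] if x]
    return {
        'title': item['title'],
        'url': item.get('link'),
        'content': item.get('description', ''),
        'metadata': ' / '.join(metadata),
        'thumbnail': item.get('image'),
    }


# Invented sample item with a missing time and no description or image.
sample_item = {
    'title': 'Example headline',
    'link': 'https://example.org/article',
    'source': 'Example News',
    'time': None,
}
result = news_item_to_result(sample_item)
print(result['metadata'])
```

Only ``title`` is accessed with ``item['title']``, so it is the one field whose
absence raises a ``KeyError``; all other fields fall back to ``None`` or an
empty string.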


Variable Documentation

◆ about

dict searx.engines.presearch.about
Initial value:
= {
    "website": "https://presearch.io",
    "wikidata_id": "Q7240905",
    "official_api_documentation": "https://docs.presearch.io/nodes/api",
    "use_official_api": False,
    "require_api_key": False,
    "results": "JSON",
}

Definition at line 72 of file presearch.py.

◆ base_url

str searx.engines.presearch.base_url = "https://presearch.com"

Definition at line 89 of file presearch.py.

◆ categories

list searx.engines.presearch.categories = ["general", "web"]

Definition at line 84 of file presearch.py.

◆ paging

bool searx.engines.presearch.paging = True

Definition at line 80 of file presearch.py.

◆ safesearch

bool searx.engines.presearch.safesearch = True

Definition at line 81 of file presearch.py.

◆ safesearch_map

dict searx.engines.presearch.safesearch_map = {0: 'false', 1: 'true', 2: 'true'}

Definition at line 90 of file presearch.py.

◆ search_type

str searx.engines.presearch.search_type = "search"

Definition at line 86 of file presearch.py.

◆ send_accept_language_header

bool searx.engines.presearch.send_accept_language_header = True

Definition at line 83 of file presearch.py.

◆ time_range_support

bool searx.engines.presearch.time_range_support = True

Definition at line 82 of file presearch.py.