Functions
	request (query, params)

EngineResults	response (resp)

Variables
	search_url = None

str	lang_all = 'en'

list	no_result_for_http_status = []

int	soft_max_redirects = 0

str	results_xpath = ''

	url_xpath = None

	content_xpath = None

	title_xpath = None

bool	thumbnail_xpath = False

str	suggestion_xpath = ''

str	cached_xpath = ''

str	cached_url = ''

dict	cookies = {}

dict	headers = {}

str	method = 'GET'

str	request_body = ''

bool	paging = False

int	page_size = 1

int	first_page_num = 1

bool	time_range_support = False

str	time_range_url = '&hours={time_range_val}'

dict	time_range_map

bool	safe_search_support = False

dict	safe_search_map = {0: '&filter=none', 1: '&filter=moderate', 2: '&filter=strict'}

Detailed Description

The XPath engine is a *generic* engine: new engines can be defined entirely in
the settings by supplying a search URL and a set of XPath selectors.

.. _XPath selector: https://quickref.me/xpath.html#xpath-selectors

Configuration
=============

Request:

- :py:obj:`search_url`
- :py:obj:`lang_all`
- :py:obj:`soft_max_redirects`
- :py:obj:`method`
- :py:obj:`request_body`
- :py:obj:`cookies`
- :py:obj:`headers`

Paging:

- :py:obj:`paging`
- :py:obj:`page_size`
- :py:obj:`first_page_num`

Time Range:

- :py:obj:`time_range_support`
- :py:obj:`time_range_url`
- :py:obj:`time_range_map`

Safe-Search:

- :py:obj:`safe_search_support`
- :py:obj:`safe_search_map`

Response:

- :py:obj:`no_result_for_http_status`

`XPath selector`_:

- :py:obj:`results_xpath`
- :py:obj:`url_xpath`
- :py:obj:`title_xpath`
- :py:obj:`content_xpath`
- :py:obj:`thumbnail_xpath`
- :py:obj:`suggestion_xpath`


Example
=======

Here is a simple example of an XPath engine configured in the :ref:`settings
engines` section; for further reading see :ref:`engines-dev`.

.. code:: yaml

  - name : bitbucket
    engine : xpath
    paging : True
    search_url : https://bitbucket.org/repo/all/{pageno}?name={query}
    url_xpath : //article[@class="repo-summary"]//a[@class="repo-link"]/@href
    title_xpath : //article[@class="repo-summary"]//a[@class="repo-link"]
    content_xpath : //article[@class="repo-summary"]/p
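
As a rough illustration of how the ``search_url`` template from the example above is filled in, the following sketch mirrors the formatting step of ``request()`` shown further down this page. The helper name ``build_url`` is hypothetical and not part of the engine; only the template string and the ``urlencode`` trick come from the source.

```python
from urllib.parse import urlencode

# Template copied from the bitbucket example in the settings.
search_url = 'https://bitbucket.org/repo/all/{pageno}?name={query}'


def build_url(query, pageno):
    """Hypothetical helper: fill the template the way request() does."""
    fargs = {
        # urlencode() yields 'q=<encoded query>'; [2:] strips the leading 'q='
        'query': urlencode({'q': query})[2:],
        'pageno': pageno,
    }
    return search_url.format(**fargs)


print(build_url('search engine', 2))
```

With these inputs the space in the query is encoded as ``+`` and the page number is substituted into the path.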

Implementations
===============

Function Documentation

◆ request()

searx.engines.xpath.request ( query, params )

Build request parameters (see :ref:`engine request`).

Definition at line 225 of file xpath.py.

def request(query, params):
    '''Build request parameters (see :ref:`engine request`).'''
    lang = lang_all
    if params['language'] != 'all':
        lang = params['language'][:2]
 
    time_range = ''
    if params.get('time_range'):
        time_range_val = time_range_map.get(params.get('time_range'))
        time_range = time_range_url.format(time_range_val=time_range_val)
 
    safe_search = ''
    if params['safesearch']:
        safe_search = safe_search_map[params['safesearch']]
 
    fargs = {
        'query': urlencode({'q': query})[2:],
        'lang': lang,
        'pageno': (params['pageno'] - 1) * page_size + first_page_num,
        'time_range': time_range,
        'safe_search': safe_search,
    }
 
    params['cookies'].update(cookies)
    params['headers'].update(headers)
 
    params['url'] = search_url.format(**fargs)
    params['method'] = method
 
    if request_body:
        # don't url-encode the query if it's in the request body
        fargs['query'] = query
        params['data'] = request_body.format(**fargs)
 
    params['soft_max_redirects'] = soft_max_redirects
    params['raise_for_httperror'] = False
 
    return params
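
The ``pageno`` expression in the listing above is easier to follow with concrete numbers. This is a minimal sketch that isolates only that arithmetic; the function name ``page_value`` is hypothetical, while the formula and the defaults (``page_size = 1``, ``first_page_num = 1``) are taken from the source.

```python
def page_value(pageno, page_size=1, first_page_num=1):
    """Hypothetical helper: value substituted for {pageno} in the URL template."""
    return (pageno - 1) * page_size + first_page_num


# Defaults: the site counts pages 1, 2, 3, ...
print([page_value(n) for n in (1, 2, 3)])

# A site that expects a zero-based result offset in steps of 10.
print([page_value(n, page_size=10, first_page_num=0) for n in (1, 2, 3)])
```

The first call gives ``[1, 2, 3]`` and the second ``[0, 10, 20]``, so ``page_size`` and ``first_page_num`` together cover both page-number and offset style pagination.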
 
 

◆ response()

EngineResults searx.engines.xpath.response ( resp )

Scrape *results* from the response (see :ref:`result types`).

Definition at line 265 of file xpath.py.

def response(resp) -> EngineResults:  # pylint: disable=too-many-branches
    """Scrape *results* from the response (see :ref:`result types`)."""
    results = EngineResults()
 
    if no_result_for_http_status and resp.status_code in no_result_for_http_status:
        return results
 
    raise_for_httperror(resp)
 
    if not resp.text:
        return results
 
    dom = html.fromstring(resp.text)
    is_onion = 'onions' in categories
 
    if results_xpath:
        for result in eval_xpath_list(dom, results_xpath):
 
            url = extract_url(eval_xpath_list(result, url_xpath, min_len=1), search_url)
            title = extract_text(eval_xpath_list(result, title_xpath, min_len=1))
            content = extract_text(eval_xpath_list(result, content_xpath))
            tmp_result = {'url': url, 'title': title, 'content': content}
 
            # add thumbnail if available
            if thumbnail_xpath:
                thumbnail_xpath_result = eval_xpath_list(result, thumbnail_xpath)
                if len(thumbnail_xpath_result) > 0:
                    tmp_result['thumbnail'] = extract_url(thumbnail_xpath_result, search_url)
 
            # add alternative cached url if available
            if cached_xpath:
                tmp_result['cached_url'] = cached_url + extract_text(eval_xpath_list(result, cached_xpath, min_len=1))
 
            if is_onion:
                tmp_result['is_onion'] = True
 
            results.append(tmp_result)
 
    else:
        if cached_xpath:
            for url, title, content, cached in zip(
                (extract_url(x, search_url) for x in eval_xpath_list(dom, url_xpath)),
                map(extract_text, eval_xpath_list(dom, title_xpath)),
                map(extract_text, eval_xpath_list(dom, content_xpath)),
                map(extract_text, eval_xpath_list(dom, cached_xpath)),
            ):
                results.append(
                    {
                        'url': url,
                        'title': title,
                        'content': content,
                        'cached_url': cached_url + cached,
                        'is_onion': is_onion,
                    }
                )
        else:
            for url, title, content in zip(
                (extract_url(x, search_url) for x in eval_xpath_list(dom, url_xpath)),
                map(extract_text, eval_xpath_list(dom, title_xpath)),
                map(extract_text, eval_xpath_list(dom, content_xpath)),
            ):
                results.append({'url': url, 'title': title, 'content': content, 'is_onion': is_onion})
 
    if suggestion_xpath:
        for suggestion in eval_xpath(dom, suggestion_xpath):
            results.append({'suggestion': extract_text(suggestion)})
 
    logger.debug("found %s results", len(results))
    return results
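
The two branches of ``response()`` pair URLs, titles and content differently: with ``results_xpath`` each result node is evaluated on its own, without it three page-wide lists are combined with ``zip``. The sketch below illustrates a consequence of that difference on a tiny made-up page. It uses the standard library's ``xml.etree.ElementTree`` as a stand-in for ``lxml`` and the searx helpers (``eval_xpath_list``, ``extract_text``, ``extract_url``), so the markup, the selectors and the empty-string fallback are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Made-up page: the second result has no <p> description.
PAGE = """
<html><body>
  <article><a href="https://example.org/a">A</a><p>about A</p></article>
  <article><a href="https://example.org/b">B</a></article>
  <article><a href="https://example.org/c">C</a><p>about C</p></article>
</body></html>
""".strip()

dom = ET.fromstring(PAGE)

# Page-wide style (no results_xpath): three independent lists are zipped.
urls = [a.get('href') for a in dom.findall('.//article/a')]
titles = [a.text for a in dom.findall('.//article/a')]
contents = [p.text for p in dom.findall('.//article/p')]
page_wide = list(zip(urls, titles, contents))

# Per-result style (results_xpath set): selectors run relative to each node.
per_result = []
for article in dom.findall('.//article'):
    link = article.find('a')
    para = article.find('p')
    # Empty string is a stand-in for "no content found" in this sketch.
    per_result.append((link.get('href'), link.text, para.text if para is not None else ''))

print(page_wide)
print(per_result)
```

In this sketch ``zip`` stops at the shortest list, so the page-wide variant yields two results and attaches "about C" to result B, while the per-result variant keeps all three results aligned. When a page may omit fields for some results, setting ``results_xpath`` is therefore the safer choice.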

Variable Documentation

◆ cached_url

str searx.engines.xpath.cached_url = ''

Definition at line 147 of file xpath.py.

◆ cached_xpath

str searx.engines.xpath.cached_xpath = ''

Definition at line 146 of file xpath.py.

◆ content_xpath

searx.engines.xpath.content_xpath = None

Definition at line 134 of file xpath.py.

◆ cookies

dict searx.engines.xpath.cookies = {}

Definition at line 149 of file xpath.py.

◆ first_page_num

int searx.engines.xpath.first_page_num = 1

Definition at line 174 of file xpath.py.

◆ headers

dict searx.engines.xpath.headers = {}

Definition at line 153 of file xpath.py.

◆ lang_all

str searx.engines.xpath.lang_all = 'en'

Definition at line 112 of file xpath.py.

◆ method

str searx.engines.xpath.method = 'GET'

Definition at line 157 of file xpath.py.

◆ no_result_for_http_status

list searx.engines.xpath.no_result_for_http_status = []

Definition at line 117 of file xpath.py.

◆ page_size

int searx.engines.xpath.page_size = 1

Definition at line 170 of file xpath.py.

◆ paging

bool searx.engines.xpath.paging = False

Definition at line 167 of file xpath.py.

◆ request_body

str searx.engines.xpath.request_body = ''

Definition at line 160 of file xpath.py.
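
In ``request()``, the URL is formatted with the URL-encoded query, and the body is then formatted with the raw query. A minimal sketch of that ordering follows; the endpoint, the body template and the helper name ``build_post`` are hypothetical and chosen only to make the difference visible.

```python
from urllib.parse import urlencode

# Hypothetical settings for an engine that sends its query in the request body.
search_url = 'https://search.example.org/api?echo={query}'
request_body = 'q={query}&page={pageno}'


def build_post(query, pageno=1):
    """Hypothetical helper following the order of operations in request()."""
    fargs = {'query': urlencode({'q': query})[2:], 'pageno': pageno}
    url = search_url.format(**fargs)      # URL gets the encoded query
    fargs['query'] = query                # body gets the raw query
    data = request_body.format(**fargs)
    return url, data


url, data = build_post('c++ tips')
print(url)
print(data)
```

Here the URL contains ``c%2B%2B+tips`` while the body contains ``c++ tips`` unchanged, so a ``request_body`` template has to be written with the unencoded query in mind.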

◆ results_xpath

str searx.engines.xpath.results_xpath = ''

Definition at line 128 of file xpath.py.

◆ safe_search_map

dict searx.engines.xpath.safe_search_map = {0: '&filter=none', 1: '&filter=moderate', 2: '&filter=strict'}

Definition at line 211 of file xpath.py.
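
The lookup in ``request()`` is guarded by ``if params['safesearch']:``, so it is worth seeing which map entries are actually reached. The sketch below reproduces only that guard and lookup; the function name ``safe_search_fragment`` is hypothetical.

```python
# Default map copied from the variable documented above.
safe_search_map = {0: '&filter=none', 1: '&filter=moderate', 2: '&filter=strict'}


def safe_search_fragment(level):
    """Hypothetical helper: value substituted for {safe_search}."""
    if level:  # level 0 is falsy, so the map is not consulted
        return safe_search_map[level]
    return ''


for level in (0, 1, 2):
    print(level, repr(safe_search_fragment(level)))
```

With the guard as written, level ``0`` produces an empty string rather than the map's ``0`` entry, and levels ``1`` and ``2`` produce the mapped fragments.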

◆ safe_search_support

bool searx.engines.xpath.safe_search_support = False

Definition at line 208 of file xpath.py.

◆ search_url

searx.engines.xpath.search_url = None

Definition at line 79 of file xpath.py.

◆ soft_max_redirects

int searx.engines.xpath.soft_max_redirects = 0

Definition at line 125 of file xpath.py.

◆ suggestion_xpath

str searx.engines.xpath.suggestion_xpath = ''

Definition at line 143 of file xpath.py.

◆ thumbnail_xpath

bool searx.engines.xpath.thumbnail_xpath = False

Definition at line 140 of file xpath.py.

◆ time_range_map

dict searx.engines.xpath.time_range_map

Initial value:

=  {
    'day': 24,
    'week': 24 * 7,
    'month': 24 * 30,
    'year': 24 * 365,
}

Definition at line 190 of file xpath.py.

◆ time_range_support

bool searx.engines.xpath.time_range_support = False

Definition at line 177 of file xpath.py.

◆ time_range_url

str searx.engines.xpath.time_range_url = '&hours={time_range_val}'

Definition at line 180 of file xpath.py.
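
To show how ``time_range_url`` and ``time_range_map`` combine, the following sketch repeats the two formatting lines from ``request()`` with the default values documented above. The function name ``time_range_fragment`` is hypothetical.

```python
# Defaults copied from the variables documented above.
time_range_url = '&hours={time_range_val}'
time_range_map = {
    'day': 24,
    'week': 24 * 7,
    'month': 24 * 30,
    'year': 24 * 365,
}


def time_range_fragment(name):
    """Hypothetical helper: value substituted for {time_range}."""
    if not name:
        return ''
    # .get() returns None for unknown keys, which format() renders as 'None'
    return time_range_url.format(time_range_val=time_range_map.get(name))


print(time_range_fragment('week'))
print(time_range_fragment(None))
```

With the defaults, ``'week'`` becomes ``&hours=168`` and an unset time range contributes nothing to the URL. A key missing from the map would be rendered as ``&hours=None``, so a custom ``time_range_map`` should cover every time range the engine is expected to receive.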

◆ title_xpath

searx.engines.xpath.title_xpath = None

Definition at line 137 of file xpath.py.

◆ url_xpath

searx.engines.xpath.url_xpath = None

Definition at line 131 of file xpath.py.
