.oO SearXNG Developer Documentation Oo.
searx.engines.github_code Namespace Reference

Functions

None request (str query, dict[str, t.Any] params)
tuple[list[str], set[int]] extract_code (list[dict[str, t.Any]] code_matches)
EngineResults response (SXNG_Response resp)

Variables

dict about
list categories = ['code']
str search_url = 'https://api.github.com/search/code?sort=indexed&{query}&{page}'
str accept_header = 'application/vnd.github.text-match+json'
bool paging = True
dict ghc_auth
bool ghc_highlight_matching_lines = True
bool ghc_strip_new_lines = True
bool ghc_strip_whitespace = False
str ghc_api_version = "2022-11-28"
bool ghc_insert_block_separator = False

Detailed Description

GitHub code search with `search syntax`_ as described in `Constructing a
search query`_ in the documentation of GitHub's REST API.

.. _search syntax:
    https://docs.github.com/en/search-github/getting-started-with-searching-on-github/understanding-the-search-syntax
.. _Constructing a search query:
    https://docs.github.com/en/rest/search/search?apiVersion=2022-11-28#constructing-a-search-query
.. _Github REST API for code search:
    https://docs.github.com/en/rest/search/search?apiVersion=2022-11-28#search-code
.. _Github REST API auth for code search:
    https://docs.github.com/en/rest/search/search?apiVersion=2022-11-28#search-code--fine-grained-access-tokens

Configuration
=============

The engine has the following mandatory setting:

- :py:obj:`ghc_auth`
  Select the authentication method used for API requests. The default type
  is ``none``; the types ``personal_access_token`` and ``bearer`` each
  require a ``token``.

Optional settings are:

- :py:obj:`ghc_highlight_matching_lines`
  Turn highlighting of the lines that contain matched text on or off.
- :py:obj:`ghc_strip_new_lines`
  Strip newlines at the start and end of each code fragment.
- :py:obj:`ghc_strip_whitespace`
  Strip any whitespace at the start and end of each code fragment.
- :py:obj:`ghc_insert_block_separator`
  Add a ``...`` line between code fragments before merging them.

.. code:: yaml

  - name: github code
    engine: github_code
    shortcut: ghc
    ghc_auth:
      type: "none"

  - name: github code
    engine: github_code
    shortcut: ghc
    ghc_auth:
      type: "personal_access_token"
      token: "<token>"
    ghc_highlight_matching_lines: true
    ghc_strip_whitespace: true
    ghc_strip_new_lines: true


  - name: github code
    engine: github_code
    shortcut: ghc
    ghc_auth:
      type: "bearer"
      token: "<token>"
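
To illustrate how the two strip settings differ, the following minimal sketch
applies the same kind of stripping that ``extract_code()`` performs to a
made-up fragment. The helper name ``strip_fragment`` and the sample string are
assumptions for illustration and are not part of the engine.

```python
def strip_fragment(code: str, strip_whitespace: bool, strip_new_lines: bool) -> str:
    """Hypothetical helper: strip a fragment the way the two settings describe."""
    if strip_whitespace:
        # removes spaces, tabs and newlines on both ends
        code = code.strip()
    if strip_new_lines:
        # removes only newline characters, indentation is kept
        code = code.strip("\n")
    return code


fragment = "\n\n    return value\n\n"  # made-up sample fragment
only_new_lines = strip_fragment(fragment, strip_whitespace=False, strip_new_lines=True)
all_whitespace = strip_fragment(fragment, strip_whitespace=True, strip_new_lines=True)
print(repr(only_new_lines))
print(repr(all_whitespace))
```

With only ``ghc_strip_new_lines`` enabled, the indentation of the first line
survives; enabling ``ghc_strip_whitespace`` removes it as well.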

Implementation
===============

GitHub's search API does not return line numbers alongside the code fragments.
Since they matter little for the user experience, the code lines are simply
renumbered (starting from 1) and appended to each other; the API may return
several disjoint code blocks from a single file.
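
The renumbering described above can be sketched as follows. This is a
simplified illustration, not the engine's code: the helper name
``merge_fragments`` and the sample fragments are assumptions, and highlighting
and stripping are left out.

```python
def merge_fragments(fragments: list[str], separator: bool = False) -> list[tuple[int, str]]:
    """Hypothetical helper: append disjoint fragments and number the lines from 1."""
    lines: list[str] = []
    for i, fragment in enumerate(fragments):
        if i > 0 and separator:
            # mirrors the idea behind ghc_insert_block_separator
            lines.append("...")
        lines.extend(fragment.split("\n"))
    # the original line numbers are unknown, so relabel from 1
    return list(enumerate(lines, start=1))


numbered = merge_fragments(["def a():\n    pass", "def b():\n    pass"], separator=True)
for number, line in numbered:
    print(number, line)
```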

Function Documentation

◆ extract_code()

tuple[list[str], set[int]] searx.engines.github_code.extract_code ( list[dict[str, t.Any]] code_matches)
Iterate over the matches and extract a code fragment from each.
GitHub additionally sends context for _word_ highlights; Pygments highlights
whole lines, so the lines to highlight are calculated while traversing the
text.

Definition at line 162 of file github_code.py.

def extract_code(code_matches: list[dict[str, t.Any]]) -> tuple[list[str], set[int]]:
    """
    Iterate over multiple possible matches, for each extract a code fragment.
    Github additionally sends context for _word_ highlights; pygments supports
    highlighting lines, as such we calculate which lines to highlight while
    traversing the text.
    """
    lines: list[str] = []
    highlighted_lines_index: set[int] = set()

    for i, match in enumerate(code_matches):
        if i > 0 and ghc_insert_block_separator:
            lines.append("...")
        buffer: list[str] = []
        highlight_groups = [highlight_group['indices'] for highlight_group in match['matches']]

        code: str = match['fragment']
        original_code_lenght = len(code)

        if ghc_strip_whitespace:
            code = code.lstrip()
        if ghc_strip_new_lines:
            code = code.lstrip("\n")

        offset = original_code_lenght - len(code)

        if ghc_strip_whitespace:
            code = code.rstrip()
        if ghc_strip_new_lines:
            code = code.rstrip("\n")

        for i, letter in enumerate(code):
            if len(highlight_groups) > 0:
                # the API ensures these are sorted already, and we have a
                # guaranteed match in the code (all indices are in the range 0
                # and len(fragment)), so only check the first highlight group
                [after, before] = highlight_groups[0]
                if after <= (i + offset) < before:
                    # pygments enumerates lines from 1, highlight the next line
                    highlighted_lines_index.add(len(lines) + 1)
                    highlight_groups.pop(0)

            if letter == "\n":
                lines.append("".join(buffer))
                buffer = []
                continue

            buffer.append(letter)
        lines.append("".join(buffer))
    return lines, highlighted_lines_index
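
The role of ``offset`` in the function above can be illustrated with a smaller
sketch. GitHub's match indices refer to the unstripped fragment, so they have
to be shifted by the number of characters removed on the left. The sketch
swaps the single-pass character loop for a plain newline count; the helper
name ``highlighted_lines`` and the sample fragment are assumptions for
illustration only.

```python
def highlighted_lines(fragment: str, indices: list[tuple[int, int]]) -> set[int]:
    """Hypothetical helper: 1-based lines (after left-stripping) where a match starts."""
    stripped = fragment.lstrip("\n")
    offset = len(fragment) - len(stripped)  # characters removed on the left
    result: set[int] = set()
    for start, _end in indices:
        position = start - offset  # index into the stripped text
        if 0 <= position < len(stripped):
            # newlines before the match start, plus one, give the line number
            result.add(stripped.count("\n", 0, position) + 1)
    return result


fragment = "\n\nimport os\nprint(os.name)\n"  # made-up sample fragment
start = fragment.index("os.name")
hl = highlighted_lines(fragment, [(start, start + len("os.name"))])
print(hl)
```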

Referenced by response().

◆ request()

None searx.engines.github_code.request ( str query, dict[str, t.Any] params )

Definition at line 144 of file github_code.py.

def request(query: str, params: dict[str, t.Any]) -> None:

    params['url'] = search_url.format(query=urlencode({'q': query}), page=urlencode({'page': params['pageno']}))
    params['headers']['Accept'] = accept_header
    params['headers']['X-GitHub-Api-Version'] = ghc_api_version

    if ghc_auth['type'] == "none":
        # Without the auth header the query fails, so add a dummy instead.
        # Queries without auth are heavily rate limited.
        params['headers']['Authorization'] = "placeholder"
    if ghc_auth['type'] == "personal_access_token":
        params['headers']['Authorization'] = f"token {ghc_auth['token']}"
    if ghc_auth['type'] == "bearer":
        params['headers']['Authorization'] = f"Bearer {ghc_auth['token']}"

    params['raise_for_httperror'] = False
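
The mapping from ``ghc_auth`` to the ``Authorization`` header can be sketched
in isolation as shown below. The helper name ``build_auth_header`` is an
assumption for illustration. Unlike ``request()``, which sets no header for an
unknown type, this sketch raises an error, which is a design choice of the
example and not the engine's behavior.

```python
def build_auth_header(auth: dict[str, str]) -> str:
    """Hypothetical helper: Authorization header value for a ghc_auth-style dict."""
    if auth["type"] == "none":
        # dummy value, requests without a token are heavily rate limited
        return "placeholder"
    if auth["type"] == "personal_access_token":
        return f"token {auth['token']}"
    if auth["type"] == "bearer":
        return f"Bearer {auth['token']}"
    # assumption of this sketch: reject unknown types instead of ignoring them
    raise ValueError(f"unknown auth type: {auth['type']}")


anonymous = build_auth_header({"type": "none", "token": ""})
pat = build_auth_header({"type": "personal_access_token", "token": "abc"})
bearer = build_auth_header({"type": "bearer", "token": "abc"})
print(anonymous, pat, bearer, sep="\n")
```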

◆ response()

EngineResults searx.engines.github_code.response ( SXNG_Response resp)

Definition at line 214 of file github_code.py.

def response(resp: SXNG_Response) -> EngineResults:
    res = EngineResults()

    if resp.status_code == 422:
        # on an invalid search term the status code 422 "Unprocessable Content"
        # is returned / e.g. search term is "user: foo" instead of "user:foo"
        return res
    # raise for other errors
    raise_for_httperror(resp)

    for item in resp.json().get('items', []):
        repo: dict[str, str] = item['repository']  # pyright: ignore[reportAny]
        text_matches: list[dict[str, str]] = item['text_matches']  # pyright: ignore[reportAny]
        # ensure picking only the code contents in the blob
        code_matches = [
            match for match in text_matches if match["object_type"] == "FileContent" and match["property"] == "content"
        ]
        lines, highlighted_lines_index = extract_code(code_matches)
        if not ghc_highlight_matching_lines:
            highlighted_lines_index: set[int] = set()

        res.add(
            res.types.Code(
                url=item["html_url"],  # pyright: ignore[reportAny]
                title=f"{repo['full_name']} · {item['name']}",
                filename=f"{item['path']}",
                content=repo['description'],
                repository=repo['html_url'],
                codelines=[(i + 1, line) for (i, line) in enumerate(lines)],
                hl_lines=highlighted_lines_index,
                strip_whitespace=ghc_strip_whitespace,
                strip_new_lines=ghc_strip_new_lines,
            )
        )

    return res
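
The filtering step in ``response()`` keeps only text matches that come from
the file content. The sketch below applies the same condition to a made-up
item; the payload is an assumption that only loosely follows the shape of the
API response and is reduced to the keys the filter reads.

```python
# made-up sample item, reduced to the keys the filter looks at
item = {
    "text_matches": [
        {"object_type": "FileContent", "property": "content", "fragment": "import os"},
        {"object_type": "FileContent", "property": "path", "fragment": "src/os_utils.py"},
    ]
}

# same condition as in response(): keep matches in the file content only
code_matches = [
    match
    for match in item["text_matches"]
    if match["object_type"] == "FileContent" and match["property"] == "content"
]
fragments = [match["fragment"] for match in code_matches]
print(fragments)
```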

References extract_code().

Variable Documentation

◆ about

dict searx.engines.github_code.about
Initial value:
= {
    "website": 'https://github.com/',
    "wikidata_id": 'Q364',
    "official_api_documentation": 'https://docs.github.com/en/rest/search/search?apiVersion=2022-11-28#search-code',
    "use_official_api": True,
    "require_api_key": False,
    "results": 'JSON',
}

Definition at line 77 of file github_code.py.

◆ accept_header

str searx.engines.github_code.accept_header = 'application/vnd.github.text-match+json'

Definition at line 92 of file github_code.py.

◆ categories

list searx.engines.github_code.categories = ['code']

Definition at line 87 of file github_code.py.

◆ ghc_api_version

str searx.engines.github_code.ghc_api_version = "2022-11-28"

Definition at line 133 of file github_code.py.

◆ ghc_auth

dict searx.engines.github_code.ghc_auth
Initial value:
= {
    "type": "none",
    "token": "",
}

Definition at line 95 of file github_code.py.

◆ ghc_highlight_matching_lines

bool searx.engines.github_code.ghc_highlight_matching_lines = True

Definition at line 119 of file github_code.py.

◆ ghc_insert_block_separator

bool searx.engines.github_code.ghc_insert_block_separator = False

Definition at line 137 of file github_code.py.

◆ ghc_strip_new_lines

bool searx.engines.github_code.ghc_strip_new_lines = True

Definition at line 122 of file github_code.py.

◆ ghc_strip_whitespace

bool searx.engines.github_code.ghc_strip_whitespace = False

Definition at line 127 of file github_code.py.

◆ paging

bool searx.engines.github_code.paging = True

Definition at line 93 of file github_code.py.

◆ search_url

str searx.engines.github_code.search_url = 'https://api.github.com/search/code?sort=indexed&{query}&{page}'

Definition at line 90 of file github_code.py.
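
As a small illustration of how the two placeholders in ``search_url`` are
filled, the sketch below formats the template the same way ``request()``
does. The helper name ``build_url`` and the sample query are assumptions for
illustration.

```python
from urllib.parse import urlencode

search_url = 'https://api.github.com/search/code?sort=indexed&{query}&{page}'


def build_url(query: str, pageno: int) -> str:
    """Hypothetical helper: fill the {query} and {page} placeholders."""
    return search_url.format(
        query=urlencode({'q': query}),  # URL-encodes the search term
        page=urlencode({'page': pageno}),
    )


url = build_url("user:foo bar", 2)
print(url)
```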