.oO SearXNG Developer Documentation Oo.
searx.engines.openalex Namespace Reference

Functions

None request (str query, dict[str, t.Any] params)
EngineResults response (SXNG_Response resp)
str|None _stringify_pages (dict[str, t.Any] biblio)
datetime|None _parse_date (str|None value)
str|None _doi_to_plain (str|None doi_value)
str|None _reconstruct_abstract (dict[str, list[int]]|None abstract_inverted_index)
tuple[str, str|None, str|None] _extract_links (dict[str, t.Any] item)
list[str] _extract_authors (dict[str, t.Any] item)
list[str] _extract_tags (dict[str, t.Any] item)
tuple[str|None, str|None, str|None, str|None, str|None, datetime|None] _extract_biblio (dict[str, t.Any] item)
str|None _extract_comments (dict[str, t.Any] item)

Variables

dict about
list categories = ["science", "scientific publications"]
bool paging = True
str search_url = "https://api.openalex.org/works"
str mailto = ""

Function Documentation

◆ _doi_to_plain()

str | None searx.engines.openalex._doi_to_plain ( str | None doi_value)
protected

Definition at line 126 of file openalex.py.

def _doi_to_plain(doi_value: str | None) -> str | None:
    if not doi_value:
        return None
    # OpenAlex `doi` field is commonly a full URL like https://doi.org/10.1234/abcd
    return doi_value.removeprefix("https://doi.org/")

Referenced by response().
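A minimal standalone sketch of the same prefix stripping, with a hypothetical DOI value (`str.removeprefix` requires Python 3.9+):

```python
# Hypothetical sample value; OpenAlex commonly returns `doi` as a full URL.
doi_url = "https://doi.org/10.1234/abcd"
plain = doi_url.removeprefix("https://doi.org/")
print(plain)  # 10.1234/abcd
```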


◆ _extract_authors()

list[str] searx.engines.openalex._extract_authors ( dict[str, t.Any] item)
protected

Definition at line 165 of file openalex.py.

def _extract_authors(item: dict[str, t.Any]) -> list[str]:
    authors: list[str] = []
    for auth in item.get("authorships", []):
        if not auth:
            continue
        author_obj = auth.get("author", {})
        display_name = author_obj.get("display_name")
        if isinstance(display_name, str) and display_name != "":
            authors.append(display_name)
    return authors

Referenced by response().
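A standalone sketch of the same traversal over a hypothetical `authorships` fragment (the data is illustrative, not a real API response):

```python
# Hypothetical `authorships` fragment mirroring the OpenAlex response shape.
item = {
    "authorships": [
        {"author": {"display_name": "Ada Lovelace"}},
        None,            # entries can be null, hence the guard
        {"author": {}},  # an author without a display_name is skipped
    ]
}

authors = []
for auth in item.get("authorships", []):
    if not auth:
        continue
    name = auth.get("author", {}).get("display_name")
    if isinstance(name, str) and name != "":
        authors.append(name)
print(authors)  # ['Ada Lovelace']
```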


◆ _extract_biblio()

tuple[str | None, str | None, str | None, str | None, str | None, datetime | None] searx.engines.openalex._extract_biblio ( dict[str, t.Any] item)
protected

Definition at line 186 of file openalex.py.

def _extract_biblio(
    item: dict[str, t.Any],
) -> tuple[str | None, str | None, str | None, str | None, str | None, datetime | None]:
    host_venue = item.get("host_venue", {})
    biblio = item.get("biblio", {})
    journal: str | None = host_venue.get("display_name")
    publisher: str | None = host_venue.get("publisher")
    pages = _stringify_pages(biblio)
    volume = biblio.get("volume")
    number = biblio.get("issue")
    published_date = _parse_date(item.get("publication_date"))
    return journal, publisher, pages, volume, number, published_date

References _parse_date(), and _stringify_pages().

Referenced by response().
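A standalone sketch of the fields this helper reads, with the `_stringify_pages()` and `_parse_date()` logic inlined over a hypothetical work fragment:

```python
from datetime import datetime

# Hypothetical OpenAlex work fragment.
item = {
    "host_venue": {"display_name": "Nature", "publisher": "Springer Nature"},
    "biblio": {"volume": "601", "issue": "7891", "first_page": "1", "last_page": "10"},
    "publication_date": "2022-01-06",
}

journal = item["host_venue"]["display_name"]
pages = f'{item["biblio"]["first_page"]}-{item["biblio"]["last_page"]}'
published = datetime.strptime(item["publication_date"], "%Y-%m-%d")
print(journal, pages, published.year)  # Nature 1-10 2022
```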


◆ _extract_comments()

str | None searx.engines.openalex._extract_comments ( dict[str, t.Any] item)
protected

Definition at line 200 of file openalex.py.

def _extract_comments(item: dict[str, t.Any]) -> str | None:
    cited_by_count = item.get("cited_by_count")
    if isinstance(cited_by_count, int):
        return f"{cited_by_count} citations"
    return None

Referenced by response().


◆ _extract_links()

tuple[str, str | None, str | None] searx.engines.openalex._extract_links ( dict[str, t.Any] item)
protected

Definition at line 154 of file openalex.py.

def _extract_links(item: dict[str, t.Any]) -> tuple[str, str | None, str | None]:
    primary_location = item.get("primary_location", {})
    landing_page_url: str | None = primary_location.get("landing_page_url")
    work_url: str = item.get("id", "")
    url: str = landing_page_url or work_url
    open_access = item.get("open_access", {})
    pdf_url: str | None = primary_location.get("pdf_url") or open_access.get("oa_url")
    html_url: str | None = landing_page_url
    return url, html_url, pdf_url

Referenced by response().
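A standalone sketch of the fallback chain, using a hypothetical work with no landing page: the OpenAlex work id serves as the result URL, and the open-access URL stands in for a missing direct PDF link:

```python
# Hypothetical work fragment; all values are illustrative.
item = {
    "id": "https://openalex.org/W2741809807",
    "primary_location": {"landing_page_url": None, "pdf_url": None},
    "open_access": {"oa_url": "https://example.org/paper.pdf"},
}

primary = item.get("primary_location", {})
url = primary.get("landing_page_url") or item.get("id", "")
pdf_url = primary.get("pdf_url") or item.get("open_access", {}).get("oa_url")
print(url)      # https://openalex.org/W2741809807
print(pdf_url)  # https://example.org/paper.pdf
```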


◆ _extract_tags()

list[str] searx.engines.openalex._extract_tags ( dict[str, t.Any] item)
protected

Definition at line 177 of file openalex.py.

def _extract_tags(item: dict[str, t.Any]) -> list[str]:
    tags: list[str] = []
    for c in item.get("concepts", []):
        name = (c or {}).get("display_name")
        if isinstance(name, str) and name != "":
            tags.append(name)
    return tags

Referenced by response().


◆ _parse_date()

datetime | None searx.engines.openalex._parse_date ( str | None value)
protected

Definition at line 114 of file openalex.py.

def _parse_date(value: str | None) -> datetime | None:
    if not value:
        return None
    # OpenAlex may return YYYY, YYYY-MM or YYYY-MM-DD
    for fmt in ("%Y-%m-%d", "%Y-%m", "%Y"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None

Referenced by _extract_biblio().
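The try-each-format-in-turn pattern can be exercised on its own; `parse_partial_date` is an illustrative stand-in for the helper above:

```python
from datetime import datetime

def parse_partial_date(value):
    # Try the most specific format first, falling back to year-month and bare year.
    for fmt in ("%Y-%m-%d", "%Y-%m", "%Y"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None

print(parse_partial_date("2023-07"))     # 2023-07-01 00:00:00
print(parse_partial_date("not-a-date"))  # None
```

Unmatched components default to their lowest value, so a bare year parses to January 1 of that year.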


◆ _reconstruct_abstract()

str | None searx.engines.openalex._reconstruct_abstract ( dict[str, list[int]] | None abstract_inverted_index)
protected

Definition at line 133 of file openalex.py.

def _reconstruct_abstract(
    abstract_inverted_index: dict[str, list[int]] | None,
) -> str | None:
    # The abstract is returned as an inverted index {token: [positions...]}
    # Reconstruct by placing tokens at their positions and joining with spaces.
    if not abstract_inverted_index:
        return None
    position_to_token: dict[int, str] = {}
    max_index = -1
    for token, positions in abstract_inverted_index.items():
        for pos in positions:
            position_to_token[pos] = token
            max_index = max(max_index, pos)
    if max_index < 0:
        return None
    ordered_tokens = [position_to_token.get(i, "") for i in range(0, max_index + 1)]
    # collapse multiple empty tokens
    text = " ".join(t for t in ordered_tokens if t != "")
    return text if text != "" else None

Referenced by response().
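A standalone sketch of the reconstruction on a small hypothetical inverted index, where each token maps to every position at which it occurs:

```python
# Hypothetical inverted index for the phrase "to be or not to be".
abstract_inverted_index = {"to": [0, 4], "be": [1, 5], "or": [2], "not": [3]}

position_to_token = {}
for token, positions in abstract_inverted_index.items():
    for pos in positions:
        position_to_token[pos] = token
text = " ".join(position_to_token[i] for i in sorted(position_to_token))
print(text)  # to be or not to be
```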


◆ _stringify_pages()

str | None searx.engines.openalex._stringify_pages ( dict[str, t.Any] biblio)
protected

Definition at line 102 of file openalex.py.

def _stringify_pages(biblio: dict[str, t.Any]) -> str | None:
    first_page = biblio.get("first_page")
    last_page = biblio.get("last_page")
    if first_page and last_page:
        return f"{first_page}-{last_page}"
    if first_page:
        return str(first_page)
    if last_page:
        return str(last_page)
    return None

Referenced by _extract_biblio().
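The same prefer-the-range-then-fall-back logic, sketched as an illustrative standalone function (`pages_from_biblio` is a hypothetical name):

```python
def pages_from_biblio(biblio):
    # Prefer a "first-last" range; fall back to whichever single page is present.
    first = biblio.get("first_page")
    last = biblio.get("last_page")
    if first and last:
        return f"{first}-{last}"
    if first or last:
        return str(first or last)
    return None

print(pages_from_biblio({"first_page": "12", "last_page": "19"}))  # 12-19
print(pages_from_biblio({"first_page": "7"}))                      # 7
print(pages_from_biblio({}))                                       # None
```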


◆ request()

None searx.engines.openalex.request ( str query,
dict[str, t.Any] params )

Definition at line 34 of file openalex.py.

def request(query: str, params: dict[str, t.Any]) -> None:
    # Build OpenAlex query using search parameter and paging
    args = {
        "search": query,
        "page": params["pageno"],
        # keep result size moderate; OpenAlex default is 25
        "per-page": 10,
        # relevance sorting works only with `search`
        "sort": "relevance_score:desc",
    }

    # Language filter (expects ISO 639-1 codes like 'fr', 'en')
    language = params.get("language")
    filters: list[str] = []
    if isinstance(language, str) and language != "all":
        iso2 = language.split("-")[0].split("_")[0]
        if len(iso2) == 2:
            filters.append(f"language:{iso2}")

    if filters:
        args["filter"] = ",".join(filters)

    # include mailto if configured for the polite pool (engine module setting)
    if isinstance(mailto, str) and mailto != "":
        args["mailto"] = mailto

    params["url"] = f"{search_url}?{urlencode(args)}"
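A standalone sketch of the URL this builds, using hypothetical query values (note that `urlencode` percent-encodes the `:` in the sort and filter values):

```python
from urllib.parse import urlencode

search_url = "https://api.openalex.org/works"
args = {
    "search": "graphene",       # hypothetical query
    "page": 2,
    "per-page": 10,
    "sort": "relevance_score:desc",
    "filter": "language:en",
}
url = f"{search_url}?{urlencode(args)}"
print(url)
```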

◆ response()

EngineResults searx.engines.openalex.response ( SXNG_Response resp)

Definition at line 63 of file openalex.py.

def response(resp: SXNG_Response) -> EngineResults:
    data = resp.json()
    res = EngineResults()

    for item in data.get("results", []):
        url, html_url, pdf_url = _extract_links(item)
        title: str = item.get("title", "")
        content: str = _reconstruct_abstract(item.get("abstract_inverted_index")) or ""
        authors = _extract_authors(item)
        journal, publisher, pages, volume, number, published_date = _extract_biblio(item)
        doi = _doi_to_plain(item.get("doi"))
        tags = _extract_tags(item) or None
        comments = _extract_comments(item)

        res.add(
            res.types.LegacyResult(
                template="paper.html",
                url=url,
                title=title,
                content=content,
                journal=journal,
                publisher=publisher,
                doi=doi,
                tags=tags,
                authors=authors,
                pdf_url=pdf_url,
                html_url=html_url,
                publishedDate=published_date,
                pages=pages,
                volume=volume,
                number=number,
                type=item.get("type"),
                comments=comments,
            )
        )

    return res

References _doi_to_plain(), _extract_authors(), _extract_biblio(), _extract_comments(), _extract_links(), _extract_tags(), and _reconstruct_abstract().


Variable Documentation

◆ about

dict searx.engines.openalex.about
Initial value:
= {
    "website": "https://openalex.org/",
    "wikidata_id": "Q110718454",
    "official_api_documentation": "https://docs.openalex.org/how-to-use-the-api/api-overview",
    "use_official_api": True,
    "require_api_key": False,
    "results": "JSON",
}

Definition at line 14 of file openalex.py.

◆ categories

list searx.engines.openalex.categories = ["science", "scientific publications"]

Definition at line 25 of file openalex.py.

◆ mailto

str searx.engines.openalex.mailto = ""

Definition at line 31 of file openalex.py.

◆ paging

bool searx.engines.openalex.paging = True

Definition at line 26 of file openalex.py.

◆ search_url

str searx.engines.openalex.search_url = "https://api.openalex.org/works"

Definition at line 27 of file openalex.py.