.oO SearXNG Developer Documentation Oo.
searx.results Namespace Reference

Classes

class  ResultContainer
 
class  Timing
 
class  UnresponsiveEngine
 

Functions

 result_content_len (content)
 
 compare_urls (url_a, url_b)
 
 merge_two_infoboxes (infobox1, infobox2)
 
 result_score (result)
 

Variables

 CONTENT_LEN_IGNORED_CHARS_REGEX = re.compile(r'[,;:!?\./\\\\ ()-_]', re.M | re.U)
 
 WHITESPACE_REGEX = re.compile('( |\t|\n)+', re.M | re.U)
 

Function Documentation

◆ compare_urls()

searx.results.compare_urls(url_a, url_b)

Lazy comparison between two URLs.
"www.example.com" and "example.com" are equal.
"www.example.com/path/" and "www.example.com/path" are equal.
"https://www.example.com/" and "http://www.example.com/" are equal.

The scheme, a leading "www.", a single trailing slash, and percent-encoding in the path are ignored; the query and fragment must match exactly.

Args:
    url_a (ParseResult): first URL
    url_b (ParseResult): second URL

Returns:
    bool: True if url_a and url_b are equal

Definition at line 28 of file results.py.

    def compare_urls(url_a, url_b):
        """Lazy compare between two URL.
        "www.example.com" and "example.com" are equals.
        "www.example.com/path/" and "www.example.com/path" are equals.
        "https://www.example.com/" and "http://www.example.com/" are equals.

        Args:
            url_a (ParseResult): first URL
            url_b (ParseResult): second URL

        Returns:
            bool: True if url_a and url_b are equals
        """
        # ignore www. in comparison
        if url_a.netloc.startswith('www.'):
            host_a = url_a.netloc.replace('www.', '', 1)
        else:
            host_a = url_a.netloc
        if url_b.netloc.startswith('www.'):
            host_b = url_b.netloc.replace('www.', '', 1)
        else:
            host_b = url_b.netloc

        if host_a != host_b or url_a.query != url_b.query or url_a.fragment != url_b.fragment:
            return False

        # remove / from the end of the url if required
        path_a = url_a.path[:-1] if url_a.path.endswith('/') else url_a.path
        path_b = url_b.path[:-1] if url_b.path.endswith('/') else url_b.path

        return unquote(path_a) == unquote(path_b)

Referenced by searx.results.merge_two_infoboxes().
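
A minimal usage sketch, assuming the arguments are produced by `urllib.parse.urlparse` (the docstring only states that they are `ParseResult` objects). The function body is copied from the listing so that the example runs without a SearXNG installation; the sample URLs are illustrative. Note that `urlparse` only fills in `netloc` when the string has a scheme (or a leading `//`), so the bare host names used in the docstring would need a scheme before they compare as equal.

```python
from urllib.parse import unquote, urlparse


def compare_urls(url_a, url_b):
    # Copy of the function from the listing, for a self-contained example.
    if url_a.netloc.startswith('www.'):
        host_a = url_a.netloc.replace('www.', '', 1)
    else:
        host_a = url_a.netloc
    if url_b.netloc.startswith('www.'):
        host_b = url_b.netloc.replace('www.', '', 1)
    else:
        host_b = url_b.netloc

    if host_a != host_b or url_a.query != url_b.query or url_a.fragment != url_b.fragment:
        return False

    path_a = url_a.path[:-1] if url_a.path.endswith('/') else url_a.path
    path_b = url_b.path[:-1] if url_b.path.endswith('/') else url_b.path
    return unquote(path_a) == unquote(path_b)


# Scheme, "www." prefix and trailing slash are ignored.
same_page = compare_urls(
    urlparse("https://www.example.com/path/"),
    urlparse("http://example.com/path"),
)

# Percent-encoded and literal path characters compare equal.
same_encoded = compare_urls(
    urlparse("https://example.com/%7Euser"),
    urlparse("https://example.com/~user"),
)

# A differing query string makes the URLs unequal.
different_query = compare_urls(
    urlparse("https://example.com/?q=1"),
    urlparse("https://example.com/?q=2"),
)

# Without a scheme, urlparse puts the host into `path`, so "www." is not stripped.
bare_hosts = compare_urls(urlparse("www.example.com"), urlparse("example.com"))

print(same_page, same_encoded, different_query, bare_hosts)
```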

◆ merge_two_infoboxes()

searx.results.merge_two_infoboxes(infobox1, infobox2)

Definition at line 61 of file results.py.

    def merge_two_infoboxes(infobox1, infobox2):  # pylint: disable=too-many-branches, too-many-statements
        # get engines weights
        if hasattr(engines[infobox1['engine']], 'weight'):
            weight1 = engines[infobox1['engine']].weight
        else:
            weight1 = 1
        if hasattr(engines[infobox2['engine']], 'weight'):
            weight2 = engines[infobox2['engine']].weight
        else:
            weight2 = 1

        if weight2 > weight1:
            infobox1['engine'] = infobox2['engine']

        infobox1['engines'] |= infobox2['engines']

        if 'urls' in infobox2:
            urls1 = infobox1.get('urls', None)
            if urls1 is None:
                urls1 = []

            for url2 in infobox2.get('urls', []):
                unique_url = True
                parsed_url2 = urlparse(url2.get('url', ''))
                entity_url2 = url2.get('entity')
                for url1 in urls1:
                    if (entity_url2 is not None and url1.get('entity') == entity_url2) or compare_urls(
                        urlparse(url1.get('url', '')), parsed_url2
                    ):
                        unique_url = False
                        break
                if unique_url:
                    urls1.append(url2)

            infobox1['urls'] = urls1

        if 'img_src' in infobox2:
            img1 = infobox1.get('img_src', None)
            img2 = infobox2.get('img_src')
            if img1 is None:
                infobox1['img_src'] = img2
            elif weight2 > weight1:
                infobox1['img_src'] = img2

        if 'attributes' in infobox2:
            attributes1 = infobox1.get('attributes')
            if attributes1 is None:
                infobox1['attributes'] = attributes1 = []

            attributeSet = set()
            for attribute in attributes1:
                label = attribute.get('label')
                if label not in attributeSet:
                    attributeSet.add(label)
                entity = attribute.get('entity')
                if entity not in attributeSet:
                    attributeSet.add(entity)

            for attribute in infobox2.get('attributes', []):
                if attribute.get('label') not in attributeSet and attribute.get('entity') not in attributeSet:
                    attributes1.append(attribute)

        if 'content' in infobox2:
            content1 = infobox1.get('content', None)
            content2 = infobox2.get('content', '')
            if content1 is not None:
                if result_content_len(content2) > result_content_len(content1):
                    infobox1['content'] = content2
            else:
                infobox1['content'] = content2

References searx.results.compare_urls() and searx.results.result_content_len().
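
The function carries no docstring. Reading the listing, it merges `infobox2` into `infobox1` in place: the engine with the higher weight wins, and URLs, image, attributes and content are folded in branch by branch. The sketch below isolates only the attributes branch to show its de-duplication rule. `merge_attributes` is a hypothetical helper written for this illustration, not part of `searx.results`, and the sample attribute dictionaries are made up.

```python
def merge_attributes(attributes1, attributes2):
    """Hypothetical helper mirroring the 'attributes' branch of merge_two_infoboxes."""
    # Collect every label and entity already present in the first infobox.
    seen = set()
    for attribute in attributes1:
        seen.add(attribute.get('label'))
        seen.add(attribute.get('entity'))

    # Append an attribute from the second infobox only if neither its label
    # nor its entity has been seen in the first one.
    for attribute in attributes2:
        if attribute.get('label') not in seen and attribute.get('entity') not in seen:
            attributes1.append(attribute)
    return attributes1


first = [{'label': 'Born', 'value': '1912', 'entity': 'P569'}]
second = [
    # Same entity as an existing attribute: skipped.
    {'label': 'Date of birth', 'value': '23 June 1912', 'entity': 'P569'},
    # New label and new entity: appended.
    {'label': 'Died', 'value': '1954', 'entity': 'P570'},
]

merged = merge_attributes(first, second)
merged_labels = [attribute['label'] for attribute in merged]
print(merged_labels)
```

As in the listing, the set of seen labels and entities is built from the first infobox only, so it is not updated while attributes from the second infobox are appended.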

◆ result_content_len()

searx.results.result_content_len(content)

Definition at line 22 of file results.py.

    def result_content_len(content):
        if isinstance(content, str):
            return len(CONTENT_LEN_IGNORED_CHARS_REGEX.sub('', content))
        return 0

Referenced by searx.results.merge_two_infoboxes().
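
A short sketch of what the function counts, with the pattern and body copied from this page so that it runs on its own; the sample string is illustrative. One detail is worth keeping in mind when reading the pattern: inside the character class, `)-_` is parsed by Python's `re` as a range from `)` to `_`, which covers ASCII digits and uppercase letters as well as punctuation, so those characters are dropped from the count too.

```python
import re

# Pattern copied verbatim from the Variables section of this page.
CONTENT_LEN_IGNORED_CHARS_REGEX = re.compile(r'[,;:!?\./\\\\ ()-_]', re.M | re.U)


def result_content_len(content):
    # Copy of the function from the listing.
    if isinstance(content, str):
        return len(CONTENT_LEN_IGNORED_CHARS_REGEX.sub('', content))
    return 0


sample = "Hello World 123"
remaining = CONTENT_LEN_IGNORED_CHARS_REGEX.sub('', sample)

print(repr(remaining))             # the characters that are actually counted
print(result_content_len(sample))  # length after stripping ignored characters
print(result_content_len(None))    # non-string content has length 0
```

With this input only the lowercase letters survive, so the reported length is 8 rather than the 13 that stripping spaces alone would give.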

◆ result_score()

searx.results.result_score(result)

Definition at line 133 of file results.py.

    def result_score(result):
        weight = 1.0

        for result_engine in result['engines']:
            if hasattr(engines[result_engine], 'weight'):
                weight *= float(engines[result_engine].weight)

        occurrences = len(result['positions'])

        return sum((occurrences * weight) / position for position in result['positions'])
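
Read from the listing, the score is the sum over all positions p of (n * w) / p, where n is the number of positions and w is the product of the weights of the engines that returned the result. The worked example below copies the function body and replaces the module-level `engines` registry with a stand-in dictionary. The engine names `alpha` and `beta` and the use of `SimpleNamespace` are assumptions made for illustration only.

```python
from types import SimpleNamespace

# Stand-in for the module-level `engines` registry (assumption: a mapping from
# engine name to an object that may or may not have a `weight` attribute).
engines = {
    'alpha': SimpleNamespace(weight=2),
    'beta': SimpleNamespace(),  # no weight attribute, so it contributes a factor of 1
}


def result_score(result):
    # Copy of the function from the listing.
    weight = 1.0
    for result_engine in result['engines']:
        if hasattr(engines[result_engine], 'weight'):
            weight *= float(engines[result_engine].weight)

    occurrences = len(result['positions'])
    return sum((occurrences * weight) / position for position in result['positions'])


# A result returned by both engines, ranked 1st by one and 2nd by the other.
result = {'engines': {'alpha', 'beta'}, 'positions': [1, 2]}

# weight = 2.0, occurrences = 2, so the score is (2 * 2.0) / 1 + (2 * 2.0) / 2
score = result_score(result)
print(score)  # 6.0
```

Because the number of occurrences multiplies every term, a result reported by several engines gains more than a result that a single engine ranks highly.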

Variable Documentation

◆ CONTENT_LEN_IGNORED_CHARS_REGEX

searx.results.CONTENT_LEN_IGNORED_CHARS_REGEX = re.compile(r'[,;:!?\./\\\\ ()-_]', re.M | re.U)

Definition at line 17 of file results.py.

◆ WHITESPACE_REGEX

searx.results.WHITESPACE_REGEX = re.compile('( |\t|\n)+', re.M | re.U)

Definition at line 18 of file results.py.
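
This page does not show where `WHITESPACE_REGEX` is used. As a sketch of what the pattern matches, and assuming it is meant for collapsing runs of whitespace, the example below substitutes a single space for every run of spaces, tabs and newlines. The sample text is illustrative.

```python
import re

# Pattern copied verbatim from this page; '\t' and '\n' are real tab and
# newline characters here because the string literal is not raw.
WHITESPACE_REGEX = re.compile('( |\t|\n)+', re.M | re.U)

text = "first line\n\tsecond   line"
collapsed = WHITESPACE_REGEX.sub(' ', text)
print(collapsed)  # first line second line
```

The pattern lists only space, tab and newline, so other whitespace such as a carriage return is left untouched.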