Features

Document features

Feature No. Description
1 Stage0 score from TREC input file
2 BM25 Atire for the document
3 BM25 Atire for the document body
4 BM25 Atire for the document title
5 BM25 Atire for the document headings (h1-h4)
6 BM25 Atire for document inlinks
7 BM25 Atire for document a tags
8 BM25 TREC 3 (k = 1.2) for the document
9 BM25 TREC 3 (k = 1.2) for the document body
10 BM25 TREC 3 (k = 1.2) for the document title
11 BM25 TREC 3 (k = 1.2) for the document headings (h1-h4)
12 BM25 TREC 3 (k = 1.2) for the document inlinks
13 BM25 TREC 3 (k = 1.2) for the document a tags
14 BM25 TREC 3 (k = 2.0) for the document
15 BM25 TREC 3 (k = 2.0) for the document body
16 BM25 TREC 3 (k = 2.0) for the document title
17 BM25 TREC 3 (k = 2.0) for the document headings (h1-h4)
18 BM25 TREC 3 (k = 2.0) for the document inlinks
19 BM25 TREC 3 (k = 2.0) for the document a tags
20 LMDS (mu = 2500) for the document
21 LMDS (mu = 2500) for the document body
22 LMDS (mu = 2500) for the document title
23 LMDS (mu = 2500) for the document headings (h1-h4)
24 LMDS (mu = 2500) for the document inlinks
25 LMDS (mu = 2500) for the document a tags
26 LMDS (mu = 1500) for the document
27 LMDS (mu = 1500) for the document body
28 LMDS (mu = 1500) for the document title
29 LMDS (mu = 1500) for the document headings (h1-h4)
30 LMDS (mu = 1500) for the document inlinks
31 LMDS (mu = 1500) for the document a tags
32 LMDS (mu = 1000) for the document
33 LMDS (mu = 1000) for the document body
34 LMDS (mu = 1000) for the document title
35 LMDS (mu = 1000) for the document headings (h1-h4)
36 LMDS (mu = 1000) for the document inlinks
37 LMDS (mu = 1000) for the document a tags
38 TF.IDF for the document
39 TF.IDF for the document body
40 TF.IDF for the document title
41 TF.IDF for the document headings (h1-h4)
42 TF.IDF for the document inlinks
43 TF.IDF for the document a tags
44 Probability score for the document
45 Probability score for the document body
46 Probability score for the document title
47 Probability score for the document headings (h1-h4)
48 Probability score for the document inlinks
49 Probability score for the document a tags
50 Bose-Einstein score for the document
51 Bose-Einstein score for the document body
52 Bose-Einstein score for the document title
53 Bose-Einstein score for the document headings (h1-h4)
54 Bose-Einstein score for the document inlinks
55 Bose-Einstein score for the document a tags
56 DPH score for the document
57 DPH score for the document body
58 DPH score for the document title
59 DPH score for the document headings (h1-h4)
60 DPH score for the document inlinks
61 DPH score for the document a tags
62 DFR score for the document
63 DFR score for the document body
64 DFR score for the document title
65 DFR score for the document headings (h1-h4)
66 DFR score for the document inlinks
67 DFR score for the document a tags
68 Raw stream length of document
69 Raw stream length of document body
70 Raw stream length of document title
71 Raw stream length of document headings (h1-h4)
72 Raw stream length of document inlinks
73 Raw stream length of document a tags
74 Term frequency normalized sum of stream length of document
75 Term frequency normalized sum of stream length of document body
76 Term frequency normalized sum of stream length of document title
77 Term frequency normalized sum of stream length of document headings (h1-h4)
78 Term frequency normalized sum of stream length of document inlinks
79 Term frequency normalized sum of stream length of document a tags
80 Term frequency normalized min of stream length of document
81 Term frequency normalized min of stream length of document body
82 Term frequency normalized min of stream length of document title
83 Term frequency normalized min of stream length of document headings (h1-h4)
84 Term frequency normalized min of stream length of document inlinks
85 Term frequency normalized min of stream length of document a tags
86 Term frequency normalized max of stream length of document
87 Term frequency normalized max of stream length of document body
88 Term frequency normalized max of stream length of document title
89 Term frequency normalized max of stream length of document headings (h1-h4)
90 Term frequency normalized max of stream length of document inlinks
91 Term frequency normalized max of stream length of document a tags
92 Term frequency normalized mean of stream length of document
93 Term frequency normalized mean of stream length of document body
94 Term frequency normalized mean of stream length of document title
95 Term frequency normalized mean of stream length of document headings (h1-h4)
96 Term frequency normalized mean of stream length of document inlinks
97 Term frequency normalized mean of stream length of document a tags
98 Term frequency normalized variance of stream length of document
99 Term frequency normalized variance of stream length of document body
100 Term frequency normalized variance of stream length of document title
101 Term frequency normalized variance of stream length of document headings (h1-h4)
102 Term frequency normalized variance of stream length of document inlinks
103 Term frequency normalized variance of stream length of document a tags
104 TP-Score of document
105 Sum of BM25 query bigrams within an unordered window of 8 of the document
106 BM25 bigram interval score within a window of 100 of the document
107 SDM of document using default parameters
108 Frequency of query terms within the document title
109 Frequency of query terms within the document headings (h1-h4)
110 Frequency of query terms within the document
111 Frequency of query terms within the document inlinks
112 Document tag count for title
113 Document tag count for headings (h1-h4)
114 Document tag count for applet
115 Document tag count for object
116 Document tag count for embed
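
As an illustration of how the `k` and `mu` parameters in features 8 to 37 affect a score, here is a minimal sketch of the textbook BM25 and Dirichlet-smoothed language model (LMDS) term weights. It assumes that `k` in the table corresponds to the usual BM25 `k1` parameter, that `b = 0.75`, and that the Robertson/Sparck Jones IDF is used. None of these are confirmed by this file, and the Atire and TREC 3 variants may use different IDF or normalization forms. The function and argument names are hypothetical.

```python
import math


def bm25_term(tf, df, doc_len, avg_doc_len, num_docs, k1=1.2, b=0.75):
    """Textbook BM25 weight of one query term in one document (or stream).

    Assumption: Robertson/Sparck Jones IDF; the variants listed in the
    table may differ in their IDF and length normalization.
    """
    idf = math.log((num_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


def lmds_term(tf, cf, doc_len, collection_len, mu=2500.0):
    """Log-likelihood of one query term under Dirichlet smoothing.

    cf is the term's collection frequency; mu controls how strongly the
    document model is pulled towards the collection model.
    """
    p_collection = cf / collection_len
    return math.log((tf + mu * p_collection) / (doc_len + mu))


# A document score is the sum of the per-term weights over the query terms.
# A per-stream feature (body, title, headings, inlinks, a tags) would use
# the same function with that stream's tf, length and collection statistics.
print(bm25_term(tf=3, df=10, doc_len=100, avg_doc_len=100, num_docs=1000, k1=1.2))
print(bm25_term(tf=3, df=10, doc_len=100, avg_doc_len=100, num_docs=1000, k1=2.0))
print(lmds_term(tf=3, cf=100, doc_len=100, collection_len=1_000_000, mu=1000.0))
```

A larger `k1` lets repeated occurrences of a term keep adding to the score for longer before saturating, while a larger `mu` gives more weight to the collection model, which matters most for short streams such as titles.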

Static document features

Feature No. Description
117 Document length
118 Title length
119 Visible term length
120 URL length
121 URL depth
122 Average term length
123 Entropy
124 Stop cover
125 Stop ratio
126 Fraction of anchor text
127 Fraction of visible text
128 Fraction of table text
129 Fraction of td text
130 Is wikipedia.org URL
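
Several of the static features above can be computed from the URL and the token stream alone. The sketch below shows one plausible reading of URL depth, the wikipedia.org check, entropy, stop cover and stop ratio. The exact definitions are assumptions (for example, depth as the number of non-empty path segments, and stop ratio as stop words over non-stop words), and all function names are hypothetical.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def url_depth(url):
    """Number of non-empty path segments (assumed definition of URL depth)."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def is_wikipedia(url):
    """True if the host is wikipedia.org or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")


def term_entropy(terms):
    """Shannon entropy (in bits) of the document's term distribution."""
    total = len(terms)
    counts = Counter(terms)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def stop_cover(terms, stopwords):
    """Fraction of the stop word list that occurs in the document."""
    return len(set(terms) & stopwords) / len(stopwords)


def stop_ratio(terms, stopwords):
    """Ratio of stop word tokens to non-stop word tokens."""
    stops = sum(1 for t in terms if t in stopwords)
    non_stops = len(terms) - stops
    return stops / non_stops if non_stops else 0.0


url = "https://en.wikipedia.org/wiki/Information_retrieval"
terms = ["the", "cat", "sat", "on", "the", "mat"]
stopwords = {"the", "on", "a", "of"}
print(len(url), url_depth(url), is_wikipedia(url))
print(term_entropy(terms), stop_cover(terms, stopwords), stop_ratio(terms, stopwords))
```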

Unigram pre-retrieval features

Feature No. Description
131 Mean collection document frequency of query terms
132 Mean geometric mean of query terms
133 Min collection document frequency (query term)
134 Max collection document frequency (query term)
135 Min geometric mean (query term)
136 Max geometric mean (query term)
137 BM25 mean impact
138 BM25 mean mean
139 BM25 mean harmonic mean
140 BM25 mean median
141 BM25 mean interquartile range
142 BM25 mean variance
143 BM25 min
144 BM25 max
145 BM25 min mean
146 BM25 max mean
147 BM25 min median
148 BM25 max median
149 BM25 min harmonic mean
150 BM25 max harmonic mean
151 BM25 min variance
152 BM25 max variance
153 BM25 min first quartile
154 BM25 max first quartile
155 BM25 min third quartile
156 BM25 max third quartile
157 TF.IDF mean impact
158 TF.IDF mean mean
159 TF.IDF mean harmonic mean
160 TF.IDF mean median
161 TF.IDF mean interquartile range
162 TF.IDF mean variance
163 TF.IDF min
164 TF.IDF max
165 TF.IDF min mean
166 TF.IDF max mean
167 TF.IDF min median
168 TF.IDF max median
169 TF.IDF min harmonic mean
170 TF.IDF max harmonic mean
171 TF.IDF min variance
172 TF.IDF max variance
173 TF.IDF min first quartile
174 TF.IDF max first quartile
175 TF.IDF min third quartile
176 TF.IDF max third quartile
177 LM mean impact
178 LM mean mean
179 LM mean harmonic mean
180 LM mean median
181 LM mean interquartile range
182 LM mean variance
183 LM min
184 LM max
185 LM min mean
186 LM max mean
187 LM min median
188 LM max median
189 LM min harmonic mean
190 LM max harmonic mean
191 LM min variance
192 LM max variance
193 LM min first quartile
194 LM max first quartile
195 LM min third quartile
196 LM max third quartile
197 PR mean impact
198 PR mean mean
199 PR mean harmonic mean
200 PR mean median
201 PR mean interquartile range
202 PR mean variance
203 PR min
204 PR max
205 PR min mean
206 PR max mean
207 PR min median
208 PR max median
209 PR min harmonic mean
210 PR max harmonic mean
211 PR min variance
212 PR max variance
213 PR min first quartile
214 PR max first quartile
215 PR min third quartile
216 PR max third quartile
217 BE mean impact
218 BE mean mean
219 BE mean harmonic mean
220 BE mean median
221 BE mean interquartile range
222 BE mean variance
223 BE min
224 BE max
225 BE min mean
226 BE max mean
227 BE min median
228 BE max median
229 BE min harmonic mean
230 BE max harmonic mean
231 BE min variance
232 BE max variance
233 BE min first quartile
234 BE max first quartile
235 BE min third quartile
236 BE max third quartile
237 DPH mean impact
238 DPH mean mean
239 DPH mean harmonic mean
240 DPH mean median
241 DPH mean interquartile range
242 DPH mean variance
243 DPH min
244 DPH max
245 DPH min mean
246 DPH max mean
247 DPH min median
248 DPH max median
249 DPH min harmonic mean
250 DPH max harmonic mean
251 DPH min variance
252 DPH max variance
253 DPH min first quartile
254 DPH max first quartile
255 DPH min third quartile
256 DPH max third quartile
257 DFR mean impact
258 DFR mean mean
259 DFR mean harmonic mean
260 DFR mean median
261 DFR mean interquartile range
262 DFR mean variance
263 DFR min
264 DFR max
265 DFR min mean
266 DFR max mean
267 DFR min median
268 DFR max median
269 DFR min harmonic mean
270 DFR max harmonic mean
271 DFR min variance
272 DFR max variance
273 DFR min first quartile
274 DFR max first quartile
275 DFR min third quartile
276 DFR max third quartile
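
The names in this table combine two levels of aggregation. One plausible reading, which this file does not confirm, is that a statistic (mean, median, variance, and so on) is first computed over the scores in each query term's postings list, and the per-term values are then reduced with min, max or mean across the query. Under that assumption, "BM25 min median" would be the smallest per-term median BM25 score. The sketch below uses a hypothetical `term_scores` mapping from query term to its list of scores.

```python
import statistics


def aggregate(term_scores, per_term_stat, across_terms):
    """Reduce per-term score lists to one query-level feature value.

    term_scores:   dict mapping each query term to the list of scores in
                   its postings list (hypothetical input layout).
    per_term_stat: statistic computed within each term, e.g. statistics.median.
    across_terms:  reduction across the query terms, e.g. min, max, fmean.
    """
    per_term = [per_term_stat(scores) for scores in term_scores.values()]
    return across_terms(per_term)


term_scores = {
    "information": [1.0, 2.0, 3.0],
    "retrieval": [2.0, 4.0, 6.0, 8.0],
}

min_median = aggregate(term_scores, statistics.median, min)
max_mean = aggregate(term_scores, statistics.fmean, max)
mean_variance = aggregate(term_scores, statistics.pvariance, statistics.fmean)
print(min_median, max_mean, mean_variance)
```

The same helper would cover the TF.IDF, LM, PR, BE, DPH and DFR rows by swapping in the score lists produced by the corresponding scorer. Harmonic means and quartiles are left out here because `statistics.harmonic_mean` rejects negative values and `statistics.quantiles` needs at least two data points on older Python versions, so both need extra handling for real postings lists.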

Query features

Feature No. Description
277 Query length
278 Query length (non-stop words only)
279 Simplified clarity score
280 Query scope score
281 TF.IDF average
282 TF.IDF variance
283 TF.IDF standard deviation
284 TF.IDF confidence
285 Gamma2: IDF max / IDF min
286 Average IDF over the full query
287 Average IDF over non-stop words
288 Average ICTF over the full query
289 Average ICTF over non-stop words
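
To make the IDF-based and ICTF-based query features concrete, here is a minimal sketch that computes query length, Gamma2 and the averaged IDF and ICTF values from collection statistics. It assumes `idf = log(N / df)` and `ictf = log(collection_len / cf)`; the actual implementation may use a different log base or smoothing. The function name, argument names and dictionary keys are hypothetical.

```python
import math


def mean(values):
    """Arithmetic mean, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def query_features(terms, df, cf, num_docs, collection_len, stopwords):
    """Compute a few pre-retrieval query features for a non-empty query.

    df / cf map each term to its document / collection frequency.
    Assumption: unsmoothed natural-log IDF and ICTF.
    """
    content = [t for t in terms if t not in stopwords]
    idf = {t: math.log(num_docs / df[t]) for t in terms}
    ictf = {t: math.log(collection_len / cf[t]) for t in terms}
    idf_min, idf_max = min(idf.values()), max(idf.values())
    return {
        "query_length": len(terms),
        "query_length_non_stop": len(content),
        # Gamma2 is undefined when the rarest-IDF term occurs in every document.
        "gamma2": idf_max / idf_min if idf_min > 0 else float("inf"),
        "avg_idf": mean([idf[t] for t in terms]),
        "avg_idf_non_stop": mean([idf[t] for t in content]),
        "avg_ictf": mean([ictf[t] for t in terms]),
        "avg_ictf_non_stop": mean([ictf[t] for t in content]),
    }


features = query_features(
    terms=["the", "retrieval"],
    df={"the": 500, "retrieval": 10},
    cf={"the": 5000, "retrieval": 20},
    num_docs=1000,
    collection_len=100_000,
    stopwords={"the"},
)
print(features)
```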