Prune test data to reduce inference time #14

Open
zeynepakkalyoncu opened this issue Jun 10, 2019 · 3 comments

@zeynepakkalyoncu

Throw away all sentences that don't contain at least one term that matches the query? Other pruning scenarios?

@zeynepakkalyoncu self-assigned this on Jun 10, 2019
@zeynepakkalyoncu

Removing sentences that do not contain any of the terms in the query reduces the test set from 10.6M to 1.4M sentences. However, the scores also take a significant hit:

Experiment: mb_5cv_pruned
1S:
map                   	all	0.3029
P_20                  	all	0.4157
2S:
map                   	all	0.3045
P_20                  	all	0.4163
3S:
map                   	all	0.3034
P_20                  	all	0.4175

I will explore other pruning methods, but this approach doesn't look too promising.
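
For reference, a minimal sketch of the query-term overlap pruning described above (hypothetical function and data layout, not the actual repo code):

```python
def prune_candidates(candidates):
    """Keep only (query, sentence) pairs where the sentence shares at
    least one term with the query; everything else is thrown away.
    Tokenization here is naive: lowercased whitespace splitting."""
    kept = []
    for query, sentence in candidates:
        query_terms = set(query.lower().split())
        sentence_terms = set(sentence.lower().split())
        if query_terms & sentence_terms:  # non-empty term overlap
            kept.append((query, sentence))
    return kept
```

In practice the query would presumably be stopped/stemmed the same way as the index, which affects how aggressive this pruning ends up being.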

@lintool commented Nov 22, 2019

Let's try reranking only the top 100 docs instead of the top 1000.
NDCG is the right metric here; AP will likely be bad.
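
A minimal sketch of what that truncation might look like, assuming a standard six-column TREC run file sorted by rank within each query (hypothetical helper, not the actual repo code):

```python
from collections import defaultdict

def truncate_run(run_path, out_path, k=100):
    """Write only the top-k entries per query from a TREC run file
    (qid Q0 docid rank score tag), assuming rank-sorted input."""
    seen = defaultdict(int)
    with open(run_path) as fin, open(out_path, 'w') as fout:
        for line in fin:
            qid = line.split()[0]
            seen[qid] += 1
            if seen[qid] <= k:
                fout.write(line)
```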

@zeynepakkalyoncu

NDCG@20 for BERT(MSMARCO, MB) on sentences of top 1000/100 Robust04 docs:

|    | Top 1000 | Top 100 (optimized wrt NDCG) | Top 100 (optimized wrt MAP) |
|----|----------|------------------------------|-----------------------------|
| 1S | 0.5239   | 0.5131                       | 0.5117                      |
| 2S | 0.5324   | 0.5206                       | 0.5200                      |
| 3S | 0.5325   | 0.5228                       | 0.5196                      |

Note that the hyperparameters for the first Top 100 column are tuned to maximize NDCG@20 rather than MAP, which is the default and is what the last column is tuned on; I wanted to see the difference. AP is pretty bad for top 100, as expected, but NDCG@20 holds up reasonably well considering it gives us a ~10x speedup.
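
For anyone reproducing these numbers, NDCG@20 can be computed from qrels and a run with pytrec_eval; a sketch assuming dict-based inputs, not the exact evaluation script used here:

```python
import pytrec_eval

def mean_ndcg_at_20(qrels, run):
    """qrels: {qid: {docid: relevance}}, run: {qid: {docid: score}}.
    Returns NDCG@20 averaged over all queries in the run."""
    evaluator = pytrec_eval.RelevanceEvaluator(qrels, {'ndcg_cut'})
    per_query = evaluator.evaluate(run)
    scores = [metrics['ndcg_cut_20'] for metrics in per_query.values()]
    return sum(scores) / len(scores)
```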
