ANLI data distribution makes it hard to create internal dev - so it's temporarily ignored #4

denizbeser · 2020-06-09T19:18:41Z

If you look at the original dev data, you will see that every datapoint is distinct. The training set, however, contains a lot of repetitions. This makes it infeasible to do a 90-10-10 split.

Potential solutions:

- One reasonable thing to do would be to separate out a non-overlapping dev set and set up a cross-validation experiment.
- Use a fraction of the original dev set as the internal dev set for ANLI.
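
A minimal sketch of the first option, assuming the repetitions are training examples that share the same premise text. The field name `context`, the helper `group_split`, and the 10% dev fraction are illustrative assumptions, not taken from the repository. The idea is to split by group rather than by row, so that a repeated item never ends up in both train and internal dev:

```python
import random
from collections import defaultdict


def group_split(examples, key="context", dev_fraction=0.1, seed=0):
    """Split examples so that all items sharing `key` land on the same side.

    `key` is a hypothetical field name; use whichever field is repeated.
    """
    # Bucket examples by the repeated field.
    groups = defaultdict(list)
    for ex in examples:
        groups[ex[key]].append(ex)

    # Shuffle the group keys deterministically.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    # Move whole groups into dev until it reaches the target size.
    target = dev_fraction * len(examples)
    train, dev = [], []
    for k in keys:
        if len(dev) < target:
            dev.extend(groups[k])
        else:
            train.extend(groups[k])
    return train, dev
```

Because whole groups are moved at once, the dev set can overshoot the requested fraction when some groups are large. Since the dev datapoints are all distinct, the second option needs no grouping: a plain random split of the original dev data would do.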

denizbeser changed the title ~~ANLI train distribution makes it hard to create internal dev - so it's temporarily ignored~~ ANLI data distribution makes it hard to create internal dev - so it's temporarily ignored Jun 9, 2020
