Index datastructure map of docid -> docno #25

Open
lgrz opened this issue Jul 8, 2019 · 0 comments
lgrz commented Jul 8, 2019

The feature extraction program extract_features is currently tied to Indri's metadata lookup for resolving document identifiers. This could instead be handled by a (docid, docno) map file that Tesserae creates at index time.

The following could be replaced with a map lookup:

std::vector<docid_t> docids = qry_env.document_ids_from_metadata("docno", docnos);

This also has the benefit of limiting the dependency on Indri to the programs that absolutely require it (i.e., the indexing programs).
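
A minimal sketch of what such a map lookup might look like. Everything here is an assumption made for illustration: the map file format (one whitespace-separated "docid docno" pair per line), the helper names load_docno_map and document_ids_from_map, and the use of a 64-bit integer for docid_t (the project's real typedef may differ). Throwing on an unknown docno is also a choice made for this sketch, not necessarily what the Indri call does.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Assumption: docid_t is an integer type; the project's real typedef may differ.
using docid_t = std::int64_t;

using docno_map_t = std::unordered_map<std::string, docid_t>;

// Hypothetical loader: reads one "docid docno" pair per line from a map
// file written at index time, keyed by docno for fast lookup.
docno_map_t load_docno_map(std::istream &in) {
    docno_map_t docno_to_docid;
    docid_t docid = 0;
    std::string docno;
    while (in >> docid >> docno) {
        docno_to_docid.emplace(docno, docid);
    }
    return docno_to_docid;
}

// Hypothetical stand-in for the metadata query: resolves each docno to its
// docid, preserving input order. Unknown docnos raise std::out_of_range.
std::vector<docid_t> document_ids_from_map(
    const docno_map_t &docno_to_docid,
    const std::vector<std::string> &docnos) {
    std::vector<docid_t> docids;
    docids.reserve(docnos.size());
    for (const auto &docno : docnos) {
        const auto it = docno_to_docid.find(docno);
        if (it == docno_to_docid.end()) {
            throw std::out_of_range("unknown docno: " + docno);
        }
        docids.push_back(it->second);
    }
    return docids;
}
```

With something like this in place, extract_features would load the map once (for example from a std::ifstream opened on the map file) and call document_ids_from_map(docno_map, docnos) where it currently queries the Indri environment.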

Related to #13 (improving the index component).

lgrz added the index label Jul 8, 2019