Manual: Query syntax is documented on closed corpus #1728

Open
BeritJanssen opened this issue Dec 19, 2024 · 2 comments

Comments

@BeritJanssen
Contributor

BeritJanssen commented Dec 19, 2024

Right now, all the examples in the query.md section of the manual are taken from the "Dutch Annual Reports" corpus. This documentation was originally written by José for the researchers of that project, but the section of examples with queries and hits in particular isn't very useful to anyone without access to that closed corpus.

Alternatives:

  • use an open (and English-language?) corpus instead, e.g. Goodreads, or Parliament-Netherlands if we're fine with Dutch examples
  • ask whether the "Dutch Annual Reports" corpus (which, despite its Dutch origin, is in English) may be opened to everyone.

I suspect that option 2 wouldn't be met with much resistance, and would be less work.

@lukavdplas
Contributor

Perhaps we should keep an eye out for a small public-domain English dataset that we can use for examples / demos.

@BeritJanssen
Contributor Author

I also noticed that the numbers reported in the manual don't actually match the current corpus. The manual was probably written for an older (smaller) version of the corpus ("Dutch Banking"), which only included financial reports of banks.
