Allow for get (g) CLI functionality to be ran using a Wikidata lexemes dump #521

Closed
2 tasks done
andrewtavis opened this issue Dec 8, 2024 · 4 comments
Labels
feature New feature or request help wanted Extra attention is needed Outreachy Available for Outreachy participants

Comments

@andrewtavis
Member

Terms

Description

This is the second issue for adding dump processing functionality to the Scribe-Data CLI. In it, we'll do the following:

  • Ensure that the --wikidata-dump (-wd) argument has been added to the get (g) command in "Make a Wikidata lexemes dump required for scribe-data g -a" (#519)
  • If the user passes this argument, the functionality from "Add check_lexeme_dump_prompt_download function to cli/utils.py" (#518) will be used to make sure that a dump is available, or to download one
  • From there, the functionality of the get command will be run over the dump rather than via the Wikidata query service
  • Example functionality includes:
    • scribe-data get -lang English -dt nouns -wd PATH_TO_DUMP
      • Check to see if there is a local dump to run against and use that if the user indicates that it should be
      • This would not get the most up-to-date data, but users who want to run against a dump can do so
      • Allows the user to parse locally without an internet connection
    • scribe-data get -lang English -a -wd PATH_TO_DUMP
      • Check to see if there is a local dump to run against and use that if the user indicates that it should be
      • Else run all English queries
      • Indicates to the user that using dumps would be more responsible and suggests -wd argument to them
    • scribe-data get -lang English French Spanish -a
      • Check to see if there is a local dump to run against and use that if the user indicates that it should be
      • Runs the queries if the user doesn't pass a dump and doesn't want to download one
      • Indicates to the user that using dumps would be more responsible and suggests -wd argument to them
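
The three examples above boil down to one decision: use the dump, ask about downloading one, or fall back to the query service. Below is a minimal sketch of that decision, not the actual Scribe-Data implementation. The function name `choose_data_source`, its parameters, and its return values are all hypothetical. The real download prompt is the one from #518 and is only signalled here, not called.

```python
from pathlib import Path
from typing import Optional


def choose_data_source(all_data: bool, dump_path: Optional[str]) -> str:
    """Decide how a hypothetical `get` call should fetch its data.

    Returns one of:
      "dump"            - a local dump exists at dump_path, parse it
      "prompt_download" - -wd was passed but no dump is there; ask the user
      "query_service"   - no dump requested, use the Wikidata query service
    """
    if dump_path is not None:
        # The user asked for dump processing via -wd PATH_TO_DUMP.
        if Path(dump_path).is_file():
            return "dump"
        # Assumption: this is where the prompt from #518 would take over.
        return "prompt_download"

    if all_data:
        # -a without -wd: still run the queries, but suggest the dump.
        print(
            "Hint: using a Wikidata lexemes dump is more responsible "
            "for large requests. Consider passing -wd PATH_TO_DUMP."
        )
    return "query_service"
```

For example, `choose_data_source(all_data=True, dump_path=None)` prints the hint and returns `"query_service"`, which matches the third example above.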

We'll need to discuss this a bit before work is started to make sure that nothing has changed above while prior issues are being worked on 😊

Contribution

@axif0 will be working on this as a part of Outreachy! 📶🚈

@andrewtavis andrewtavis added feature New feature or request help wanted Extra attention is needed Outreachy Available for Outreachy participants labels Dec 8, 2024
@andrewtavis andrewtavis changed the title Allow for get (t) CLI functionality to be ran using a Wikidata lexemes dump Allow for get (g) CLI functionality to be ran using a Wikidata lexemes dump Dec 16, 2024
@axif0
Collaborator

axif0 commented Jan 5, 2025

We can consider the issue resolved, as all tasks have been completed. Happy to work and looking forward to collaborating on more tasks. 😊

@andrewtavis @wkyoshida

@andrewtavis
Member Author

Ideally this would work to check a default Wikidata dump via a -wdp argument, so like:

scribe-data g -lang German -dt nouns -wdp

This would then check for the default dump and run against it, or ask the user whether to download a default dump if one doesn't exist.
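
A rough sketch of how a valueless `-wdp` could be parsed and resolved, assuming `argparse` is used. The default directory `DEFAULT_DUMP_DIR`, the `USE_DEFAULT` sentinel, the `dest` names, and the `find_dump` helper are all hypothetical and not taken from the Scribe-Data codebase. The `*.json.bz2` pattern assumes compressed JSON dumps.

```python
import argparse
from pathlib import Path

# Hypothetical default location; the comment above does not name one.
DEFAULT_DUMP_DIR = Path("dumps")

# Sentinel stored when -wdp is passed without a path.
USE_DEFAULT = "<default>"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="scribe-data-sketch")
    parser.add_argument("-lang", nargs="+", dest="language")
    parser.add_argument("-dt", dest="data_type")
    # nargs="?" lets -wdp appear alone (value becomes `const`)
    # or with an explicit path; it stays None when omitted.
    parser.add_argument(
        "-wdp", nargs="?", const=USE_DEFAULT, default=None, dest="dump_path"
    )
    return parser


def find_dump(dump_path, default_dir=DEFAULT_DUMP_DIR):
    """Return the dump to use, or None if the user should be asked to download.

    Only call this when -wdp was passed (dump_path is not None).
    """
    if dump_path == USE_DEFAULT:
        directory = Path(default_dir)
        if not directory.is_dir():
            return None
        # Take the last dump by file name if several exist.
        candidates = sorted(directory.glob("*.json.bz2"))
        return candidates[-1] if candidates else None

    path = Path(dump_path)
    return path if path.is_file() else None
```

With this parser, `-lang German -dt nouns -wdp` sets `dump_path` to the sentinel, so `find_dump` looks in the default directory and returns `None` when nothing is there. That `None` is the point at which the download prompt would be shown.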

@andrewtavis
Member Author

We also have the problem that the user is asked whether to overwrite even when there is no data to overwrite :)
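
One possible guard, sketched under the assumption that the export target is a file or directory on disk. The helper name `needs_overwrite_prompt` is hypothetical:

```python
from pathlib import Path


def needs_overwrite_prompt(output_path) -> bool:
    """Return True only if there is existing data that could be overwritten."""
    path = Path(output_path)
    if path.is_file():
        # An empty file holds no data worth asking about.
        return path.stat().st_size > 0
    if path.is_dir():
        # A directory counts only if it contains at least one entry.
        return any(path.iterdir())
    # Nothing exists at this location yet.
    return False
```

The overwrite question would then only be asked when this returns `True`.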

@andrewtavis
Member Author

Closed by #551 🚀🚀🚀🚀🚀 Amazing work, @axif0!

@github-project-automation github-project-automation bot moved this from Todo to Done in Scribe Board Jan 25, 2025