Add Feature to Extract and Verify All Grammatical Features for a Data Type in a Given Language #513
Comments
Thanks for making the issue, @OmarAI2003! I'll have more information on this in the coming weeks :)
@axif0, now that we have the all forms functionality for Wikidata lexeme dumps, do you want to start working on the check for this? Basically we'd want a check that gets all the forms for all languages and then compares them against what we have in the queries. If the queries are missing forms, then we'd throw an error 😊 Ideally this could also be triggered manually.
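A minimal sketch of the comparison step described above, assuming both the lexeme dump and the existing queries have already been reduced to the same hypothetical shape: a dict mapping (language, data_type) to a set of forms, where each form is a frozenset of grammatical-feature QIDs. The names `find_missing_forms` and `check_forms` are illustrative, not part of the project.

```python
# Hypothetical index shape: (language, data_type) -> set of forms,
# where each form is a frozenset of grammatical-feature QIDs.
FormIndex = dict[tuple[str, str], set[frozenset[str]]]


def find_missing_forms(dump_forms: FormIndex, query_forms: FormIndex) -> FormIndex:
    """Return the forms present in the lexeme dump but absent from the queries."""
    missing: FormIndex = {}
    for key, expected in dump_forms.items():
        # A data type with no query at all is treated as missing every form.
        absent = expected - query_forms.get(key, set())
        if absent:
            missing[key] = absent
    return missing


def check_forms(dump_forms: FormIndex, query_forms: FormIndex) -> int:
    """Print each missing form and return 1 if any are missing, else 0."""
    missing = find_missing_forms(dump_forms, query_forms)
    for (language, data_type), forms in sorted(missing.items()):
        for form in sorted(forms, key=sorted):
            print(f"{language}/{data_type}: missing form {sorted(form)}")
    return 1 if missing else 0
```

A workflow step could pass the return value to `sys.exit(...)` so the job fails whenever forms are missing; adding GitHub Actions' `workflow_dispatch` trigger to that workflow is one way to also allow running it manually.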
Thank you for bringing this up! We can start working on the check for this functionality. To clarify, are you suggesting that if any forms are missing from the queries, we should throw an error rather than just issuing a warning?
I would say that ideally what would come from this is a GitHub workflow that would actually error and, on error, open a PR with the corrected query containing the missing forms. That way the work of actually writing the queries is taken care of for us and we can just review when the queries are written 😊
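For the PR-opening part, each missing form would need to be rendered back into SPARQL. The sketch below assumes queries declare every form with the ontolex `lexicalForm` / `representation` / `wikibase:grammaticalFeature` pattern; `FEATURE_LABELS` and `optional_block` are hypothetical helpers, and the label mapping is only illustrative.

```python
# Hypothetical QID -> label lookup used to build SPARQL variable names.
FEATURE_LABELS = {"Q110786": "singular", "Q146786": "plural", "Q131105": "nominative"}


def optional_block(form: frozenset[str], labels: dict[str, str]) -> str:
    """Render one missing form as a SPARQL OPTIONAL block (assumed query style)."""
    # Sort by label so the same form always yields the same variable name.
    qids = sorted(form, key=lambda q: labels.get(q, q))
    parts = [labels.get(q, q) for q in qids]
    # camelCase the labels, e.g. ["nominative", "plural"] -> "nominativePlural".
    name = parts[0] + "".join(p[:1].upper() + p[1:] for p in parts[1:])
    features = ", ".join(f"wd:{q}" for q in qids)
    return (
        "  OPTIONAL {\n"
        f"    ?lexeme ontolex:lexicalForm ?{name}Form .\n"
        f"    ?{name}Form ontolex:representation ?{name} ;\n"
        f"      wikibase:grammaticalFeature {features} .\n"
        "  }"
    )
```

A complete fix would also add the new variable to the query's SELECT clause before the workflow commits the file and opens the PR, and labels containing spaces would need further normalization to form valid SPARQL variable names.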
Automating the process with a GitHub workflow that not only identifies the missing forms but also opens a PR with the corrected queries would indeed save a lot of time and effort. Working on it!
Description
The language_data_extraction directory organizes supported languages into folders, with each language folder containing subfolders for supported data types (e.g., nouns, verbs, adverbs). Within these subfolders, SPARQL files are used to fetch lexical data for grammatical features. One way to enhance the data extraction process is to implement a mechanism that tracks the forms for each data type directly from Wikidata.
Problem Statement
Currently, we face two key challenges:
Addressing these challenges is essential for accurately capturing all forms of a data type across languages, ultimately improving data quality and consistency.
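To compare against Wikidata, the forms already covered by the queries first have to be collected. A minimal sketch, assuming the language_data_extraction/<language>/<data_type>/*.sparql layout described above and that each form is declared with `wikibase:grammaticalFeature wd:Q…` (nested sub-language folders would need a different glob). `forms_in_query` and `collect_query_forms` are hypothetical names.

```python
import re
from pathlib import Path

# Assumed declaration style: "wikibase:grammaticalFeature wd:Q1, wd:Q2 ."
FEATURE_PATTERN = re.compile(r"wikibase:grammaticalFeature\s+((?:wd:Q\d+\s*,?\s*)+)")
QID_PATTERN = re.compile(r"Q\d+")


def forms_in_query(sparql_text: str) -> set[frozenset[str]]:
    """Return each form declared in a query as a frozenset of feature QIDs."""
    return {
        frozenset(QID_PATTERN.findall(match.group(1)))
        for match in FEATURE_PATTERN.finditer(sparql_text)
    }


def collect_query_forms(root: Path) -> dict[tuple[str, str], set[frozenset[str]]]:
    """Map (language, data_type) to every form found in its .sparql files."""
    forms: dict[tuple[str, str], set[frozenset[str]]] = {}
    # Only matches exactly two folder levels: <language>/<data_type>/<file>.sparql
    for query_file in root.glob("*/*/*.sparql"):
        data_type = query_file.parent.name
        language = query_file.parent.parent.name
        text = query_file.read_text(encoding="utf-8")
        forms.setdefault((language, data_type), set()).update(forms_in_query(text))
    return forms
```

Representing a form as a frozenset of QIDs makes the comparison order-independent, so forms extracted from the lexeme dump can be checked against these sets with plain set difference.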
Contribution
No response