-
That is an excellent proposal. I also think that we might be quicker in assessing the relevance of properties this way. Could you check what a query would look like that finds out how often the Event and Event Series relevant properties have been used in OPENRESEARCH? Applying that ask query and then using wikiquery might give us a speed-up in progress. It would also be a "surrogate" for our lambda approach: we could simply apply a unix pipeline of commands, and this would not even be limited to python ...
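A rough sketch of what such a counting ask query could look like in Semantic MediaWiki syntax (the property name Has ordinal is only an illustration; the actual Event and Event Series properties would be substituted):

```
{{#ask: [[Has ordinal::+]]
 |format=count
}}
```

Running one such count query per property would show how often each property is actually used.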
-
Which formats should be implemented first?
-
csv and json IMHO would best fit with unix tool pipelines.
A tool like cut will easily give you a field of a csv, or you could use awk.
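A minimal sketch of that, assuming a comma-separated export file called events.csv (the file name and column position are made up for illustration):

```bash
# extract the second column of a comma-separated file
cut -d',' -f2 events.csv

# the same with awk, which is handier once you need conditions or reformatting
awk -F',' '{ print $2 }' events.csv
```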
-
I propose you fix the ticket and then I'll show you examples of all three commands. You'll also find some usage of these unix tools in the script that gathers samples for the proceedings title parser; see https://github.com/WolfgangFahl/ProceedingsTitleParser/blob/master/scripts/getsamples
-
General question: Can invalid property values be queried?
-
Please try this out together with Musaab on the #119 task. Make sure you try things out in your OPENRESEARCH copies! We'll then later try the same approach, e.g. for importing from other sources like dblp. First step:
These are the values we have:
Now we need a "not" condition: find the entries that do not conform to the regexp for numbers. So instead of using the unix tools above you may simply use python to find the problematic ones. Then loop over the .wiki files and fix the Ordinal with the lookup in the dictionary mentioned in the ticket. Then do a wikirestore. So the way to do this is: query, filter, fix, restore.
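A minimal sketch of the filter step as a unix pipeline (the python variant suggested above works just as well); the backup path ~/wikibackup/or and the |Ordinal= field layout in the .wiki files are assumptions:

```bash
# list the backed-up .wiki files whose Ordinal is not a plain number
# (-Z / -0 keep file names with spaces intact)
grep -lZ '^|Ordinal=' ~/wikibackup/or/*.wiki \
  | xargs -0 grep -L '^|Ordinal=[0-9]\+$'
```

The pages found this way can then be fixed with the dictionary lookup from the ticket and pushed back with wikirestore.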
-
Good work! That is the right direction. Indeed we need the wikirestore fix as a prerequisite. It would also be good to have the wikiquery input "piped" into the solution. We have to think about a standard way to do this, e.g. stdin/stdout using filenames or json info or similar means. When we are done with this step, the next step is trying to extract the ordinal from the title as the proceedings title parser does. We could again take a snippet from the source code or give the ptp an API for this.
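A hedged shell sketch of that extraction step (the example title is made up; the proceedings title parser itself uses far more elaborate rules):

```bash
# pull an ordinal token such as "42nd" out of an event title and strip the suffix
echo "42nd International Conference on Very Important Topics" \
  | grep -oE '[0-9]+(st|nd|rd|th)' \
  | sed -E 's/(st|nd|rd|th)$//'
```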
-
wikirestore now has a new functionality of reading input from STDIN. The pipeline to fix the ordinal numbers issue chains grep with wikirestore (see the sketch below).
grep can later be replaced by wikiquery if it returns the page names on STDIN. wikirestore can also take a listFile argument now and restore all the pages listed in a given file.
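A sketch of the overall shape of such a pipeline; the backup path, the |Ordinal= field name and the exact wikirestore invocation are assumptions (see wikirestore -h for the real options):

```bash
# 1. collect the pages whose Ordinal is not a plain number (before fixing them);
#    -Z / -0 keep file names with spaces intact
grep -lZ '^|Ordinal=' ~/wikibackup/or/*.wiki \
  | xargs -0 grep -L '^|Ordinal=[0-9]\+$' \
  | sed -e 's#.*/##' -e 's/\.wiki$//' > problematic_pages.txt

# 2. fix the Ordinal values in the local .wiki files
#    (e.g. with the python dictionary lookup discussed above)

# 3. push only the fixed pages back; wikirestore can read the page names
#    from STDIN or from a list file
wikirestore < problematic_pages.txt
```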
-
Please keep in mind we need to check the types being used in OpenResearch and fix WolfgangFahl/py-3rdparty-mediawiki#5 accordingly. We do not need all types, but the relevant ones, e.g. bool and geo coordinates, might be needed; see also https://github.com/BITPlan/com.bitplan.simplegraph/blob/master/simplegraph-smw/src/main/java/com/bitplan/simplegraph/smw/SmwSystem.java
-
For example, #119 could easily be analyzed with a grep over the local backup files, but this would only give us the page names at this time.
Result: the list of affected page names.
If we fix WolfgangFahl/py-3rdparty-mediawiki#56 it should be easy to fix #119 by applying unix tools such as grep; a sketch follows below.
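A hedged sketch of such a grep analysis (the backup path and the |Ordinal= field name are assumptions):

```bash
# show the offending Ordinal values together with the page file they come from
grep -H '^|Ordinal=' ~/wikibackup/or/*.wiki \
  | grep -v '=[0-9]\+$'
```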