-
That is an excellent proposal. I also think that we might be quicker in assessing the relevance of properties this way. Could you check what a query would look like that finds out how often the Event and Event Series relevant properties have been used in OPENRESEARCH? Applying that ask query and then using wikiquery might give us a speed-up in progress. It would also be a "surrogate" for our lambda approach: we could simply apply a unix pipeline of commands, and this would not even be limited to python ...
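A rough sketch of what such a counting ask query could look like in Semantic MediaWiki syntax (the property name Has ordinal is only an illustration; the actual Event and Event Series properties would be substituted):

```
{{#ask: [[Has ordinal::+]]
 |format=count
}}
```

Running one such count query per property would show how often each property is actually used.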
-
Which formats should be implemented first?
-
csv and json IMHO would best fit with unix tool pipelines.
A tool like cut will easily give you a field of a csv, or you could use awk.
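A minimal sketch of that, assuming a comma-separated export file called events.csv (the file name and column position are made up for illustration):

```bash
# extract the second column of a comma-separated file
cut -d',' -f2 events.csv

# the same with awk, which is handier once you need conditions or reformatting
awk -F',' '{ print $2 }' events.csv
```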
-
I propose you fix the ticket and then I'll show you examples of all three commands. You'll also find some usage of these unix tools in the script that gathers samples for the proceedings title parser; see https://github.com/WolfgangFahl/ProceedingsTitleParser/blob/master/scripts/getsamples
-
General question: Can invalid property values be queried?
-
Please try this out together with Musaab on the #119 task. Make sure you try things out in your OPENRESEARCH copies! We'll then later try the same approach, e.g. for importing from other sources like dblp. First step:
These are the values we have:
Now we need a "not" condition: find the entries that do not conform to the regexp for numbers. So instead of using the unix tools above you may simply use python to find the problematic ones. Then loop over the .wiki files and fix the Ordinal with the lookup in the dictionary mentioned in the ticket. Then do a wikirestore. So the way to do this is: query, filter, fix, restore.
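A minimal sketch of the filter step as a unix pipeline (the python variant suggested above works just as well); the backup path ~/wikibackup/or and the |Ordinal= field layout in the .wiki files are assumptions:

```bash
# list the backed-up .wiki files whose Ordinal is not a plain number
# (-Z / -0 keep file names with spaces intact)
grep -lZ '^|Ordinal=' ~/wikibackup/or/*.wiki \
  | xargs -0 grep -L '^|Ordinal=[0-9]\+$'
```

The pages found this way can then be fixed with the dictionary lookup from the ticket and pushed back with wikirestore.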
-
Good work! That is the right direction. Indeed we need the wikirestore fix as a prerequisite. It would also be good to have the wikiquery input "piped" into the solution. We have to think about a standard way to do this, e.g. stdin/stdout using filenames or json info or similar means. When we are done with this step, the next step is trying to extract the ordinal from the title as the proceedings title parser does. We could again take a snippet from the source code or give the ptp an API for this.
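A hedged shell sketch of that extraction step (the example title is made up; the proceedings title parser itself uses far more elaborate rules):

```bash
# pull an ordinal token such as "42nd" out of an event title and strip the suffix
echo "42nd International Conference on Very Important Topics" \
  | grep -oE '[0-9]+(st|nd|rd|th)' \
  | sed -E 's/(st|nd|rd|th)$//'
```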
-
wikirestore now has a new functionality of reading input from STDIN. The pipeline to fix the ordinal numbers issue chains grep with wikirestore (see the sketch below).
grep can later be replaced by wikiquery if it returns the page names on STDIN. wikirestore can also take a listFile argument now and restore all the pages listed in a given file.
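A sketch of the overall shape of such a pipeline; the backup path, the |Ordinal= field name and the exact wikirestore invocation are assumptions (see wikirestore -h for the real options):

```bash
# 1. collect the pages whose Ordinal is not a plain number (before fixing them);
#    -Z / -0 keep file names with spaces intact
grep -lZ '^|Ordinal=' ~/wikibackup/or/*.wiki \
  | xargs -0 grep -L '^|Ordinal=[0-9]\+$' \
  | sed -e 's#.*/##' -e 's/\.wiki$//' > problematic_pages.txt

# 2. fix the Ordinal values in the local .wiki files
#    (e.g. with the python dictionary lookup discussed above)

# 3. push only the fixed pages back; wikirestore can read the page names
#    from STDIN or from a list file
wikirestore < problematic_pages.txt
```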
-
Please keep in mind we need to check the types being used in OpenResearch and fix WolfgangFahl/py-3rdparty-mediawiki#5 accordingly. We do not need all types, but the relevant ones, e.g. bool and geo coordinates, might be needed; see also https://github.com/BITPlan/com.bitplan.simplegraph/blob/master/simplegraph-smw/src/main/java/com/bitplan/simplegraph/smw/SmwSystem.java
-
For example, #119 could easily be analyzed with a grep over the local backup files, but this would only give us the page names at this time.
Result: the list of affected page names.
If we fix WolfgangFahl/py-3rdparty-mediawiki#56 it should be easy to fix #119 by applying unix tools such as grep; a sketch follows below.
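A hedged sketch of such a grep analysis (the backup path and the |Ordinal= field name are assumptions):

```bash
# show the offending Ordinal values together with the page file they come from
grep -H '^|Ordinal=' ~/wikibackup/or/*.wiki \
  | grep -v '=[0-9]\+$'
```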