ELAN vocabulary generation script throws a SAXParseException #304

Open
abdelker opened this issue Jun 21, 2023 · 4 comments

@abdelker

Hello,

I am trying to run the ./ecv.sh script (branch master, folder scripts).
However, it fails to parse the OWL files.

Here is the error I get:

Traceback (most recent call last):
  File "/usr/lib/python3.8/xml/sax/expatreader.py", line 217, in feed
    self._parser.Parse(data, isFinal)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 6

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/abdelker/catkin_ws/src/soma/scripts/elan_cv.py", line 231, in <module>
    ontoP, namespaces = parseXMLOnto(onto)
  File "/home/abdelker/catkin_ws/src/soma/scripts/elan_cv.py", line 152, in parseXMLOnto
    ontoP = untangle.parse(onto)
  File "/home/abdelker/.local/lib/python3.8/site-packages/untangle.py", line 205, in parse
    parser.parse(filename)
  File "/usr/lib/python3.8/xml/sax/expatreader.py", line 111, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python3.8/xml/sax/xmlreader.py", line 125, in parse
    self.feed(buffer)
  File "/usr/lib/python3.8/xml/sax/expatreader.py", line 221, in feed
    self._err_handler.fatalError(exc)
  File "/usr/lib/python3.8/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: /home/abdelker/catkin_ws/src/soma/scripts/../owl/SOMA.owl:1:6: not well-formed (invalid token)

@mpomarlan
Collaborator

This seems related to switching SOMA's format to OWL functional syntax. The script assumes RDF/XML input and has not been updated in a while.

I can take care of this later in the week, or more likely next week. As a quick workaround, Protege can save SOMA.owl as RDF/XML, and the script should then run with that file as input.
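
Until the script is updated, a cheap pre-check could turn the opaque SAXParseException into a readable message. The following is a minimal sketch, not part of elan_cv.py: the function name `guess_ontology_syntax` is hypothetical, and the check only tests whether the first non-blank character is `<`. The reported position 1:6 would be consistent with a file that starts with the functional-syntax keyword `Prefix(`, but that is an inference and has not been confirmed in this thread.

```python
import codecs


def guess_ontology_syntax(path):
    """Return 'xml' if the first non-blank character is '<', else 'other'.

    Hypothetical helper: it does not validate the file, it only peeks at
    the first bytes to tell XML-based serializations from everything else.
    """
    with open(path, "rb") as f:
        head = f.read(512)
    # Drop a UTF-8 byte order mark if one is present.
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]
    return "xml" if head.lstrip().startswith(b"<") else "other"


def require_rdf_xml(path):
    """Raise a readable error before an XML parser is handed a non-XML file."""
    if guess_ontology_syntax(path) != "xml":
        raise ValueError(
            f"{path} does not look like RDF/XML; "
            "re-save it as RDF/XML (for example from Protege) and retry"
        )
    return path
```

Calling `require_rdf_xml(onto)` before the `untangle.parse(onto)` call shown in the traceback would fail early with a hint about the workaround instead of a parser error.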

@mrnolte
Collaborator

mrnolte commented Jun 21, 2023

Can you explain to me what the ELAN scripts are doing? We might want to move these to the java CI as well.

@mpomarlan
Collaborator

The ELAN script generates vocabulary files.

In more detail, it loops through the concepts in SOMA and checks whether a particular annotation property (ELANName, I think) is defined for each concept. If it is, an entry for that concept is added to a controlled vocabulary file.
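
The loop described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in elan_cv.py: it uses the standard library's ElementTree instead of untangle, the `http://example.org/soma#` namespace and the sample classes are made up, the property name ELANName is taken from the tentative recollection above, and the sketch only collects (IRI, label) pairs without writing an actual ELAN vocabulary file. It also only handles annotations nested directly inside `owl:Class` elements of an RDF/XML document.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
# Hypothetical namespace for the annotation property, for illustration only.
EX = "http://example.org/soma#"

# Made-up RDF/XML input: one class carries the annotation, one does not.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:ex="http://example.org/soma#">
  <owl:Class rdf:about="http://example.org/soma#Grasping">
    <ex:ELANName>Grasping</ex:ELANName>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/soma#Agent"/>
</rdf:RDF>
"""


def collect_vocabulary(rdf_xml_text, prop_tag):
    """Return (class IRI, label) pairs for classes that carry prop_tag."""
    root = ET.fromstring(rdf_xml_text)
    entries = []
    for cls in root.iter(f"{{{OWL}}}Class"):
        iri = cls.get(f"{{{RDF}}}about")   # None for anonymous classes
        label = cls.findtext(prop_tag)     # None if the annotation is absent
        if iri and label:
            entries.append((iri, label.strip()))
    return entries


entries = collect_vocabulary(SAMPLE, f"{{{EX}}}ELANName")
for iri, label in entries:
    print(label, iri)
```

Classes without the annotation are skipped, so only annotated concepts end up in the vocabulary.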

Later, when someone uses ELAN to annotate, they rely on having such vocabulary files to provide labels for the annotations.

Moving the functionality of the script to the Java CI is a good idea since Java has all the library support for OWL formats.

@mrnolte
Collaborator

mrnolte commented Jul 28, 2023

@ayden175 I know that you want to work on other stuff, but would you mind checking this problem out? If not, I can also have a look at changing the CI.
