
Support XML Schema Definition Language (XSD) import #153

Open: wants to merge 17 commits into main


multimeric

Adds an XsdImportEngine, with tests.

XSD doesn't map cleanly to LinkML, because:

  • XML has both attributes and child elements. I treat both as slots, but tag each with a keyword that indicates which one it is.
  • The root element is a pseudo-class, because it may enforce a specific structure. I resolve this by adding a RootElement class where necessary.
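A rough, stdlib-only sketch of the mapping described above, assuming a flat XSD with named `complexType`s. The keyword values `xml_attribute`/`xml_element` and the function name are hypothetical and not necessarily what `XsdImportEngine` does; XSD types are copied through as ranges without translation.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical keyword values used to tag each slot's XML origin.
ATTRIBUTE_KEYWORD = "xml_attribute"
ELEMENT_KEYWORD = "xml_element"


def xsd_to_linkml_classes(xsd_text: str) -> dict:
    """Map named complexTypes to LinkML-style class dicts (illustrative only)."""
    root = ET.fromstring(xsd_text)
    classes = {}
    for ctype in root.findall(f"{XS}complexType"):
        slots = {}
        # Child elements (e.g. inside xs:sequence) become keyword-tagged slots.
        for el in ctype.iter(f"{XS}element"):
            slots[el.get("name")] = {"range": el.get("type"), "keywords": [ELEMENT_KEYWORD]}
        # Attributes become slots too, tagged with a different keyword.
        for attr in ctype.iter(f"{XS}attribute"):
            slots[attr.get("name")] = {"range": attr.get("type"), "keywords": [ATTRIBUTE_KEYWORD]}
        classes[ctype.get("name")] = {"attributes": slots}
    # Top-level elements define what may appear at the document root, so they
    # are gathered into a synthetic RootElement class.
    top_level = root.findall(f"{XS}element")
    if top_level:
        classes["RootElement"] = {
            "attributes": {
                el.get("name"): {"range": el.get("type"), "keywords": [ELEMENT_KEYWORD]}
                for el in top_level
            }
        }
    return classes
```

Nested anonymous types, `ref=` elements, and type translation (e.g. `xs:string` to `string`) are deliberately left out to keep the sketch short.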

sierra-moxon self-requested a review on January 15, 2025 00:46

sierra-moxon (Member)

Thank you @multimeric! I think there is a lot of interest in an XML -> LinkML importer.

cmungall (Member)

Thank you!!

Re the lack of isomorphism: see also https://stackoverflow.com/questions/191536/converting-xml-to-json-using-python, which mentions a "standard" (at the instance level). If there is nothing more up to date, I suggest being consistent with it, so that XML converted to JSON with xmltodict validates against the LinkML schema derived from the XML schema: xml --[xmltodict]--> json validates via [xml schema]-->linkml

Or at least offer an option to do this; for now, just noting it in the docs is sufficient.

multimeric (Author) commented Jan 15, 2025

Hmm, that's an interesting suggestion. However, the proposed solution is to prepend @ to attribute names, such that:

```xml
<p id="1">text</p>
```

Becomes

```json
{
  "p": {
    "@id": 1,
    "$": "text"
  }
}
```

I think this is a bit ugly, and perhaps easily confused with JSON-LD (which also uses @-prefixed keys such as @id), but I will implement it that way if you prefer.
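For reference, the @-prefix convention in the example above can be sketched with the standard library alone. This is not xmltodict itself (xmltodict uses `@` for attributes but `#text`, not `$`, for text by default); `element_to_dict` is a hypothetical helper, and note that XML attribute values come back as strings, so `@id` is `"1"` rather than `1`.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem: ET.Element) -> dict:
    """Convert one element using the '@'-prefix convention (sketch)."""
    body = {}
    # XML attributes become keys prefixed with "@"; values stay strings.
    for name, value in elem.attrib.items():
        body["@" + name] = value
    # Child elements become plain keys; repeated tags collect into a list.
    for child in elem:
        value = element_to_dict(child)[child.tag]
        if child.tag in body:
            existing = body[child.tag]
            if not isinstance(existing, list):
                body[child.tag] = [existing]
            body[child.tag].append(value)
        else:
            body[child.tag] = value
    # Non-whitespace text content goes under "$".
    text = (elem.text or "").strip()
    if text:
        body["$"] = text
    return {elem.tag: body}
```

Usage: `element_to_dict(ET.fromstring('<p id="1">text</p>'))` gives `{"p": {"@id": "1", "$": "text"}}`. Mixed content (text between child elements) is ignored for brevity.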

I consider this scenario a bit different from standard XML-to-JSON conversion, because here we have a schema that can state directly which slots are attributes. So I'm wondering if there's a good field in the LinkML SlotDefinition that would capture this (currently I'm using keywords).

cmungall (Member)

Now that I reflect on this, you are right: this would make things quite complicated. Having the @ at the schema level isn't permitted in LinkML, so a mapping would need to be preserved... all quite ugly.

So scratch that, let's keep it in mind for future extensions.

Formally, the way to do this in LinkML would be to use `conforms_to` or `instantiates`:

```yaml
slots:
  id:
    conforms_to: xsd:attribute
  description:
    conforms_to: xsd:attribute
```

or

```yaml
slots:
  id:
    instantiates: [xsd:attribute]
  description:
    instantiates: [xsd:attribute]
```

The former is stronger and could be used for validation if we later implement metaclasses for XML Schema (see https://linkml.io/linkml/schemas/annotations.html#validation-of-annotations).
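A minimal sketch of how a tool might recognize either marking, assuming the schema has been loaded into a plain dict (e.g. with PyYAML's `yaml.safe_load`). `xsd:attribute` is the CURIE from the snippets above; `is_xml_attribute` and `attribute_slots` are hypothetical helper names.

```python
def is_xml_attribute(slot: dict) -> bool:
    """True if a slot is marked as an XML attribute via conforms_to or instantiates."""
    # Stronger form: a single conforms_to value.
    if slot.get("conforms_to") == "xsd:attribute":
        return True
    # Alternative form: membership in the instantiates list.
    return "xsd:attribute" in (slot.get("instantiates") or [])


def attribute_slots(schema: dict) -> list[str]:
    """Names of all top-level slots marked as XML attributes, in schema order."""
    return [
        name
        for name, slot in (schema.get("slots") or {}).items()
        if is_xml_attribute(slot or {})
    ]
```

Usage: for a schema with `id` using `conforms_to`, `description` using `instantiates`, and an unmarked `name`, `attribute_slots(schema)` returns `["id", "description"]`.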

In a to-be-defined XML Schema metamodel:

```yaml
classes:
  XmlAttribute:
    is_a: slot
    class_uri: xsd:attribute
    description: If a slot instantiates this, it should be mapped to an attribute when serializing to XML.
```
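To illustrate what "mapped to an attribute when serializing to XML" could mean in practice, here is a stdlib sketch. `to_xml` is hypothetical, handles only flat objects with scalar values, and assumes the set of attribute slots has already been worked out from the schema.

```python
import xml.etree.ElementTree as ET


def to_xml(tag: str, obj: dict, attribute_slots: set[str]) -> ET.Element:
    """Serialize a flat dict, placing slots in attribute_slots as XML attributes."""
    elem = ET.Element(tag)
    for slot, value in obj.items():
        if slot in attribute_slots:
            # Slots marked as attributes (e.g. via the XmlAttribute metaclass).
            elem.set(slot, str(value))
        else:
            # Everything else becomes a child element with text content.
            child = ET.SubElement(elem, slot)
            child.text = str(value)
    return elem
```

Usage: `ET.tostring(to_xml("p", {"id": 1, "description": "x", "name": "n"}, {"id", "description"}), encoding="unicode")` yields `<p id="1" description="x"><name>n</name></p>` (Python 3.8+ keeps attributes in insertion order).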

However, if you want to keep your method flexible, so that a profile (e.g. using keywords) can be passed in on the command line, and to satisfy your use case first, that's valid!
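One possible shape for such a profile, as a sketch only: the `--xml-attribute-profile` flag, the `xml_attribute` keyword, and `mark_attribute` are all hypothetical, not existing LinkML CLI options. The importer would call `mark_attribute` on each slot it derives from an `xs:attribute`.

```python
import argparse

PROFILE_CHOICES = ("keywords", "conforms_to", "instantiates")


def mark_attribute(slot: dict, profile: str) -> dict:
    """Return a copy of slot carrying the attribute marker for the chosen profile."""
    marked = dict(slot)
    if profile == "keywords":
        marked["keywords"] = [*marked.get("keywords", []), "xml_attribute"]
    elif profile == "conforms_to":
        marked["conforms_to"] = "xsd:attribute"
    elif profile == "instantiates":
        marked["instantiates"] = [*marked.get("instantiates", []), "xsd:attribute"]
    else:
        raise ValueError(f"unknown profile: {profile!r}")
    return marked


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="XSD import (illustrative sketch)")
    parser.add_argument("xsd_file")
    parser.add_argument(
        "--xml-attribute-profile",
        choices=PROFILE_CHOICES,
        default="keywords",
        help="how attribute slots are marked in the generated LinkML schema",
    )
    return parser
```

Defaulting to `keywords` mirrors the current behaviour, while `conforms_to` or `instantiates` could be switched on once an XML Schema metamodel exists.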
