Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature MIVOT (Model Instance in VOTable) #497

Merged
merged 56 commits into from
May 9, 2024

Conversation

somilia
Copy link
Contributor

@somilia somilia commented Nov 3, 2023

Processing VO Model Annotations

Introduction

Model Instances in VOTables (MIVOT) defines a syntax for mapping VOTable data to any model serialized in VO-DML (Virtual Observatory Data Modeling Language).

This annotation schema acts as a bridge between data and models. It associates both column/parameter metadata and VOTable data with data model elements (such as classes, attributes, types, etc.). It also complements VOTable data or metadata that may be missing from the table, e.g., coordinate system descriptions or curation tracing.

The data model elements are organized in an independent annotation block complying with the MIVOT XML schema, which is added as an extra resource above the TABLE element. The MIVOT syntax allows a data structure to be described as a hierarchy of classes. It can also represent relationships and compositions between them. Furthermore, it can construct data model objects by aggregating instances from different tables of the VOTable.

Usage

The API allows you to obtain different levels of model views on the last read data row. These levels are described below.
The lowest levels are model agnostic.

They provide tools to browse model instances dynamically generated. The understanding of the model elements is the responsibility of the final user.

The highest level (4) is based on the MANGO draft model and especially to its. It has been designed to solve the EpochPropagation use case risen at 2023 South Spring Interop.

The end user API allows to obtain a model view on the last read data row, this usage corresponds to the level 3 and 4 described below.

activate_features('MIVOT')
data_path = os.path.dirname(os.path.realpath(__file__))
votable = os.path.join(data_path, "tests/data/simple-annotation-votable.xml")

m_viewer = ModelViewerLevel1(votable)
row_view = m_viewer.get_next_row_view()

print(row_view.longitude.value)

You can access each value of the object of the model. e.g.:
>>> print(row_view.Coordinate_coosys.PhysicalCoordSys_frame.spaceRefFrame.value)
>>> ICRS

The model view is a dynamically generated Python object whose field names are derived from the dmroles of the MIVOT elements. There is no checking against the model structure at this level.

In this pull request, JOIN and dynamic references are not implemented. We keep focused on simpler patterns: The MIVOT block maps one mapped data table with the coordinate systems in the GLOBALS

PyVO Implementation

The implementation relies on the Astropy's write and read annotation modules (PR#15390) available from astropy 6.0, which allows to get and set Mivot blocks from/into VOTables. We use this new Astropy feature, MIVOT, to retrieve the MIVOT block.
This implementation is built in 3 levels, denoting the abstraction level in relation to the XML block.

Level 1: ModelViewerLevel1

Provide the MIVOT block as it is in the VOTable: No references are resolved. The Mivot block is provided as an xml tree.

Level 2: ModelViewerLevel2

Provide access to an xml tree whose structure matches the model view of the current row. The internal references have been resolved (by _get_model_view() function of the ModelViewerLevel1). The attribute values have been set with the actual data values. This XML element is intended to be used as a basis for building any objects. The level2 output can be browsed using XPATH queries allowing users to retrieve MIVOT elements by their @dmrole or @dmtype. At this level, the MIVOT block must still be handled as an xml element.

Level 3: ModelViewerLevel3

ModelViewerLevel3 generates, from the level 1 output, a nested dictionary representing the entire XML INSTANCE with its hierarchy.

Level 4: MivotClass

From this dictionary, we build a ~pyvo.mivot.viewer.mivot_class.MivotClass object, which is a dictionary containing only the essential information used to process data.
MivotClass basically stores all XML objects in its attribute dictionary __dict__.

Level stacking

The more levels we have , the more elements are hidden:

  • Hide column parsing: column identifiers in row view are replaced by model object.
    (e.g. get_attribute_by_role(coords:LonLatPoint.longitude))
  • Hide MIVOT identifiers: MIVOT XML identifiers (INSTANCE, COLLECTION, ATTRIBUTE) are replaced by model elements.
    (e.g. EpochPosition.longitude.value)
  • Hide model elements: model elements are replaced by library instances.
    (e.g. EpochPosition.get_sky_coord())
Level Column parsing MIVOT Identifiers Models elements
Level 1 & 2 Hidden Showed Showed
Level 3 Hidden Hidden Showed
Level 4 Hidden Hidden Hidden

Mivot package content:

Utils sub-package

This package contains modules that make it easier to handle XML elements and dictionaries, as well as the logger setup.

Seeker sub-package

  • AnnotationSeeker provides a set of tools for extracting mapping sub-blocks to retrieve XML elements.
  • RessourceSeeker provides a set of getters for tables.
  • TableIterator is a simple wrapper that iterates over table rows.

Feature sub-package

This package contains features such as StaticReferenceResolver which is used to resolve references (replace REFERENCE elements with the referenced objects set with the roles of the REFERENCEs). Future features to be added to this sub-package include DynamicReference and Join.

Exception sub-package

This package contains exception classes related to MIVOT.

Viewer sub-package

This package contains all the levels of ModelViewer described above, as well as the MivotClass file.

Test strategy:

The test module contains one pytest file per Python module, and the datasets used for the tests are located in the 'test/data' directory. The tests check for values, intermediate objects (dictionaries), errors, and thrown exceptions.

EDIT:
Few changes have been made lastly:

  • ModelViewerLayer have been renamed by ModelViewerLevel.
  • lxml dependency have been removed, and replaced by defusedxml if available or by native xml package.
  • The get_next_row_view() function now allows to pass to the next row, returning the MivotClass with the new data row.
  • MivotClass has been added to the ModelViewerLevel1, in order to keep the same instance with only changing values that have a reference.
  • EpochPropagation have been improved, it can give an ~astropy.coordinates.sky_coordinate.SkyCoord containing only the data available in the row.

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have left some generic comments that may help going forward with this PR.

Besides that, the big picture comment would be that I feel there a a lot of methods/properties/attributes, are all of those necessary for the users or the code could be significantly simplyfied?

docs/mivot/features.rst Outdated Show resolved Hide resolved
pyvo/mivot/__init__.py Outdated Show resolved Hide resolved
pyvo/mivot/features/static_reference_resolver.py Outdated Show resolved Hide resolved
pyvo/mivot/features/static_reference_resolver.py Outdated Show resolved Hide resolved
pyvo/mivot/seekers/annotation_seeker.py Outdated Show resolved Hide resolved

@author: laurentmichel
"""
from lxml import etree
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only the a test file, so should not matter that much, but I recall the question of whether use lxml or defusedxml was raised in connection with this mivot work.

Also, either of the new dependencies is being used, they need to be added properly as a dependency (whether they will be mandatory or optional one is open for discussion, but at least they have to be added as a test dependency)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, lxml is used by the model_viewer module which is unavoidable. So whatever parser we will be using, we will have to add a new dependency. See slack thread.

There is here a long discussion about lxml vs defused xml.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have replaced lxml. xpath queries with code based on the built-in xml parser.
So that we do not need the lxml dependency anymore.
Unfortunately, the Python xml parser is not safe either.
It must be replaced with defusedxml if possible.
Our code has been setup to work with defusedxml if available or to keep working with the built-in xml parser otherwise.

pyvo/mivot/utils/xml_utils.py Outdated Show resolved Hide resolved
@lmichel
Copy link
Contributor

lmichel commented Nov 9, 2023

Thank you for the review of this big code chunk.

  • This message answers your global comment.
  • The others will be applied to our code with the next push.

We're well aware that we're arriving with a code package that far exceeds the usual PR size.
This is why, and after discussion with @tomdonaldson, we are starting with a draft PR in order to run a smooth integration within PyVO.
Most of our code is used behind the scene. The user shouldn't care about it.
The question is to know where to locate and how to organize all of that logic.
This has been written with my own coding style (one module per class or so , modules grouped in folders by feature).
This is may be not appropriate for PyVO, thus we would enjoy any advice adapting or improving our architecture.

@bsipocz
Copy link
Member

bsipocz commented Nov 9, 2023

Most of our code is used behind the scene. The user shouldn't care about it.

Then add a _ in front of the method name to indicate they are private. It will open up the possibility of easier refactor in the future without going though a deprecation cycle.

Copy link

codecov bot commented Nov 15, 2023

Codecov Report

Attention: Patch coverage is 86.91489% with 123 lines in your changes are missing coverage. Please review.

Project coverage is 81.24%. Comparing base (ded4360) to head (befc7bc).
Report is 14 commits behind head on main.

❗ Current head befc7bc differs from pull request most recent head 57a3885. Consider uploading reports for the commit 57a3885 to get more accurate results

Files Patch % Lines
pyvo/mivot/utils/xml_utils.py 57.14% 27 Missing ⚠️
pyvo/mivot/viewer/mivot_viewer.py 89.12% 26 Missing ⚠️
pyvo/mivot/seekers/annotation_seeker.py 94.96% 8 Missing ⚠️
pyvo/mivot/viewer/mivot_instance.py 89.61% 8 Missing ⚠️
pyvo/mivot/seekers/resource_seeker.py 86.79% 7 Missing ⚠️
pyvo/mivot/utils/json_encoder.py 36.36% 7 Missing ⚠️
pyvo/mivot/utils/dict_utils.py 73.91% 6 Missing ⚠️
pyvo/mivot/utils/logger_setup.py 81.81% 6 Missing ⚠️
pyvo/mivot/utils/vocabulary.py 90.76% 6 Missing ⚠️
pyvo/mivot/viewer/xml_viewer.py 89.28% 6 Missing ⚠️
... and 4 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #497      +/-   ##
==========================================
+ Coverage   80.38%   81.24%   +0.86%     
==========================================
  Files          52       69      +17     
  Lines        6189     7129     +940     
==========================================
+ Hits         4975     5792     +817     
- Misses       1214     1337     +123     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lmichel
Copy link
Contributor

lmichel commented Jan 31, 2024

After some testing achieved with Vizier people by using the service they have deployed, it turns out that the a few tweaks must be fixed :

  • Better connection of the model viewers with the DALResults object.
  • few more unit tests

In addition a documentation page must be written with code examples.
This will be pushed on the source branch with the next commit.

@lmichel
Copy link
Contributor

lmichel commented Feb 5, 2024

After some testing achieved with Vizier people by using the service they have deployed, it turns out that the a few tweaks must be fixed :

* Better connection of the model viewers with the DALResults object.

* few more unit tests

In addition a documentation page must be written with code examples. This will be pushed on the source branch with the next commit.

Done (fork CI passed)

@bsipocz
Copy link
Member

bsipocz commented Feb 6, 2024

@astropy/coordinators - please do add the contributors from this PR to the org, so CI will run here. And it would be nice to have a better, more automated way to do this.

@pllim
Copy link
Member

pllim commented Feb 6, 2024

Sure, but there are a lot of conversations to comb through here, can you please give me the usernames?

Also, such requests can also be made via https://github.com/astropy/astropy-project/blob/main/.github/ISSUE_TEMPLATE/github-admin.yaml in the future. Thanks!

@pllim
Copy link
Member

pllim commented Feb 6, 2024

p.s. We tried to automate but it's broken. Maybe @mwcraig would have time to fix it at some point.

@dhomeier
Copy link

dhomeier commented Feb 6, 2024

Sure, but there are a lot of conversations to comb through here, can you please give me the usernames?

Aren't they in the commits list? (and sorry, I don't know where/how to add them to the org myself)

@pllim
Copy link
Member

pllim commented Feb 6, 2024

There are 103 commits. I cannot quickly figure out without combing through them.

@mwcraig
Copy link
Member

mwcraig commented Feb 6, 2024

I think @somilia is the only one that needs to be added -- it is the fact that the person who opened the PR isn't part of the org that causes the issue IIRC.

@mwcraig
Copy link
Member

mwcraig commented Feb 6, 2024

Looks like @lmichel is the only other committer...

@dhomeier
Copy link

dhomeier commented Feb 6, 2024

I only saw 2 (@somilia and @lmichel), but if we need co-authors, yes, parsing the messages is out of bounds.

@pllim
Copy link
Member

pllim commented Feb 6, 2024

Thanks! I added both to astropy-contributors.

@lmichel
Copy link
Contributor

lmichel commented Feb 7, 2024 via email

@pllim
Copy link
Member

pllim commented Feb 7, 2024

Ah, those approvals. That is controlled by GitHub, not the org. As a rule to deter spammers, GitHub says workflows do not automatically run unless you have a merged commit in the repo already.

https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks

By default, all first-time contributors require approval to run workflows.

@pllim
Copy link
Member

pllim commented Feb 7, 2024

To make things easier for everyone (e.g., not having to wait around for such approvals in this PR), you can run the test suite locally instead if you are unsure. And hopefully when this gets merged, your next PR will no longer require such approvals. Good luck!

p.s. +6,214 −4

@bsipocz
Copy link
Member

bsipocz commented Feb 8, 2024

By default, all first-time contributors require approval to run workflows.

For which the workaround suggested by GH was to add the contributor to the org. Maybe they changed policies since that discussion a while back.

ps: yes, the size is one of the reasons why it takes this long to get the reviews/etc going.

@lmichel
Copy link
Contributor

lmichel commented Feb 8, 2024 via email

@pllim
Copy link
Member

pllim commented Feb 8, 2024

Well, I did add lmichel to astropy-contributors but didn't seem to help. I dunno if the addition has to be done before this PR was opened or the workaround assumption was wrong. 🤷

I guess another workaround is for lmichel to do a very trivial fix somewhere else in this repo and get that PR merged first. (Unproven.)

@bsipocz
Copy link
Member

bsipocz commented Feb 8, 2024

I guess another workaround is for lmichel to do a very trivial fix somewhere else in this repo and get that PR merged first. (Unproven.)

That also worked before. But as Matt said above, it might need to be Somia to do these as they opened the PR (though I would be surprised if one of us maintainers to push to it at that commit wouldn't trigger the CI).

Anyway, at this point I would say wait for the review, and then we try to wrap it up as smoothly as possible. No point in trying to split it up into multiple PRs at this point, etc. And a rebase and some targeted squashing will clean up the history significantly, too, but I would wait with that after the content review, too.

@pllim
Copy link
Member

pllim commented Feb 8, 2024

FWIW I also added somilia to the same astropy-contributors team at the same time as lmichel . Anyways, sorry I couldn't help. If you need to rerun CI , feel free to ping me. I am usually responsive when I am working (USA New York time).

@bsipocz
Copy link
Member

bsipocz commented Feb 8, 2024

They both contributed in the big feature PR to core in astropy/astropy#15390, so maybe some policy to automatically add people to the org who contribute significantly would have prevented this issue.

(note: I do not advocate for inviting all who adds a single character doc fix to the org, but here we are talking about a significant enhancement (to core astropy) that has a follow-up (this PR) that runs into technical gates as people are automatically put into newcomer bins while there are not at all newcomers to the org).

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super tiny comments after the rebase. I'll push a fix for these along with adding back the changelog entry.

pyvo/conftest.py Outdated
Comment on lines 45 to 48
try:
PYTEST_HEADER_MODULES['defusedxml'] = 'defusedxml'
except (NameError, KeyError):
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for try/except

Suggested change
try:
PYTEST_HEADER_MODULES['defusedxml'] = 'defusedxml'
except (NameError, KeyError):
pass
PYTEST_HEADER_MODULES['defusedxml'] = 'defusedxml'

Comment on lines 95 to 96
if __name__ == '__main__':
pytest.main()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for these in any of the files

@bsipocz
Copy link
Member

bsipocz commented May 7, 2024

CI failure is unrelated and I'll fix it separately.

lmichel and others added 14 commits May 6, 2024 22:51
class. Comments and documentation have been updated. Public methods have
been renamed to make the API easier to understand.
does not support all MIVOT features and which ones are not supported
…passing volint 3) re-wording 4) spurious files removed 5) remove unused stuff form vocabulary
…3) replace NotImplementedException with built-in error 4) logging message to debug level 5) type corrected 6) Mivot_Viewer (_)instance renamed as dm_instance 7) docstrings 8) similar Xpath methods refactored 9) table_name renamed 10) docstring 11) Replace @Property when get_ methods 12) MivotViewer imported from mivot._init__ 13) 3 exception classes 14) add a __repr__ for MivotInstance 14) docstring completed 15) Activate MIVOT feature at package import 16) json.load documentation 17) tests against MivotException types and messages 18) Use @pytest.mark.skipif 19) Astropy version checked at module loading 20) MivotViewer automatically select the  resource 21) Class JSONEncoder renamed  22) Add a slim mode to the class dict generator 23) Documentation  24) code style 25) doc highlight on dict serialization 26) doc string indentation fixed
Copy link
Contributor

@tomdonaldson tomdonaldson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this feature is ready to merge, and I'm pleased to approve it. It will be important to have this basis in place for future use and evolution of these capabilities. Thanks to all who worked on this, especially @somilia and @lmichel!

@tomdonaldson
Copy link
Contributor

We'll hold off on merging pending a final update to CHANGES.rst.

Mentions to MANGO and EpochPropagation removed since these features have been postponed.
@bsipocz
Copy link
Member

bsipocz commented May 9, 2024

CI failure is unrelaed

@tomdonaldson tomdonaldson merged commit bfd75c9 into astropy:main May 9, 2024
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants