OGRLayer::GetArrowStream(): add a DATETIME_AS_STRING=YES/NO option #11213

Open: rouault wants to merge 1 commit into OSGeo:master from rouault:GetArrowStream_DATETIME_AS_STRING

rouault
Member

rouault commented Nov 5, 2024

DATETIME_AS_STRING=YES/NO. Defaults to NO. Added in GDAL 3.11. Whether DateTime fields should be returned by drivers as a (normally ISO-8601 formatted) string. The aim is to be able to handle mixed timezones (or timezone-naive values) in the same column. All drivers must honour this option, falling back to the generic OGRLayer implementation if they cannot handle it themselves (which is the case for the Arrow, Parquet and ADBC drivers).
When DATETIME_AS_STRING=YES, the TIMEZONE option is ignored.
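
To illustrate why a string column helps here, the following is a minimal sketch of what a consumer might do with the returned values. It assumes the strings are ISO-8601 formatted as described above; the sample values and the helper name `parse_datetime_strings` are made up for illustration, and the call to GetArrowStream() itself is left out.

```python
from datetime import datetime

# Hypothetical values, as a string column might carry them when
# DATETIME_AS_STRING=YES is passed: two different UTC offsets,
# one timezone-naive value and one null.
values = [
    "2024-11-05T10:00:00+01:00",
    "2024-11-05T10:00:00-05:00",
    "2024-11-05T10:00:00",
    None,
]


def parse_datetime_strings(column):
    """Parse each string on its own, so every value keeps its own offset."""
    return [None if v is None else datetime.fromisoformat(v) for v in column]


def offsets_in_hours(parsed):
    """Return the UTC offset of each value in hours, or None if it has none."""
    result = []
    for d in parsed:
        offset = None if d is None else d.utcoffset()
        result.append(None if offset is None else offset.total_seconds() / 3600)
    return result


parsed = parse_datetime_strings(values)
print(offsets_in_hours(parsed))
```

A single Arrow timestamp column carries one timezone for the whole column, so the three non-null values above could not be represented in it without losing either the offsets or the naive/aware distinction.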

Fixes geopandas/pyogrio#487

@coveralls
Collaborator

coveralls commented Nov 6, 2024

Coverage Status

coverage: 73.629% (+0.003%) from 73.626%
when pulling ad41bb8 on rouault:GetArrowStream_DATETIME_AS_STRING
into c578e9d on OSGeo:master.

rouault force-pushed the GetArrowStream_DATETIME_AS_STRING branch from ae77c2f to ad41bb8 on November 6, 2024 01:26
@rouault
Member Author

rouault commented Nov 6, 2024

@theroggy @jorisvandenbossche I'm thinking that in this DATETIME_AS_STRING=YES mode, in the ArrowSchema of datetime fields exposed as string (format='u'), we should probably also set the metadata field with a hint for the DateTime semantics. Any suggestion of an appropriate value for it?

@jorisvandenbossche
Contributor

Thanks a lot for looking into this!

we should probably also set the metadata field with a hint for the DateTime semantics. Any suggestion of an appropriate value for it?

Would you just want to indicate that the original GDAL/OGR type was a DateTime? Or is there more information about the column that GDAL can know at that point?
For the type, maybe something like "gdal:type": "DateTime"? (Is there not yet any precedent where you store information like this in any file format?)

@rouault
Member Author

rouault commented Nov 6, 2024

Would you just want to indicate that the original GDAL/OGR type was a DateTime?

Actually, I'm just remembering that we already have something. https://gdal.org/en/latest/doxygen/classOGRLayer.html#a3ffa8511632cbb7cff06a908e6668f55 mentions:

Starting with GDAL 3.8, the ArrowSchema::metadata field filled by the get_schema() callback may be set with the potential following items:
    "GDAL:OGR:alternative_name": value of OGRFieldDefn::GetAlternativeNameRef()
    "GDAL:OGR:comment": value of OGRFieldDefn::GetComment()
    "GDAL:OGR:default": value of OGRFieldDefn::GetDefault()
    "GDAL:OGR:subtype": value of OGRFieldDefn::GetSubType()
    "GDAL:OGR:width": value of OGRFieldDefn::GetWidth() (serialized as a string)
    "GDAL:OGR:unique": value of OGRFieldDefn::IsUnique() (serialized as "true" or "false")
    "GDAL:OGR:domain_name": value of OGRFieldDefn::GetDomainName()

Those are only filled when they cannot be expressed with an Arrow concept.
So logically, that should be extended with "GDAL:OGR:type": "DateTime" in that situation.
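
For readers on the consuming side, here is a rough sketch of how such a hint could be read back. It relies on the binary layout that the Arrow C data interface defines for ArrowSchema::metadata (an int32 pair count, then for each pair an int32 length followed by the bytes, for key and value, in native endianness). The "GDAL:OGR:type" key is only the proposal made in this thread, not a confirmed part of the API, and the function names are hypothetical.

```python
import struct


def encode_arrow_metadata(items):
    """Serialize key/value pairs using the ArrowSchema::metadata layout."""
    out = [struct.pack("=i", len(items))]  # number of key/value pairs
    for key, value in items.items():
        k, v = key.encode("utf-8"), value.encode("utf-8")
        out.append(struct.pack("=i", len(k)) + k)  # key length, key bytes
        out.append(struct.pack("=i", len(v)) + v)  # value length, value bytes
    return b"".join(out)


def decode_arrow_metadata(blob):
    """Decode a metadata blob into a dict; an empty blob means no metadata."""
    if not blob:
        return {}
    (count,) = struct.unpack_from("=i", blob, 0)
    pos = 4
    items = {}
    for _ in range(count):
        parts = []
        for _ in range(2):  # first the key, then the value
            (n,) = struct.unpack_from("=i", blob, pos)
            pos += 4
            parts.append(blob[pos:pos + n].decode("utf-8"))
            pos += n
        items[parts[0]] = parts[1]
    return items


def is_datetime_as_string(fmt, metadata_blob):
    """True for a utf8 field (format 'u') carrying the proposed DateTime hint."""
    meta = decode_arrow_metadata(metadata_blob)
    # "GDAL:OGR:type" is the key proposed in this discussion (an assumption).
    return fmt == "u" and meta.get("GDAL:OGR:type") == "DateTime"


blob = encode_arrow_metadata({"GDAL:OGR:type": "DateTime"})
print(decode_arrow_metadata(blob), is_datetime_as_string("u", blob))
```

With a check like this, a client such as pyogrio could tell a plain string field apart from a DateTime field exposed as a string, and only attempt datetime parsing on the latter.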

Labels: funded through GSP (Work funded through the GDAL Sponsorship Program)
Development

Successfully merging this pull request may close these issues.

Differences in how datetime columns are treated with arrow=True
3 participants