Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in TimeStamp with Time Zones for Arrow #735

Open
EricFecteau opened this issue Jan 25, 2025 · 2 comments
Open

Bug in TimeStamp with Time Zones for Arrow #735

EricFecteau opened this issue Jan 25, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@EricFecteau
Copy link
Collaborator

EricFecteau commented Jan 25, 2025

Issue

The Connector-X code converts timezone data using UTC around here. But it does not seem to work correctly:

Image

This is because, according to arrow docs, you have to have the chrono-tz feature enable to do this and Connector-X does not have this enabled anywhere.

When feature chrono-tz is not enabled, allowed timezone strings are fixed offsets of the form “+09:00”, “-09” or “+0930”. When feature chrono-tz is enabled, additional strings supported by chrono_tz are also allowed, which include IANA database timezones.

Fix

I am currently writing tests for all the types (at least all the types in PostgreSQL at this time) and I can fix this by either using +00:00 instead of UTC (I confirmed it works) or by enabling the chrono-tz in my next PR.

Verification of understanding

Could you please also confirm my understanding.

The Timestamp (without time zone) is mapped to None time zone in Arrow (through the NaiveDateTime in Chrono), because it is not meaningfully useful to think of it as UTC time, as explained in the Arrow docs:

Therefore, timestamp values without a timezone cannot be meaningfully interpreted as physical points in time, but only as calendar / clock indications (“wall clock time”) in an unspecified timezone.

For example, the timestamp value 0 with an empty timezone string corresponds to “January 1st 1970, 00h00” in an unknown timezone: there is not enough information to interpret it as a well-defined physical point in time.

One consequence is that timestamp values without a timezone cannot be reliably compared or ordered, since they may have different points of reference. In particular, it is not possible to interpret an unset or empty timezone as the same as “UTC”.

And while the PostgreSQL docs don't explicitly mention this, they also don't call their Timestamp as having UTC time, but they definitely call their TimestampTz time UTC

For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT). An input value that has an explicit time zone specified is converted to UTC using the appropriate offset for that time zone. If no time zone is stated in the input string, then it is assumed to be in the time zone indicated by the system's TimeZone parameter, and is converted to UTC using the offset for the timezone zone.

So it makes sense to have the Naive time for Timestamp and the Time Zone time for TimeStampTz!

@EricFecteau EricFecteau added the bug Something isn't working label Jan 25, 2025
@EricFecteau EricFecteau changed the title But in TimeStamp with Time Zones for Arrow Bug in TimeStamp with Time Zones for Arrow Jan 25, 2025
@wangxiaoying
Copy link
Contributor

Hi @EricFecteau , thanks for opening the issue!

I think you are correct. I think we should distinguish naive time and time zone time!

@EricFecteau
Copy link
Collaborator Author

Thanks for the confirmation! To be clear, the implementation is already all there with Timestamp, just had a bug!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants