Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FromSql/ToSql<Timestamp> to Pg backend for OffsetDateTime #3973

Closed
wants to merge 1 commit into from

Conversation

jeff-hiner
Copy link

Addresses #3581

@weiznich weiznich requested a review from a team March 29, 2024 17:26
@weiznich
Copy link
Member

Thanks for opening this PR. This is really appreciated.
Can you elaborate on how to solve potential issues around time-zones that might occur here, especially which time zone should be chosen for the offset and whether that choice is always sensible?

Copy link
Member

@Ten0 Ten0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not impl ToSql/FromSql<Timestamp, _> for OffsetDateTime, see below for in-depth explanations.

I would however be ok with an impl ToSql/FromSql<Timestamptz, _> for OffsetDateTime

@@ -44,6 +49,23 @@ impl ToSql<Timestamp, Pg> for PrimitiveDateTime {
}
}

// Delegate offset datetimes in terms of UTC primitive datetimes; this stores everything in the DB as UTC
#[cfg(all(feature = "time", feature = "postgres_backend"))]
impl ToSql<Timestamp, Pg> for OffsetDateTime {
Copy link
Member

@Ten0 Ten0 Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OffsetDateTime is timezone-aware, Timestamp isn't.

OffsetDateTime should be allowed to map to Timestamptz.
PrimitiveDateTime should be allowed to map to Timestamp.

However, implementing ToSql<Timestamp, _> for OffsetDateTime would let users implicitly drop timezone information by writing table::non_timezone_aware_field.eq(timezone_aware_data), which is bug-prone.
This goes strongly against the general principle of non-implicit-cast otherwise used in Rust.

=> I would prefer that we do not do this, and let users write table::non_timezone_aware_field.eq(timezone_aware_data.forget_offset()) if they really want to - which they probably don't: they probably want to update their database schema to be timezone-aware instead (that is, by using the Timestamptz type instead of Timestamp, if they are feeding actually feeding timezone-aware data to their database).

NB: I couldn't find an actual OffsetDateTime::forget_offset function in time (or one named differently that would do that), which I find very surprising (I don't use time), but that would be an issue to solve via a PR in time, not in diesel. (But then maybe such a function doesn't exist because the behavior of a function with a such signature may be unclear because choosing whether to convert by going to UTC or by forgetting offset is arbitrary.) (In any case you can write PrimitiveDateTime::new(dt.date(), dt.time()) until that gets merged)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While strictly true, this winds up being incredibly verbose in practice. In particular where users of diesel don't control their own schema, it's convenient to specify the UTC convention at the row mapping. This is probably why the MySql impl exists.

When deriving Queryable for large structs on Pg you currently have to write the field mapping as PrimitiveDateTime. When doing almost anything with the data later (including serializing for REST output) you wind up with a lot of assume_utc() calls. This becomes confusing to read, because "hey this Postgres time is in UTC" annotation is separated from the database call that populated the PrimitiveDateTime.

In the other direction when storing rows, you wind up having to write wrapper fns to manually convert OffsetDateTime into PrimitiveDateTime for each and every insertion. This is easy to get wrong: if you simply call date() and time() without first explicitly converting to UTC you can easily wind up inadvertently storing an incorrect timestamp. There is no impl in time to perform this conversion, making it almost deliberately awkward. I assume the time crate author has a philosophical reasoning here, in that most people should store and operate on OffsetDateTime instead of PrimitiveDateTime.

Copy link
Member

@Ten0 Ten0 Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case where you don't control your database schema, I think you have two reasonable options:

  • Use serialize_as/serialize_fn/deserialize_as/deserialize_fn on your Insertable/Queryable struct - this removes the need for a conversion at every struct construction (resp. usage)
  • Since it turns out that the actual wire serialization for the Timestamptz type is precisely that of serializing a Timestamp as Utc (cf the module this PR updates), you could update your schema.rs to replace Timestamp by Timestamptz there after the CLI generates it, in effect asserting that when a Timestamp is used in this table, it is indeed expected to be an UTC timestamp. You would then appropriately use timezone-aware throughout your rust application, guaranteeing that these are read and inserted as such, and avoiding non-timezone-aware types in your application (which is best in any case).

This is probably why the MySql impl exists

As outlined by weiznich the reason is more likely that MySQL has a different meaning than Postgres for the Timestamp type, where it is actually timezone-aware (in that it converts provided time from local to UTC for storage):
https://dev.mysql.com/doc/refman/8.0/en/datetime.html

So it looks like on the MySQL side, Timestamp should be allowed to map from timezone-aware types, and DateTime should be allowed to map from non-timezone-aware types. (It looks like it may be possible to extract the appropriate information to behave properly in every case from the MysqlTime that gets deserialized. I'm surprised that we assume utc instead of reading time_zone_displacement (doc: https://dev.mysql.com/worklog/task/?id=10828) but that may be correct.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting. I didn't realize the Timestamp behavior was that different, but I guess it does make sense here.

you could update your schema.rs to replace Timestamp by Timestamptz there after the CLI generates it, in effect asserting that when a Timestamp is used in this table, it is indeed expected to be an UTC timestamp.

This is a really clever approach. I was able to make this switch in my own code, and things generally worked as you've described. The only place I ran into issues was with a gt comparison against now... to me that indicates the NOW() typing in Postgres actually isn't correct either. The Postgres documentation seems to indicate that the type returned should be Timestamptz and not naked Timestamp. I was able to work around it by doing a local now_utc and doing a local subtraction, but probably something to add to a corrections list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC now has AsExpression implementations for both Timestamp and Timestamptz, so one shouldn't hit any error if .into_sql()ing it to Timestamptz. I'm surprised that doesn't work implicitly anymore... It's maybe a matter of Nullable/not Nullable...

@@ -17,6 +17,11 @@ use crate::sql_types::{Date, Time, Timestamp, Timestamptz};
// Postgres timestamps start from January 1st 2000.
const PG_EPOCH: PrimitiveDateTime = datetime!(2000-1-1 0:00:00);

fn to_primitive_datetime(dt: OffsetDateTime) -> PrimitiveDateTime {
let dt = dt.to_offset(UtcOffset::UTC);
PrimitiveDateTime::new(dt.date(), dt.time())
Copy link
Member

@Ten0 Ten0 Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems arbitrary.
I think I would have expected the one that just drops timezone information (turns into naive) without converting to UTC, but arguably that's arbitrary as well.

In fact, it seems that there's really no non-arbitrary way to implicitly drop timezone information.

=> I would prefer that we do not do this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is arbitrary, but this is also just mirroring the current behavior of the existing MySql implementation.

If all your timestamps are sanitized internally to UTC it's reasonably common for applications to store without timezone to save space. Since local tz obviously doesn't work, the only reasonable assumption is UTC.

Copy link
Member

@weiznich weiznich Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main difference to mysql here is that postgresql has a distinct timestamp type (Timestamp with time zone), which does exactly what you describe. So it seems to be a bit odd to reimplement that behavior here.

Given that there is no non-arbitrary way to implicitly drop timezone information I personally would prefer to keep that step explicit for the user. I would go as far and argue that accepting the mysql impl was a mistake back then. Maybe we should put a #[deprecated] on that impl instead?

Copy link
Member

@Ten0 Ten0 Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does trait implementation deprecation work now? 🤩
(Or do we just want to use the without-deprecated feature?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if rustc will show the deprecation warning, but we have the without-deprectaed feature flag for such cases.

@jeff-hiner
Copy link
Author

Thanks for opening this PR. This is really appreciated. Can you elaborate on how to solve potential issues around time-zones that might occur here, especially which time zone should be chosen for the offset and whether that choice is always sensible?

Commented inline above, but I'm more or less copying the existing code from the MySql impl that assumes UTC. It's a reasonable default, as long as it's consistent on both insertion and selection.

Existing code:
https://github.com/diesel-rs/diesel/blob/master/diesel/src/mysql/types/date_and_time/time.rs#L112-L113
https://github.com/diesel-rs/diesel/blob/master/diesel/src/mysql/types/date_and_time/time.rs#L120-L121

Ok(prim.assume_utc())
}
}

#[cfg(all(feature = "time", feature = "postgres_backend"))]
impl FromSql<Timestamptz, Pg> for PrimitiveDateTime {
Copy link
Member

@Ten0 Ten0 Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: I see that we already do implement this, both here and in the chrono module, and I don't like it for the same reasons.
But here we probably don't want to remove it until the next major release, because that would otherwise be breaking.

@jeff-hiner
Copy link
Author

I think we should not impl ToSql/FromSql<Timestamp, _> for OffsetDateTime, see below for in-depth explanations.

I would however be ok with an impl ToSql/FromSql<Timestamptz, _> for OffsetDateTime

In that case, you may want to PR the removal of the existing MySql impl here:

https://github.com/diesel-rs/diesel/blob/master/diesel/src/mysql/types/date_and_time/time.rs#L108-L138

@jeff-hiner
Copy link
Author

Maybe a reasonable compromise here would be to gate the "assume UTC on Timestamp" impls behind a feature flag? It's unfortunately not possible for users to impl this on their own because of foreign trait rules, and it does make dealing with implicit-UTC Timestamp fields a lot more ergonomic.

@Ten0
Copy link
Member

Ten0 commented Mar 29, 2024

Maybe a reasonable compromise here would be to gate the "assume UTC on Timestamp" impls behind a feature flag?

That may be a reasonable compromise, depending on whether other solutions turn out to be acceptable without introducing additional maintenance complexity with feature flags.
The way in which features are enabled (for all crates in workspace if one has it) may hurt users who have only one project in their workspace that needs it, but the rest doesn't (they wouldn't see compilation errors for all the rest of their workspace), but that's enough of an edge-case at this point that I'd be ok with it. (At least that wouldn't negatively affect me 😅)

@Ten0
Copy link
Member

Ten0 commented Mar 29, 2024

Similarly I don't think these impls:

#[cfg(all(feature = "time", feature = "postgres_backend"))]
impl FromSql<Timestamptz, Pg> for PrimitiveDateTime {
fn from_sql(bytes: PgValue<'_>) -> deserialize::Result<Self> {
FromSql::<Timestamp, Pg>::from_sql(bytes)
}
}
#[cfg(all(feature = "time", feature = "postgres_backend"))]
impl ToSql<Timestamptz, Pg> for PrimitiveDateTime {
fn to_sql<'b>(&'b self, out: &mut Output<'b, '_, Pg>) -> serialize::Result {
ToSql::<Timestamp, Pg>::to_sql(self, out)
}
}

or these impls:
#[cfg(all(feature = "chrono", feature = "postgres_backend"))]
impl FromSql<Timestamptz, Pg> for NaiveDateTime {
fn from_sql(bytes: PgValue<'_>) -> deserialize::Result<Self> {
FromSql::<Timestamp, Pg>::from_sql(bytes)
}
}
#[cfg(all(feature = "chrono", feature = "postgres_backend"))]
impl ToSql<Timestamptz, Pg> for NaiveDateTime {
fn to_sql<'b>(&'b self, out: &mut Output<'b, '_, Pg>) -> serialize::Result {
ToSql::<Timestamp, Pg>::to_sql(self, out)
}
}

should exist (especially the ToSql one).

Mainly for me the risk is that people start mistakenly inserting non timezone-aware datetimes as timezone-aware data in a Timestamptz field of the database (as if it was utc, when it's in fact not explicitly expressed anywhere - they should have called .assume_utc() at some point and used timezone-aware types from then on).

@jeff-hiner
Copy link
Author

I'm going to withdraw this PR. I think the plan to deprecate the mismatched stuff makes a lot more sense. Let's see if there's a good way to document that it's possible on Pg to replace Timestamp with Timestamptz.

@jeff-hiner jeff-hiner closed this Apr 2, 2024
@jeff-hiner
Copy link
Author

@Ten0 to follow up, modifying schema.rs to s/Timestamp/Timestamptz/ worked great... right up until a CI build tried to set up an integration test db with diesel migration run --locked-schema and barfed. I'm going to try the other option now.

@Ten0
Copy link
Member

Ten0 commented Apr 3, 2024

You should use the schema patch feature where you can set a patch file for the schema in the diesel.toml, I would expect that locked-schema makes its comparison after. Or you can avoid locked-schema and run the diff check yourself in CI, after e.g. sed ing Timestamp to Timestamptz

@weiznich
Copy link
Member

weiznich commented Apr 3, 2024

A patch file is the correct solution. The lock check is performed after modifying the generated schema via the patch. See the relevant guide for details.

@jeff-hiner
Copy link
Author

I considered a patch file, but it doesn't scale when folks on the team add new tables with Timestamp fields-- they need to know how to regenerate the patch file. I'm guessing there's no way to run it like a sed and just substitute all?

@weiznich
Copy link
Member

weiznich commented Apr 4, 2024

There is currently no way to replace all occurrence of one type with another while generating your schema.rs file. I think it might be possible for you to just add that a script check (via grep or whatever) to your CI setup and just reject any schema.rs file that contains a Timestamp field??

As a more general note: From time to time I think about introducing a more flexible overwrite mechanism that can be configured via the diesel.toml file. Something along the lines that would allow you to specify a match pattern (table, column, type, etc) and a replacement. That might be useful for sqlite (with the weak type system on their end) and also for such use cases. That's currently mostly blocked on capacity. If you are interested in helping out with that we can certainly discuss possible designs, although I should probably add that this is likely not a feature that's implemented in a day or two.

@Ten0
Copy link
Member

Ten0 commented Apr 4, 2024

What we do is we only migration run without --locked-schema, then run our patches (we actually run cargo fmt on the generated file, and apply other patches manually - this code dates from before I knew there was a patch feature integrated to diesel, but I like that we run our custom formatting before applying patches, since that makes patches homogeneous with final file), then we run git diff --exit-code to make sure the file hasn't changed.

So you could do:

diesel migration run
sed -i 's/Timestamp /Timestamptz /' src/schema.rs
git diff --exit-code src/schema.rs

That doesn't require any additional Diesel feature, and does make the CI fail if the schema has changed (as a bonus, it prints out the diff between the expected and actual schema).

@jeff-hiner
Copy link
Author

Yeah, I think I'll set up a make target to just autogen/overwrite the patch file from sed output. Then it'll show up as a diff in git, and devs can commit it along with changes to the schema itself.

@weiznich I don't wanna make more work for something that's likely only a problem for me. The existing patch functionality should be enough.

@jeff-hiner jeff-hiner deleted the pg_offsetdatetime branch August 28, 2024 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants