Musing On Timezones #529
Replies: 2 comments 3 replies
-
I love the concept of civil time, postgres's time+timezone is a weird thing. Most databases have the abstraction civil time and syncronized time. Syncronized time can appear a timestamp field, seconds from epoch, MS from epoch. When syncronized timestamp is converted to a date or rendered, an additional piece of information is necessary, the timezone. By default, this is the timezone the user is sitting. |
Beta Was this translation helpful? Give feedback.
-
Malloy currently does all of its time comprehension in UTC. If we introduce a "query time zone" it is pretty straightforward that all the truncation and extraction results should reflect this time zone. But what does a truncated time look like. If I take Because all timestamps in bigquery are in UTC, the value of a timestamp which has had some sort of comprehension gesture is NOT the value which the user would want to see. Looker has it's own time type. This type is a a string, and is assumed to be in the query operation time zone. In Looker, most time filtering is done in the looker-query-time type. Time values are converted into this time type before doing any sort of comprehension of a time. This means that all truncated times are zero-filled past the truncation. I don't know if this becomes the malloy time type, or if there is just a filter on the output of malloy which applies the timezone translation. |
Beta Was this translation helpful? Give feedback.
-
Thinking About Timezone
Time types have grown over the years and are kind of a mess, and are different in every SQL implementation. Malloy should have a clean and sensible syntax for computing with time values but which does not force the user to learn the warts of the time universe of any particular SQL dialect.
Civil Time
There are basically two kinds of time. There is an unambiguous time value, which refers to time in a way which identifies a particular moment in time, "1970-01-01 00:00:00 UTC". And there is ambiguous time, like "2001-01-01". In an unambiguous time, all readers of that data will know exactly which period of time that refers to. In an ambiguous time, there is another piece of data needed to understand the time, which is the time zone used when that data was recorded.
The Standard SQL documentation refers to these two entities as time and "civil time". I am still not sure what wording works when talking about them.
Civil Timezone Problem
This distinction between civil time and time not not only for things which obviously have no time zone, like a calendar date or a time stamp with no time zone, it also pops up when time zones are specified.
The BigQuery way of dealing with time is that all time calculations happen in "time" and a time literal might be written in a "civil" time but it is converted to UTC, which is how all time is stored and used in BQ, which is shown by this query:
In Postgres, a timestamp can have a time zone embedded in it, allowing you to do math on the components of a "civil time"
Similarly, BigQuery and Postgres will disagree on the question, "is it 8 o'clock right now", because the timestamp returned by a Postgres database for
now()
is a time stamp with a time zone embedded in it, where the BigQueryCURRENT_TIMESTAMP()
is a UTC time, as are all times in BigQuery.Working time zones into the @ syntax
The current syntax for time literals in Malloy has a slot for time zones, but it is unused, and wrong ...
@2001-02-03 04:05:06-XXXXX
--- where XXX was a slot for some time zone data which I hadn't really thought through.Malloy should just parse ISO 8001 for times, but ISO 8001 has no place for time zone names. 8001 is just a time with plus or minus followed by an offset.
BigQuery lets you place the time zone name after a timestamp using a space, because time literals are in a string.
Postgres syntax a time zone name is never part of a time, it is always a string.
We could do a number of things ...
@2001-02-03 04:05:06 America/Los_Angeles
@2001-02-03 04:05:06 PDT
@2001-02-03 04:05:06+America/Los_Angeles
@2001-02-03 04:05:06-PDT
@2001-02-03 04:05:06/America/Los_Angeles
@2001-02-03 04:05:06/PDT
@2001-02-03 04:05:06@America/Los_Angeles
@2001-02-03 04:05:06@PDT
Using space seems to most Malloy-like choice, but I am not sure how that parses. If we consume any sequence of letters after a timestamp, is prevents us from ever correctly parsing something like
There is no way to not grab the "and" as the "timezone" unless the lexical analyzer contained a complete list of timezone names, because the parsing of timestamp literals happens in the lexical phase. That is kind of a non starter.
After space, I think I prefer
/
Definitions
I would like to introduce some terminology so that precise sentences can be written about time-related data.
Time data in BigQuery and PostgresSQL
Not sure what time abstraction Malloy should provide. Let's start with that the existing Malloy dialects do.
While both databases have ways to specify civil and offset times, once the data is in a form where it can be stored and computed with, it has been converted to a native time of the database.
Both databases can produce a proper civil time when asked, or perform civil truncation or extraction on a native time.
In BigQuery native time
In Postgres native time
Both databases can answer the question, for any native time, "what would an observer in time zone XXX see on a local time clock at this moment in native time"
The trickier question is hinging on the fact that databases don't store timezones in their native time, they store offsets, is "What would that same observer see, 24 hours later". If I add 24 hours to a time, and I cross a DST boundary, then it matters if time zone name of the observer was DST observing time zone, except times are stored as offsets and so, in MANY cases the timezone of the observer is lost ...
EXCEPT in PostgresSQL, there are dynamic times which are not bound to a timezone as stored in the database. The timezone for a time is an indirection which is obtained from the column, the table, or the session.
Does that mean if I add 24 hours to a timestamp column, and my local timezone is set to a string which respects DST, i might see a 23 or 25 hour difference, but if my local timezone name is something which does not respect DST, I will always see a 24 hour difference? Is this true also for time zones set on sessions ... when does the mapping from name to offset happen?
I think it does. If so the question is now, does this feature have language implications for expressions written with time.
Lexically Bound Time Zones
We have imagined in a time zone aware world, that it might be useful to be able to have lexically bounded time zone domains in malloy code. For example, maybe a source has a time zone, and so any constants in that source are assumed to be in the source's time zone and any extraction in a source will be relative to the time zone of the source
There also might a lexical time zone for a model, a query, or even a field,
Do dates have time zones
Each of these should compare, the departureTime to a 24 hour period, but how is the 24 hour period computed?
Beta Was this translation helpful? Give feedback.
All reactions