Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix negative fractional of seconds #32

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Mapiarz
Copy link

@Mapiarz Mapiarz commented Jul 1, 2024

This created a non conforming interval such as P0000Y00M-12DT-13H-02M-39.-215749S. This would not pass validation and would lead to a crash.

This created a non conforming interval such as 'P0000Y00M-12DT-13H-02M-39.-215749S'.
This would not pass validation and would lead to a crash.
@sjamaan
Copy link
Contributor

sjamaan commented Jul 5, 2024

I'm not sure if this will be merged anymore by anyone, but the commit as it stands is not really useful - it's not even clear if it fixes anything, because the input looks like it should be invalid, at least at first sight..

A test case would be essential, and maybe you can explain (or show in the test case) how such timestamps can even occur, because I have a hard time figuring out how that would happen in normal operation.

@Mapiarz
Copy link
Author

Mapiarz commented Jul 5, 2024

@sjamaan The contrib docs seem outdated and I cannot get the tests to run, so I'm not going to write one, sorry.

But this is a really simple problem. If you have an interval with negative microseconds, the code on line 101 (fmt = 'to_char(%s, \'PYYYY"Y"MM"M"DD"DT"HH24"H"MI"M"SS.US"S"\')' % sql) will create an incorrect ISO interval like I've given in my initial comment: P0000Y00M-12DT-13H-02M-39.-215749S. This is not a valid ISO interval indeed and it will crash further down the line. My addition is very simple and ensures that there is no - sign in the fractional part of seconds in the converted string.

So in short, I can have an interval with negative microseconds, save it to the database successfully but then during retrieval and deserialization from the DB it will crash due to the cast to string hack that didn't realize that microseconds can be negative.

If this project is not actively maintained then whoever owns it should consider handing it over to something like https://jazzband.co/.

@sjamaan
Copy link
Contributor

sjamaan commented Jul 5, 2024

So in short, I can have an interval with negative microseconds

It looks like your entire interval is negative. But it's not just microseconds that are disallowed by the ISO syntax. AFAICT the BNF simply doesn't allow for negative intervals at all. Adjusting it like in your commit looks like it would produce an incorrect result which means the output doesn't match the input. I can't imagine a situation where you'd want that. In such a case, it would be better to strip the microseconds from your input.

Ideally, in this library, either the deserialization should be changed so that it can handle negative intervals properly (which is probably possible, perhaps by encoding negative/positive in a separate value or by using a different parser), or at worst the value should be rejected at input.

Anyway, like I said I'm not using this library anymore myself, so I'll leave it to someone from Code Yellow to decide what to do. Your suggestion to "donate" it to Jazz Band sounds good. @abzainuddin could you look into this?

@Mapiarz
Copy link
Author

Mapiarz commented Jul 5, 2024

It looks like your entire interval is negative.

Yes, that's correct, but I'm just giving an example man. You can repro it with something like relativedelta(seconds=-1, microseconds=-5). The problem are the microseconds, nothing else.

But it's not just microseconds that are disallowed by the ISO syntax. AFAICT the BNF simply doesn't allow for negative intervals at all.

Yes, but this library explicitly does not really conform to the standard, neither does postgres as a matter of fact. See comments atop utils.py. Postgres does allow for negative intervals. But it doesn't make sense to put the - sign before the fractional part of a second. The assumption here is that the sign for microseconds will be the same as for seconds. Which with relativedelta might not be the case. But we don't care, it's an edge case.

So what I'm proposing with this PR is a localized fix to a localized hack in the first place, that actually makes this work as it was originally intended.

If you want to propose a bigger rewrite then that's another story.

@sjamaan
Copy link
Contributor

sjamaan commented Jul 5, 2024

It looks like your entire interval is negative.

Yes, that's correct, but I'm just giving an example man. You can repro it with something like relativedelta(seconds=-1, microseconds=-5). The problem are the microseconds, nothing else.

Sure. But something like relativedelta(seconds=1,microseconds=-5) would get normalized to P0.999995S (as per format_relativedelta). That's why I was confused at first.

Intuitively, normalization makes more sense than mixing and matching positive and negative numbers, although that is allowed, even in Postgres durations (I guess you couldn't do that for days and months, and there may be semantic issues with doing it for the smaller values too - I haven't really thought that through).

But it's not just microseconds that are disallowed by the ISO syntax. AFAICT the BNF simply doesn't allow for negative intervals at all.

Yes, but this library explicitly does not really conform to the standard, neither does postgres as a matter of fact. See comments atop utils.py.

I had forgotten all about that, sorry for the unnecessary derailment of the conversation!

Postgres does allow for negative intervals. But it doesn't make sense to put the - sign before the fractional part of a second. The assumption here is that the sign for microseconds will be the same as for seconds. Which with relativedelta might not be the case. But we don't care, it's an edge case.

IMO, that's the wrong approach - the idea is that this library should faithfully store what you pass in (modulo any normalizations), and return it as well (most of the test suite is about ensuring that values survive DB roundtrips). It might be an edge case for you in your use case, but in general, hacking around it would be a bit broken. Now, I fully admit using the "ISO representation with extensions" is also a bit of a hack to make parsing easy, but I always approached the design of the library to be fully correct when viewed "from the outside".

So what I'm proposing with this PR is a localized fix to a localized hack in the first place, that actually makes this work as it was originally intended.

It looks like the parser simply interprets the seconds as a floating point number, so your patch might even be correct as it stands. I'm not sure what would happen if the seconds are zero or positive, though. I suspect it would get parsed as a positive number then, which would be incorrect.

Like I said, it really needs some tests.

@Mapiarz
Copy link
Author

Mapiarz commented Jul 5, 2024

But something like relativedelta(seconds=1,microseconds=-5) would get normalized to P0.999995S (as per format_relativedelta). That's why I was confused at first.

Sure, but relativedelta(seconds=-1,microseconds=-5) (notice the minus 1 second) already doesn't make the round trip to DB and back, exactly because of the issue I'm trying to resolve here. The hack in select_format will convert it to a string such as PT-1.-000005S that nobody considers valid. Not the ISO standard, not SQL, not postgres, not even this library that defines the regex for what is accepted and what isn't and according to it such a string is not accepted.

This is clearly caused by somebody overlooking that when implementing the logic in select_format.

Futhermore, you can have intervals generated on the postgres side (calculated columns) and the format_relativedelta method will never run anyway.

It might be an edge case for you in your use case, but in general, hacking around it would be a bit broken.

Actually, it's not even an edge case, it's impossible according to postgres specification: https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT

It only stores months, days and microseconds, which means that postgres will never give me an interval where seconds are positive and microseconds negative. Which makes the fix 100% safe.

I feel like I have now explained the problem in enough detail. As I said, I can't run tests and unfortunately I don't have time to debug why. If you want to expand the PR with some tests coverage, please do so, contributions welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants