Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(c/driver/postgresql): Duration support #907

Merged
merged 16 commits into from
Sep 7, 2023

Conversation

WillAyd
Copy link
Contributor

@WillAyd WillAyd commented Jul 15, 2023

No description provided.

@WillAyd WillAyd requested a review from lidavidm as a code owner July 15, 2023 05:23
@@ -274,8 +275,8 @@ class StatementTest {
template <typename CType>
void TestSqlIngestNumericType(ArrowType type);

template <enum ArrowTimeUnit TU>
void TestSqlIngestTemporalType(const char* timezone);
template <enum ArrowType T>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally I tried to provide ArrowType and ArrowTimeUnit as template parameters, but that kept yielding errors like

/home/willayd/clones/arrow-adbc/c/validation/adbc_validation.cc:1173:114: error: macro "ASSERT_NO_FATAL_FAILURE" passed 2 arguments, but takes just 1
 1173 |   ASSERT_NO_FATAL_FAILURE(TestSqlIngestTemporalType<NANOARROW_TYPE_DURATION, NANOARROW_TIME_UNIT_SECOND>(nullptr));
      |                                                                                                                  ^
In file included from /home/willayd/clones/arrow-adbc/c/validation/adbc_validation.h:26,
                 from /home/willayd/clones/arrow-adbc/c/validation/adbc_validation.cc:18:
/usr/local/include/gtest/gtest.h:2216: note: macro "ASSERT_NO_FATAL_FAILURE" defined here
 2216 | #define ASSERT_NO_FATAL_FAILURE(statement) \

I'm not sure if that is a bug with gtest or my lack of C++ template knowledge, but figured changing the template around like this wasn't a huge deal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh for future reference, I think you just need MACRO((Template<A, B>), ...)

int64_t value = ReadUnsafe<int64_t>(data);
NANOARROW_RETURN_NOT_OK(ArrowBufferAppend(data_, &value, sizeof(int64_t)));

// discard unnecessary bits
Copy link
Contributor Author

@WillAyd WillAyd Jul 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought probably want to raise if these are non-zero

@WillAyd
Copy link
Contributor Author

WillAyd commented Jul 15, 2023

Not entirely sure if this makes sense to map; the Interval Arrow type seems to align more closely to postgres' interval type (though still not exactly the same). The only reason I started with duration was because it already has a schema setter in nanoarrow (via ArrowSchemaSetTypeDateTime) and the pandas test suite currently maps its timedelta to duration[ns] in the sql tests. @jorisvandenbossche if you have any thoughts would love to hear

@lidavidm
Copy link
Member

From a quick scan, yeah, it seems PostgreSQL intervals map most closely to Arrow's MonthDayNano interval

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jul 18, 2023

Indeed, and FWIW, pandas's to_sql maps timedelta data to integers and warns about doing that: https://github.com/pandas-dev/pandas/blob/c92b3701efff16293bd110a3f06f458040b43d97/pandas/io/sql.py#L1312-L1319

I am wondering if in theory we could store duration as an interval with only microseconds set (no months/days)

@jorisvandenbossche
Copy link
Member

I am wondering if in theory we could store duration as an interval with only microseconds set (no months/days)

Ah, that's actually what you are doing here ;)

That doesn't give a faithful roundtrip, but might be the most useful type on the postgres side?

@WillAyd
Copy link
Contributor Author

WillAyd commented Jul 18, 2023

The MonthDayNano interval like in #908 should be able to round trip. Not sure of the implications for casting from timedelta to that in Python but would be a more faithful of what Postgres expects

@WillAyd
Copy link
Contributor Author

WillAyd commented Jul 28, 2023

closing in favor of #908

@WillAyd
Copy link
Contributor Author

WillAyd commented Sep 6, 2023

After chatting some more with @jorisvandenbossche I think there is value in this, even though it can't be roundtripped

@lidavidm
Copy link
Member

lidavidm commented Sep 6, 2023

Alright, I'll give it a review for 0.7.0 (which I do want to push out in the next two weeks, hopefully)

@lidavidm lidavidm added this to the ADBC Libraries 0.7.0 milestone Sep 6, 2023
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one general comment, thanks!

break;
}
default:
ASSERT_TRUE(false) << "ValidateIngestedTemporalData not implemented for type "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think FAIL() is meant for this

@lidavidm lidavidm merged commit 4d88c03 into apache:main Sep 7, 2023
65 checks passed
@WillAyd WillAyd deleted the duration-impl branch September 7, 2023 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants