Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(c/driver_manager): protect against uninitialized AdbcError #570

Closed
wants to merge 3 commits into from

Conversation

cdaudt
Copy link

@cdaudt cdaudt commented Mar 31, 2023

ADBC C API functions should initialize AdbcError struct passed into them instead of assuming that the caller did so. Given that these are "output-only" type parameters there is usually no expectation that they need to be zeroed coming into API calls.
This PR updates driver-manager. If all looks good and it gets merged, I'll follow up with equivalent PRs for the drivers.

@github-actions
Copy link

⚠️ Please follow the Conventional Commits format in CONTRIBUTING.md for PR titles.

@lidavidm
Copy link
Member

Thanks!

I'll note, some drivers deliberately append to an existing error, which we can't do with this. (Also, this means that you need to check and release the error before making further calls: probably good practice, but will also trip people up)

@lidavidm
Copy link
Member

Finally, I think all current implementations assume caller will zero things, for all structures - not just the error.

@cdaudt
Copy link
Author

cdaudt commented Mar 31, 2023

Hi - thanks for the feedback. The api defines this field as '[out]' and thus the caller should not expect that anything passed into the API will survive. e.g. here: AdbcConnectionRelease
But I do agree that while the driver manager is okay in that calls such as AdbcConnectionRelease are only called from the API clients, the same is not true in the actual drivers, where the same function is used both to provide the public API as well as the callback to the driver manager today. Will need to address that in equivalent changes to the drivers.

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.

I'm going to be out for the next couple weeks so I won't be able to review, but it's probably reasonable. (I'd still rather that people explicitly zero things like AdbcStatement before use, though.)

I'll also note individual drivers are both linked to directly and used via the driver manager so there's not a clear internal/external API boundary there. (Oh, you already noted that.)

@@ -597,6 +628,7 @@ AdbcStatusCode AdbcStatementSetSqlQuery(struct AdbcStatement* statement,
AdbcStatusCode AdbcStatementSetSubstraitPlan(struct AdbcStatement* statement,
const uint8_t* plan, size_t length,
struct AdbcError* error) {
AdbcErrorInit(error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reset or something might be clearer? (Or is there a need for a function when it's only one line of code?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is small - I can replace with a direct call to memset.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After seeing @kou comments below I'll rename the fn to AdbcErrorReset and check for null as well.

@lidavidm lidavidm changed the title fix (c/driver-manager): Protect against uninitialized AdbcError fix(c/driver_manager)!: protect against uninitialized AdbcError Mar 31, 2023
@lidavidm lidavidm changed the title fix(c/driver_manager)!: protect against uninitialized AdbcError fix(c/driver_manager): protect against uninitialized AdbcError Mar 31, 2023
@lidavidm
Copy link
Member

Also, CC @zeroshade if you have any opinions?

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our documentations says that the caller must initialize AdbcError:

arrow-adbc/adbc.h

Lines 169 to 171 in 639ca71

/// optional out parameter, which can be inspected. If provided, it is
/// the responsibility of the caller to zero-initialize the AdbcError
/// value.

Should we update this too?

The suggested behavior may be convenience but may cause a memory leak unexpectedly:

struct AdbcError error = {};
AdbcDatabaseNew(..., &error); // error
// error->release(error); // forget to call release()
AdbcDatabaseNew(..., &error); // reuse same error.
                              // current behavior: message is appended
                              // suggested behavior: memory leak 

@@ -57,6 +57,11 @@ void GetWinError(std::string* buffer) {

#endif // defined(_WIN32)

// Struct initializers
static void AdbcErrorInit(struct AdbcError* error) {
std::memset(error, 0, sizeof(*error));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a NULL check because error is an optional out parameter.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you're right. So maybe a reason to keep the AdbcErrorInit function, and add a null check. I missed that comment in adbc.h - if that is the defined behaviour then the docs should probably replicate it to the individual API calls (DatabaseNew/ConnectionNew/StatementNew already call out that the Database/Connection/Statement parameters must be zero initialized).
As a defensive mechanism on the API whoever I do think it should be explicit in zeroing it, as in the current implementation if it is not zeroed then you get erratic behaviour (which is how I came across this - I was getting a segmentation violation because "message" had junk in it going in because I didn't realize I needed to initialize to zero, and that caused SetError to incorrectly call error->release() which also contained junk and caused the segmentation violation).
the need for the "error->release()" call by the API caller is something I had planned to bring up separately - seems like a bad idea to pass allocation back in an optional error field that is then responsibility of the caller to free (and I don't even see that mentioned in the documentation btw).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The struggle is that we don't want to assume libc malloc/free (possibly we could/should have). You can see this pattern in the Arrow C Data Interface as well where all structs have an explicit release callback.

@cdaudt cdaudt force-pushed the fix/adbcerror_uninitialized branch from b84c307 to b47d382 Compare March 31, 2023 23:05
@lidavidm
Copy link
Member

If it's documented in the header (thanks Kou for checking!) then we should stick to the existing behavior.

@cdaudt
Copy link
Author

cdaudt commented Mar 31, 2023

If it's documented in the header (thanks Kou for checking!) then we should stick to the existing behavior.

this patch doesn't require a change to the existing behaviour. If a caller does pass in a zero AdbcError then this patch doesn't harm that (just an added memset overhead which is a small price to pay for the protection to the API imo).
Do you mean you prefer I drop the patch?
The separate issue of driver-manager -> driver calls bypassing the zero'ed out AdbcError expectation on purpose seems like a bad way to get driver-manager & driver to collaborate on filling up an error structure. I can send a separate patch to try to address that.

@lidavidm
Copy link
Member

Ah, you're right. Then checking for null should be enough.

@cdaudt
Copy link
Author

cdaudt commented Mar 31, 2023

Ah, you're right. Then checking for null should be enough.

okay - I've resubmitted the PR with the null check + rename AdbcErrorInit -> AdbcErrorReset (and added a testcase for nullptr being passed into AdbcError). I did mess up the amend and ended up with an extra merge commit (haven't done a PR in a little while). Let me know if it looks okay or I should resubmit.

@lidavidm lidavidm self-requested a review April 19, 2023 00:15
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

Kou or @zeroshade, any other opinions here?

@lidavidm
Copy link
Member

Ah, though there's still leaks in the tests.

@lidavidm
Copy link
Member

Probably because of the exact issue that Kou pointed out; it makes it easier to leak memory if you don't check the error (though of course, you should be checking the error).

@lidavidm
Copy link
Member

So the real question is which footgun do we prefer

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the real question is which footgun do we prefer

I prefer the current specification (the caller is responsible to initialize AdbcError) because it's not strange in general C API and the suggested API may cause memory leaks implicitly like the current tests.

(It's not a strong opinion.)

// simulate API calls using uninitialized AdbcError structs
std::memset(&invalid_err, 0xff, sizeof(invalid_err));
ASSERT_THAT(AdbcDatabaseInit(&database, &invalid_err),
IsStatus(ADBC_STATUS_INVALID_STATE, &invalid_err));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to call invalid_err.release(&invalid_err) after this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are a few missing releases (in my patch and other tests). Updating the patchset to add these.

ASSERT_THAT(AdbcDatabaseNew(&database, &error), IsOkStatus(&invalid_err));
ASSERT_THAT(
AdbcDatabaseSetOption(&database, "driver", "adbc_driver_sqlite", &invalid_err),
IsOkStatus(&invalid_err));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

std::memset(&invalid_err, 0xff, sizeof(invalid_err));
ASSERT_THAT(
AdbcDatabaseSetOption(&database, "notavalidkey", "notavalidvalue", &invalid_err),
IsStatus(ADBC_STATUS_NOT_IMPLEMENTED, &invalid_err));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

cdaudt and others added 3 commits April 18, 2023 22:57
ADBC C API functions should initialize AdbcError struct passed into
them instead of assuming that the caller did so. Given that these
are "output-only" type parameters there is usually no expectation
that they need to be zeroed coming into API calls.
Adding missing releases of error messages in driver_manager_test
Adding missing releases of error messages in adbc_validation
@cdaudt cdaudt force-pushed the fix/adbcerror_uninitialized branch from 638cb92 to 3bb9597 Compare April 19, 2023 06:02
@cdaudt
Copy link
Author

cdaudt commented May 25, 2023

hi @kou @lidavidm just a ping on this - are you okay to proceed with this PR? If so I'll rebase to latest and resubmit.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we don't need this but I defer to @lidavidm.

std::memset(&invalid_err, 0xff, sizeof(invalid_err));
ASSERT_THAT(AdbcDatabaseInit(&database, &invalid_err),
IsStatus(ADBC_STATUS_INVALID_STATE, &invalid_err));
if (invalid_err.release) invalid_err.release(&invalid_err);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this if? I think that this must have ADBC_STATUS_INVALID_STATE error.

Suggested change
if (invalid_err.release) invalid_err.release(&invalid_err);
invalid_err.release(&invalid_err);

@@ -83,9 +83,58 @@ TEST_F(DriverManager, DatabaseCustomInitFunc) {
AdbcDatabaseSetOption(&database, "entrypoint", "ThisSymbolDoesNotExist", &error),
IsOkStatus(&error));
ASSERT_EQ(ADBC_STATUS_INTERNAL, AdbcDatabaseInit(&database, &error));
if (error.release) error.release(&error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this if?

Suggested change
if (error.release) error.release(&error);
error.release(&error);

ASSERT_THAT(AdbcDatabaseRelease(&database, &error), IsOkStatus(&error));
}

TEST_F(DriverManager, UninitializedError) {
struct AdbcDatabase database;
struct AdbcError invalid_err;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this instead of existing error?

if (invalid_err.release) invalid_err.release(&invalid_err);

std::memset(&invalid_err, 0xff, sizeof(invalid_err));
ASSERT_THAT(AdbcDatabaseNew(&database, &error), IsOkStatus(&invalid_err));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct?

Suggested change
ASSERT_THAT(AdbcDatabaseNew(&database, &error), IsOkStatus(&invalid_err));
ASSERT_THAT(AdbcDatabaseNew(&database, &invalid_err), IsOkStatus(&invalid_err));

ASSERT_THAT(
AdbcDatabaseSetOption(&database, "notavalidkey", "notavalidvalue", &invalid_err),
IsStatus(ADBC_STATUS_NOT_IMPLEMENTED, &invalid_err));
if (invalid_err.release) invalid_err.release(&invalid_err);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (invalid_err.release) invalid_err.release(&invalid_err);
invalid_err.release(&invalid_err);

EXPECT_EQ(AdbcDatabaseSetOption(&database, "notavalidkey", "notavalidvalue", nullptr),
ADBC_STATUS_NOT_IMPLEMENTED);
ASSERT_THAT(AdbcDatabaseRelease(&database, &error), IsOkStatus(&error));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@@ -100,9 +149,12 @@ TEST_F(DriverManager, ConnectionOptions) {
ASSERT_THAT(AdbcConnectionNew(&connection, &error), IsOkStatus(&error));
ASSERT_THAT(AdbcConnectionSetOption(&connection, "foo", "bar", &error),
IsOkStatus(&error));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

ASSERT_EQ(ADBC_STATUS_NOT_IMPLEMENTED,
AdbcConnectionInit(&connection, &database, &error));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

ASSERT_THAT(error.message, ::testing::HasSubstr("Unknown connection option foo=bar"));
if (error.release) error.release(&error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (error.release) error.release(&error);
error.release(&error);

and others...

@lidavidm
Copy link
Member

I think for self-consistency we should stick with requiring that the user zero out all structs before use. This is maybe less convenient but makes intent clearer.

@lidavidm
Copy link
Member

lidavidm commented Aug 1, 2023

In light of bugs like #729, effectively the implementation requires that all structs must be zero-initialized (Golang-based drivers require that all inputs must be initialized). #946/#954 will only exacerbate this, so I think we will have to keep things as-is. Thank you for raising this behavior with us!

@lidavidm lidavidm closed this Aug 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants