fix(c/driver_manager): protect against uninitialized AdbcError #570
Thanks! I'll note, some drivers deliberately append to an existing error, which we can't do with this. (Also, this means that you need to check and release the error before making further calls: probably good practice, but will also trip people up.)
Finally, I think all current implementations assume the caller will zero things, for all structures - not just the error.
Hi - thanks for the feedback. The API defines this field as '[out]' and thus the caller should not expect that anything passed into the API will survive, e.g. here: AdbcConnectionRelease
Fair enough.
I'm going to be out for the next couple weeks so I won't be able to review, but it's probably reasonable. (I'd still rather that people explicitly zero things like AdbcStatement before use, though.)
I'll also note individual drivers are both linked to directly and used via the driver manager so there's not a clear internal/external API boundary there. (Oh, you already noted that.)
@@ -597,6 +628,7 @@ AdbcStatusCode AdbcStatementSetSqlQuery(struct AdbcStatement* statement,
AdbcStatusCode AdbcStatementSetSubstraitPlan(struct AdbcStatement* statement,
                                             const uint8_t* plan, size_t length,
                                             struct AdbcError* error) {
  AdbcErrorInit(error);
Reset or something might be clearer? (Or is there a need for a function when it's only one line of code?)
It is small - I can replace with a direct call to memset.
After seeing @kou's comments below, I'll rename the function to AdbcErrorReset and check for null as well.
Also, CC @zeroshade if you have any opinions?
Our documentation says that the caller must initialize AdbcError:
Lines 169 to 171 in 639ca71
/// optional out parameter, which can be inspected. If provided, it is
/// the responsibility of the caller to zero-initialize the AdbcError
/// value.
Should we update this too?
The suggested behavior may be convenient but may unexpectedly cause a memory leak:
struct AdbcError error = {};
AdbcDatabaseNew(..., &error); // error
// error.release(&error); // forget to call release()
AdbcDatabaseNew(..., &error); // reuse same error.
// current behavior: message is appended
// suggested behavior: memory leak
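To make the trade-off concrete, here is a minimal sketch of the scenario above. Everything in it is hypothetical: `DemoError`, `FailingCall` and the `g_live` registry are stand-ins invented for illustration, not the real AdbcError or any driver code. It only assumes the two behaviors described in this thread (append to an existing message vs. zero the struct on entry).

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <set>
#include <string>

// Hypothetical stand-in for AdbcError; not the real definition from adbc.h.
struct DemoError {
  char* message;
  void (*release)(DemoError* error);
};

// Registry of message buffers that were allocated and not yet freed.
static std::set<char*> g_live;

static void DemoRelease(DemoError* error) {
  g_live.erase(error->message);
  std::free(error->message);
  error->message = nullptr;
  error->release = nullptr;
}

// Simulates a failing API call. With reset_on_entry the struct is zeroed
// first (the behavior suggested in this PR); otherwise the new text is
// appended to any existing message (what some drivers do today).
static void FailingCall(DemoError* error, bool reset_on_entry) {
  if (reset_on_entry) std::memset(error, 0, sizeof(*error));
  std::string text = error->message ? std::string(error->message) + "; " : "";
  text += "failed";
  if (error->message) error->release(error);  // free the buffer being replaced
  error->message = static_cast<char*>(std::malloc(text.size() + 1));
  std::memcpy(error->message, text.c_str(), text.size() + 1);
  error->release = DemoRelease;
  g_live.insert(error->message);
}

// Calls FailingCall twice without releasing in between, releases once at the
// end, and returns how many buffers the caller can no longer reach.
static int LeakedAfterTwoCalls(bool reset_on_entry) {
  DemoError error = {};
  FailingCall(&error, reset_on_entry);
  FailingCall(&error, reset_on_entry);  // caller forgot to release first
  error.release(&error);
  int leaked = static_cast<int>(g_live.size());
  for (char* p : g_live) std::free(p);  // tidy up so the demo itself is clean
  g_live.clear();
  return leaked;
}
```

In this sketch `LeakedAfterTwoCalls(true)` reports one unreachable buffer, while `LeakedAfterTwoCalls(false)` reports none, which matches the "suggested behavior: memory leak" note above.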
@@ -57,6 +57,11 @@ void GetWinError(std::string* buffer) {

#endif  // defined(_WIN32)

// Struct initializers
static void AdbcErrorInit(struct AdbcError* error) {
  std::memset(error, 0, sizeof(*error));
We should add a NULL check because error is an optional out parameter.
Yes, you're right. So maybe that's a reason to keep the AdbcErrorInit function and add a null check. I missed that comment in adbc.h - if that is the defined behaviour then the docs should probably replicate it on the individual API calls (DatabaseNew/ConnectionNew/StatementNew already call out that the Database/Connection/Statement parameters must be zero-initialized).
As a defensive mechanism, however, I do think the API should explicitly zero it: in the current implementation, if it is not zeroed then you get erratic behaviour. That is how I came across this - I was getting a segmentation violation because "message" had junk in it going in (I didn't realize I needed to initialize it to zero), and that caused SetError to incorrectly call error->release(), which also contained junk.
The need for the "error->release()" call by the API caller is something I had planned to bring up separately - it seems like a bad idea to pass an allocation back in an optional error field that the caller is then responsible for freeing (and I don't even see that mentioned in the documentation, btw).
The struggle is that we don't want to assume libc malloc/free (possibly we could/should have). You can see this pattern in the Arrow C Data Interface as well where all structs have an explicit release callback.
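As a rough illustration of why a release callback avoids assuming one allocator, consider the sketch below. The struct and both "producers" are hypothetical (they are not ADBC or Arrow code); the only assumption is the pattern described above, where whoever allocates the message also supplies the function that frees it.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a struct with an explicit release callback.
struct DemoError {
  char* message;
  void (*release)(DemoError* error);
};

static int g_malloc_frees = 0;
static int g_new_frees = 0;

// Frees a message that was allocated with malloc.
static void ReleaseMalloc(DemoError* error) {
  std::free(error->message);
  ++g_malloc_frees;
  error->message = nullptr;
  error->release = nullptr;
}

// Frees a message that was allocated with new[].
static void ReleaseNew(DemoError* error) {
  delete[] error->message;
  ++g_new_frees;
  error->message = nullptr;
  error->release = nullptr;
}

// Producer A: allocates with malloc and registers the matching release.
static void FailWithMalloc(DemoError* error, const char* text) {
  error->message = static_cast<char*>(std::malloc(std::strlen(text) + 1));
  std::strcpy(error->message, text);
  error->release = ReleaseMalloc;
}

// Producer B: allocates with new[] and registers the matching release.
static void FailWithNew(DemoError* error, const char* text) {
  error->message = new char[std::strlen(text) + 1];
  std::strcpy(error->message, text);
  error->release = ReleaseNew;
}

// Consumer: never needs to know which allocator the producer used.
static void ReleaseIfSet(DemoError* error) {
  if (error->release) error->release(error);
}
```

The consumer calls `ReleaseIfSet` in both cases; a second call on an already released struct is a no-op because the callback clears `release`.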
Force-pushed: b84c307 to b47d382
If it's documented in the header (thanks Kou for checking!) then we should stick to the existing behavior.
This patch doesn't require a change to the existing behaviour. If a caller does pass in a zeroed AdbcError then this patch doesn't harm that (just an added memset overhead, which is a small price to pay for the protection to the API imo).
Ah, you're right. Then checking for null should be enough.
Okay - I've resubmitted the PR with the null check + rename AdbcErrorInit -> AdbcErrorReset (and added a test case for nullptr being passed in as the AdbcError). I did mess up the amend and ended up with an extra merge commit (haven't done a PR in a little while). Let me know if it looks okay or if I should resubmit.
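For readers following along, a reset helper with the null check described here might look roughly like the sketch below. This is an assumption-laden illustration, not the code from the patch: `DemoError`, `DemoErrorReset` and `DemoCall` are hypothetical names, and the struct only loosely mirrors an error struct with a message and a release callback.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for AdbcError; field layout is illustrative only.
struct DemoError {
  char* message;
  int vendor_code;
  void (*release)(DemoError* error);
};

// Zeroes the error struct on entry to an API call. The error is an optional
// out parameter, so a null pointer must be tolerated.
static void DemoErrorReset(DemoError* error) {
  if (error == nullptr) return;
  std::memset(error, 0, sizeof(*error));
}

// Hypothetical API entry point that always succeeds (returns 0).
static int DemoCall(DemoError* error) {
  DemoErrorReset(error);
  return 0;
}
```

With this shape, a struct poisoned with `std::memset(&e, 0xff, sizeof(e))` comes back with `message` and `release` both null, and passing `nullptr` is harmless. Note the cost discussed elsewhere in this thread: any message the caller had not yet released is silently dropped.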
LGTM. Thanks!
Kou or @zeroshade, any other opinions here?
Ah, though there are still leaks in the tests.
Probably because of the exact issue that Kou pointed out; it makes it easier to leak memory if you don't check the error (though of course, you should be checking the error).
So the real question is which footgun do we prefer
"So the real question is which footgun do we prefer"
I prefer the current specification (the caller is responsible for initializing AdbcError) because it's not unusual for a C API in general, and the suggested API may implicitly cause memory leaks, as in the current tests.
(It's not a strong opinion.)
  // simulate API calls using uninitialized AdbcError structs
  std::memset(&invalid_err, 0xff, sizeof(invalid_err));
  ASSERT_THAT(AdbcDatabaseInit(&database, &invalid_err),
              IsStatus(ADBC_STATUS_INVALID_STATE, &invalid_err));
We need to call invalid_err.release(&invalid_err) after this.
Yes, there are a few missing releases (in my patch and other tests). Updating the patchset to add these.
  ASSERT_THAT(AdbcDatabaseNew(&database, &error), IsOkStatus(&invalid_err));
  ASSERT_THAT(
      AdbcDatabaseSetOption(&database, "driver", "adbc_driver_sqlite", &invalid_err),
      IsOkStatus(&invalid_err));
Ditto.
  std::memset(&invalid_err, 0xff, sizeof(invalid_err));
  ASSERT_THAT(
      AdbcDatabaseSetOption(&database, "notavalidkey", "notavalidvalue", &invalid_err),
      IsStatus(ADBC_STATUS_NOT_IMPLEMENTED, &invalid_err));
Ditto.
ADBC C API functions should initialize AdbcError struct passed into them instead of assuming that the caller did so. Given that these are "output-only" type parameters there is usually no expectation that they need to be zeroed coming into API calls.
Adding missing releases of error messages in driver_manager_test
Adding missing releases of error messages in adbc_validation
Force-pushed: 638cb92 to 3bb9597
I think that we don't need this but I defer to @lidavidm.
  std::memset(&invalid_err, 0xff, sizeof(invalid_err));
  ASSERT_THAT(AdbcDatabaseInit(&database, &invalid_err),
              IsStatus(ADBC_STATUS_INVALID_STATE, &invalid_err));
  if (invalid_err.release) invalid_err.release(&invalid_err);
Do we need this if? I think that this must have an ADBC_STATUS_INVALID_STATE error.
Suggested change:
-  if (invalid_err.release) invalid_err.release(&invalid_err);
+  invalid_err.release(&invalid_err);
@@ -83,9 +83,58 @@ TEST_F(DriverManager, DatabaseCustomInitFunc) {
      AdbcDatabaseSetOption(&database, "entrypoint", "ThisSymbolDoesNotExist", &error),
      IsOkStatus(&error));
  ASSERT_EQ(ADBC_STATUS_INTERNAL, AdbcDatabaseInit(&database, &error));
  if (error.release) error.release(&error);
Why do we need this if?
Suggested change:
-  if (error.release) error.release(&error);
+  error.release(&error);
  ASSERT_THAT(AdbcDatabaseRelease(&database, &error), IsOkStatus(&error));
}

TEST_F(DriverManager, UninitializedError) {
  struct AdbcDatabase database;
  struct AdbcError invalid_err;
Why do we need this instead of the existing error?
  if (invalid_err.release) invalid_err.release(&invalid_err);

  std::memset(&invalid_err, 0xff, sizeof(invalid_err));
  ASSERT_THAT(AdbcDatabaseNew(&database, &error), IsOkStatus(&invalid_err));
Is this correct?
Suggested change:
-  ASSERT_THAT(AdbcDatabaseNew(&database, &error), IsOkStatus(&invalid_err));
+  ASSERT_THAT(AdbcDatabaseNew(&database, &invalid_err), IsOkStatus(&invalid_err));
  ASSERT_THAT(
      AdbcDatabaseSetOption(&database, "notavalidkey", "notavalidvalue", &invalid_err),
      IsStatus(ADBC_STATUS_NOT_IMPLEMENTED, &invalid_err));
  if (invalid_err.release) invalid_err.release(&invalid_err);
Suggested change:
-  if (invalid_err.release) invalid_err.release(&invalid_err);
+  invalid_err.release(&invalid_err);
  EXPECT_EQ(AdbcDatabaseSetOption(&database, "notavalidkey", "notavalidvalue", nullptr),
            ADBC_STATUS_NOT_IMPLEMENTED);
  ASSERT_THAT(AdbcDatabaseRelease(&database, &error), IsOkStatus(&error));
@@ -100,9 +149,12 @@ TEST_F(DriverManager, ConnectionOptions) {
  ASSERT_THAT(AdbcConnectionNew(&connection, &error), IsOkStatus(&error));
  ASSERT_THAT(AdbcConnectionSetOption(&connection, "foo", "bar", &error),
              IsOkStatus(&error));
  ASSERT_EQ(ADBC_STATUS_NOT_IMPLEMENTED,
            AdbcConnectionInit(&connection, &database, &error));
  ASSERT_THAT(error.message, ::testing::HasSubstr("Unknown connection option foo=bar"));
  if (error.release) error.release(&error);
Suggested change:
-  if (error.release) error.release(&error);
+  error.release(&error);
and others...
I think for self-consistency we should stick with requiring that the user zero out all structs before use. This is maybe less convenient but makes intent clearer.
In light of bugs like #729, effectively the implementation requires that all structs must be zero-initialized (Golang-based drivers require that all inputs must be initialized). #946/#954 will only exacerbate this, so I think we will have to keep things as-is. Thank you for raising this behavior with us!
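Given that conclusion, the caller-side discipline (zero-initialize before first use, then check and release before reusing the struct) could be wrapped in a small helper. The sketch below is only an illustration under that assumption; `DemoError`, `DemoFailingCall` and `TakeMessage` are hypothetical names and not part of ADBC.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for AdbcError; not the real definition from adbc.h.
struct DemoError {
  char* message;
  void (*release)(DemoError* error);
};

// Number of message buffers currently allocated and not yet released.
static int g_outstanding = 0;

static void DemoRelease(DemoError* error) {
  std::free(error->message);
  --g_outstanding;
  error->message = nullptr;
  error->release = nullptr;
}

// Hypothetical API call that always fails (returns 1) and, if an error
// struct was provided, fills it in. It trusts the caller to pass a zeroed
// struct, as the existing documentation requires.
static int DemoFailingCall(DemoError* error) {
  if (error != nullptr) {
    const char text[] = "boom";
    error->message = static_cast<char*>(std::malloc(sizeof(text)));
    std::memcpy(error->message, text, sizeof(text));
    error->release = DemoRelease;
    ++g_outstanding;
  }
  return 1;
}

// Copies the message out, releases the error, and leaves the struct zeroed
// so it is safe to pass to the next call.
static std::string TakeMessage(DemoError* error) {
  std::string text = error->message ? error->message : "";
  if (error->release) error->release(error);
  *error = DemoError{};
  return text;
}
```

A caller would declare `DemoError error = {};` once, and after every failing call run `TakeMessage(&error)` before passing the same struct to another call.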
ADBC C API functions should initialize AdbcError struct passed into them instead of assuming that the caller did so. Given that these are "output-only" type parameters there is usually no expectation that they need to be zeroed coming into API calls.
This PR updates driver-manager. If all looks good and it gets merged, I'll follow up with equivalent PRs for the drivers.