fix(c/driver_manager): protect against uninitialized AdbcError #570

cdaudt · 2023-03-31T17:03:16Z

ADBC C API functions should initialize AdbcError struct passed into them instead of assuming that the caller did so. Given that these are "output-only" type parameters there is usually no expectation that they need to be zeroed coming into API calls.
This PR updates driver-manager. If all looks good and it gets merged, I'll follow up with equivalent PRs for the drivers.

github-actions · 2023-03-31T17:03:29Z

⚠️ Please follow the Conventional Commits format in CONTRIBUTING.md for PR titles.

lidavidm · 2023-03-31T20:20:27Z

Thanks!

I'll note, some drivers deliberately append to an existing error, which we can't do with this. (Also, this means that you need to check and release the error before making further calls: probably good practice, but will also trip people up)

lidavidm · 2023-03-31T20:20:54Z

Finally, I think all current implementations assume caller will zero things, for all structures - not just the error.

cdaudt · 2023-03-31T20:46:09Z

Hi - thanks for the feedback. The api defines this field as '[out]' and thus the caller should not expect that anything passed into the API will survive. e.g. here: AdbcConnectionRelease
But I do agree that while the driver manager is okay in that calls such as AdbcConnectionRelease are only called from the API clients, the same is not true in the actual drivers, where the same function is used both to provide the public API as well as the callback to the driver manager today. Will need to address that in equivalent changes to the drivers.

lidavidm

Fair enough.

I'm going to be out for the next couple weeks so I won't be able to review, but it's probably reasonable. (I'd still rather that people explicitly zero things like AdbcStatement before use, though.)

I'll also note individual drivers are both linked to directly and used via the driver manager so there's not a clear internal/external API boundary there. (Oh, you already noted that.)

lidavidm · 2023-03-31T21:05:45Z

c/driver_manager/adbc_driver_manager.cc

@@ -597,6 +628,7 @@ AdbcStatusCode AdbcStatementSetSqlQuery(struct AdbcStatement* statement,
 AdbcStatusCode AdbcStatementSetSubstraitPlan(struct AdbcStatement* statement,
                                             const uint8_t* plan, size_t length,
                                             struct AdbcError* error) {
+  AdbcErrorInit(error);


Reset or something might be clearer? (Or is there a need for a function when it's only one line of code?)

It is small - I can replace with a direct call to memset.

After seeing @kou comments below I'll rename the fn to AdbcErrorReset and check for null as well.

lidavidm · 2023-03-31T21:08:17Z

Also, CC @zeroshade if you have any opinions?

kou

Our documentations says that the caller must initialize AdbcError:

arrow-adbc/adbc.h

Lines 169 to 171 in 639ca71

    
           /// optional out parameter, which can be inspected. If provided, it is 
        
           /// the responsibility of the caller to zero-initialize the AdbcError 
        
           /// value.

Should we update this too?

The suggested behavior may be convenience but may cause a memory leak unexpectedly:

struct AdbcError error = {};
AdbcDatabaseNew(..., &error); // error
// error->release(error); // forget to call release()
AdbcDatabaseNew(..., &error); // reuse same error.
                              // current behavior: message is appended
                              // suggested behavior: memory leak

kou · 2023-03-31T21:32:03Z

c/driver_manager/adbc_driver_manager.cc

@@ -57,6 +57,11 @@ void GetWinError(std::string* buffer) {

 #endif  // defined(_WIN32)

+// Struct initializers
+static void AdbcErrorInit(struct AdbcError* error) {
+  std::memset(error, 0, sizeof(*error));


We should add a NULL check because error is an optional out parameter.

yes, you're right. So maybe a reason to keep the AdbcErrorInit function, and add a null check. I missed that comment in adbc.h - if that is the defined behaviour then the docs should probably replicate it to the individual API calls (DatabaseNew/ConnectionNew/StatementNew already call out that the Database/Connection/Statement parameters must be zero initialized).
As a defensive mechanism on the API whoever I do think it should be explicit in zeroing it, as in the current implementation if it is not zeroed then you get erratic behaviour (which is how I came across this - I was getting a segmentation violation because "message" had junk in it going in because I didn't realize I needed to initialize to zero, and that caused SetError to incorrectly call error->release() which also contained junk and caused the segmentation violation).
the need for the "error->release()" call by the API caller is something I had planned to bring up separately - seems like a bad idea to pass allocation back in an optional error field that is then responsibility of the caller to free (and I don't even see that mentioned in the documentation btw).

The struggle is that we don't want to assume libc malloc/free (possibly we could/should have). You can see this pattern in the Arrow C Data Interface as well where all structs have an explicit release callback.

lidavidm · 2023-03-31T23:06:51Z

If it's documented in the header (thanks Kou for checking!) then we should stick to the existing behavior.

cdaudt · 2023-03-31T23:19:55Z

If it's documented in the header (thanks Kou for checking!) then we should stick to the existing behavior.

this patch doesn't require a change to the existing behaviour. If a caller does pass in a zero AdbcError then this patch doesn't harm that (just an added memset overhead which is a small price to pay for the protection to the API imo).
Do you mean you prefer I drop the patch?
The separate issue of driver-manager -> driver calls bypassing the zero'ed out AdbcError expectation on purpose seems like a bad way to get driver-manager & driver to collaborate on filling up an error structure. I can send a separate patch to try to address that.

lidavidm · 2023-03-31T23:36:54Z

Ah, you're right. Then checking for null should be enough.

cdaudt · 2023-03-31T23:43:04Z

Ah, you're right. Then checking for null should be enough.

okay - I've resubmitted the PR with the null check + rename AdbcErrorInit -> AdbcErrorReset (and added a testcase for nullptr being passed into AdbcError). I did mess up the amend and ended up with an extra merge commit (haven't done a PR in a little while). Let me know if it looks okay or I should resubmit.

lidavidm

LGTM. Thanks!

Kou or @zeroshade, any other opinions here?

lidavidm · 2023-04-19T00:17:17Z

Ah, though there's still leaks in the tests.

lidavidm · 2023-04-19T00:17:51Z

Probably because of the exact issue that Kou pointed out; it makes it easier to leak memory if you don't check the error (though of course, you should be checking the error).

lidavidm · 2023-04-19T00:18:20Z

So the real question is which footgun do we prefer

kou

So the real question is which footgun do we prefer

I prefer the current specification (the caller is responsible to initialize AdbcError) because it's not strange in general C API and the suggested API may cause memory leaks implicitly like the current tests.

(It's not a strong opinion.)

kou · 2023-04-19T00:33:59Z

c/driver_manager/adbc_driver_manager_test.cc

+  // simulate API calls using uninitialized AdbcError structs
+  std::memset(&invalid_err, 0xff, sizeof(invalid_err));
+  ASSERT_THAT(AdbcDatabaseInit(&database, &invalid_err),
+              IsStatus(ADBC_STATUS_INVALID_STATE, &invalid_err));


We need to call invalid_err.release(&invalid_err) after this.

Yes, there are a few missing releases (in my patch and other tests). Updating the patchset to add these.

kou · 2023-04-19T00:34:10Z

c/driver_manager/adbc_driver_manager_test.cc

+  ASSERT_THAT(AdbcDatabaseNew(&database, &error), IsOkStatus(&invalid_err));
+  ASSERT_THAT(
+      AdbcDatabaseSetOption(&database, "driver", "adbc_driver_sqlite", &invalid_err),
+      IsOkStatus(&invalid_err));


kou · 2023-04-19T00:34:43Z

c/driver_manager/adbc_driver_manager_test.cc

+  std::memset(&invalid_err, 0xff, sizeof(invalid_err));
+  ASSERT_THAT(
+      AdbcDatabaseSetOption(&database, "notavalidkey", "notavalidvalue", &invalid_err),
+      IsStatus(ADBC_STATUS_NOT_IMPLEMENTED, &invalid_err));


ADBC C API functions should initialize AdbcError struct passed into them instead of assuming that the caller did so. Given that these are "output-only" type parameters there is usually no expectation that they need to be zeroed coming into API calls.

Adding missing releases of error messages in driver_manager_test

Adding missing releases of error messages in adbc_validation

cdaudt · 2023-05-25T05:57:08Z

hi @kou @lidavidm just a ping on this - are you okay to proceed with this PR? If so I'll rebase to latest and resubmit.

kou

I think that we don't need this but I defer to @lidavidm.

kou · 2023-05-26T05:42:43Z

c/driver_manager/adbc_driver_manager_test.cc

+  std::memset(&invalid_err, 0xff, sizeof(invalid_err));
+  ASSERT_THAT(AdbcDatabaseInit(&database, &invalid_err),
+              IsStatus(ADBC_STATUS_INVALID_STATE, &invalid_err));
+  if (invalid_err.release) invalid_err.release(&invalid_err);


Do we need this if? I think that this must have ADBC_STATUS_INVALID_STATE error.

Suggested change

if (invalid_err.release) invalid_err.release(&invalid_err);

invalid_err.release(&invalid_err);

kou · 2023-05-26T05:43:48Z

c/driver_manager/adbc_driver_manager_test.cc

@@ -83,9 +83,58 @@ TEST_F(DriverManager, DatabaseCustomInitFunc) {
      AdbcDatabaseSetOption(&database, "entrypoint", "ThisSymbolDoesNotExist", &error),
      IsOkStatus(&error));
  ASSERT_EQ(ADBC_STATUS_INTERNAL, AdbcDatabaseInit(&database, &error));
+  if (error.release) error.release(&error);


Why do we need this if?

Suggested change

if (error.release) error.release(&error);

error.release(&error);

kou · 2023-05-26T05:44:20Z

c/driver_manager/adbc_driver_manager_test.cc

  ASSERT_THAT(AdbcDatabaseRelease(&database, &error), IsOkStatus(&error));
 }

+TEST_F(DriverManager, UninitializedError) {
+  struct AdbcDatabase database;
+  struct AdbcError invalid_err;


Why do we need this instead of existing error?

kou · 2023-05-26T05:44:42Z

c/driver_manager/adbc_driver_manager_test.cc

+  if (invalid_err.release) invalid_err.release(&invalid_err);
+
+  std::memset(&invalid_err, 0xff, sizeof(invalid_err));
+  ASSERT_THAT(AdbcDatabaseNew(&database, &error), IsOkStatus(&invalid_err));


Is this correct?

Suggested change

ASSERT_THAT(AdbcDatabaseNew(&database, &error), IsOkStatus(&invalid_err));

ASSERT_THAT(AdbcDatabaseNew(&database, &invalid_err), IsOkStatus(&invalid_err));

kou · 2023-05-26T05:45:16Z

c/driver_manager/adbc_driver_manager_test.cc

+  ASSERT_THAT(
+      AdbcDatabaseSetOption(&database, "notavalidkey", "notavalidvalue", &invalid_err),
+      IsStatus(ADBC_STATUS_NOT_IMPLEMENTED, &invalid_err));
+  if (invalid_err.release) invalid_err.release(&invalid_err);


Suggested change

if (invalid_err.release) invalid_err.release(&invalid_err);

invalid_err.release(&invalid_err);

kou · 2023-05-26T05:45:48Z

c/driver_manager/adbc_driver_manager_test.cc

+  EXPECT_EQ(AdbcDatabaseSetOption(&database, "notavalidkey", "notavalidvalue", nullptr),
+            ADBC_STATUS_NOT_IMPLEMENTED);
+  ASSERT_THAT(AdbcDatabaseRelease(&database, &error), IsOkStatus(&error));
+


Suggested change

kou · 2023-05-26T05:46:10Z

c/driver_manager/adbc_driver_manager_test.cc

@@ -100,9 +149,12 @@ TEST_F(DriverManager, ConnectionOptions) {
  ASSERT_THAT(AdbcConnectionNew(&connection, &error), IsOkStatus(&error));
  ASSERT_THAT(AdbcConnectionSetOption(&connection, "foo", "bar", &error),
              IsOkStatus(&error));
+


Suggested change

kou · 2023-05-26T05:46:14Z

c/driver_manager/adbc_driver_manager_test.cc

  ASSERT_EQ(ADBC_STATUS_NOT_IMPLEMENTED,
            AdbcConnectionInit(&connection, &database, &error));
+


Suggested change

kou · 2023-05-26T05:46:26Z

c/driver_manager/adbc_driver_manager_test.cc

  ASSERT_THAT(error.message, ::testing::HasSubstr("Unknown connection option foo=bar"));
+  if (error.release) error.release(&error);


Suggested change

if (error.release) error.release(&error);

error.release(&error);

and others...

lidavidm · 2023-05-26T12:11:19Z

I think for self-consistency we should stick with requiring that the user zero out all structs before use. This is maybe less convenient but makes intent clearer.

lidavidm · 2023-08-01T18:25:59Z

In light of bugs like #729, effectively the implementation requires that all structs must be zero-initialized (Golang-based drivers require that all inputs must be initialized). #946/#954 will only exacerbate this, so I think we will have to keep things as-is. Thank you for raising this behavior with us!

lidavidm reviewed Mar 31, 2023

View reviewed changes

lidavidm changed the title ~~fix (c/driver-manager): Protect against uninitialized AdbcError~~ fix(c/driver_manager)!: protect against uninitialized AdbcError Mar 31, 2023

lidavidm changed the title ~~fix(c/driver_manager)!: protect against uninitialized AdbcError~~ fix(c/driver_manager): protect against uninitialized AdbcError Mar 31, 2023

kou reviewed Mar 31, 2023

View reviewed changes

cdaudt force-pushed the fix/adbcerror_uninitialized branch from b84c307 to b47d382 Compare March 31, 2023 23:05

lidavidm self-requested a review April 19, 2023 00:15

lidavidm approved these changes Apr 19, 2023

View reviewed changes

kou reviewed Apr 19, 2023

View reviewed changes

cdaudt and others added 3 commits April 18, 2023 22:57

fix (c/driver_manager_test): missing error releases

1437b23

Adding missing releases of error messages in driver_manager_test

fix (c/adbc_validation): missing error releases

3bb9597

Adding missing releases of error messages in adbc_validation

cdaudt force-pushed the fix/adbcerror_uninitialized branch from 638cb92 to 3bb9597 Compare April 19, 2023 06:02

kou reviewed May 26, 2023

View reviewed changes

lidavidm closed this Aug 1, 2023

	/// optional out parameter, which can be inspected. If provided, it is
	/// the responsibility of the caller to zero-initialize the AdbcError
	/// value.

	if (invalid_err.release) invalid_err.release(&invalid_err);
	invalid_err.release(&invalid_err);

	if (error.release) error.release(&error);
	error.release(&error);

	ASSERT_THAT(AdbcDatabaseNew(&database, &error), IsOkStatus(&invalid_err));
	ASSERT_THAT(AdbcDatabaseNew(&database, &invalid_err), IsOkStatus(&invalid_err));

		ASSERT_EQ(ADBC_STATUS_NOT_IMPLEMENTED,
		AdbcConnectionInit(&connection, &database, &error));

		ASSERT_THAT(error.message, ::testing::HasSubstr("Unknown connection option foo=bar"));
		if (error.release) error.release(&error);

fix(c/driver_manager): protect against uninitialized AdbcError #570

fix(c/driver_manager): protect against uninitialized AdbcError #570

Conversation

cdaudt commented Mar 31, 2023

github-actions bot commented Mar 31, 2023

lidavidm commented Mar 31, 2023

lidavidm commented Mar 31, 2023

cdaudt commented Mar 31, 2023 • edited Loading

lidavidm left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lidavidm commented Mar 31, 2023

kou left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lidavidm commented Mar 31, 2023

cdaudt commented Mar 31, 2023

lidavidm commented Mar 31, 2023

cdaudt commented Mar 31, 2023

lidavidm left a comment

Choose a reason for hiding this comment

lidavidm commented Apr 19, 2023

lidavidm commented Apr 19, 2023

lidavidm commented Apr 19, 2023

kou left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cdaudt commented May 25, 2023 • edited Loading

kou left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lidavidm commented May 26, 2023

lidavidm commented Aug 1, 2023

cdaudt commented Mar 31, 2023 •

edited

Loading

lidavidm left a comment •

edited

Loading

kou left a comment •

edited

Loading

cdaudt commented May 25, 2023 •

edited

Loading