csharp: integration tests (ClientTests/DriverTests) can cause concurrency issues creating/updating table. #2280

Open
birschick-bq opened this issue Oct 28, 2024 · 0 comments
Labels
Type: bug Something isn't working

Comments

@birschick-bq
Contributor

What happened?

When running the integration tests, ClientTests.CanClientExecuteUpdate and DriverTests.CanExecuteUpdate can concurrently try to create and update the same test table. This can lead to flaky test failures and, in the worst case, leave the Databricks server in an inconsistent state (resource leaks).

Stack Trace

Message: 
  System.AggregateException : One or more errors occurred. (Error running query: io.delta.exceptions.ConcurrentAppendException: Files were added to the root of the table by a concurrent update. Please try the operation again.
  Conflicting commit: {"timestamp":1730135866536,"operation":"WRITE","operationParameters":{"mode":Append,"statsOnLoad":false,"partitionBy":[]},"readVersion":34,"isolationLevel":"WriteSerializable","isBlindAppend":true,"operationMetrics":{"numFiles":"1","numOutputRows":"1","numOutputBytes":"6118"},"tags":{"restoresDeletedRows":"false"},"engineInfo":"Databricks-Runtime/13.3.x-scala2.12","txnId":"70495e06-e88b-446b-8dd3-9e216cdbf0c6"}
  Refer to https://docs.microsoft.com/azure/databricks/delta/concurrency-control for more details.)
  ---- Apache.Arrow.Adbc.Drivers.Apache.Hive2.HiveServer2Exception : Error running query: io.delta.exceptions.ConcurrentAppendException: Files were added to the root of the table by a concurrent update. Please try the operation again.
  Conflicting commit: {"timestamp":1730135866536,"operation":"WRITE","operationParameters":{"mode":Append,"statsOnLoad":false,"partitionBy":[]},"readVersion":34,"isolationLevel":"WriteSerializable","isBlindAppend":true,"operationMetrics":{"numFiles":"1","numOutputRows":"1","numOutputBytes":"6118"},"tags":{"restoresDeletedRows":"false"},"engineInfo":"Databricks-Runtime/13.3.x-scala2.12","txnId":"70495e06-e88b-446b-8dd3-9e216cdbf0c6"}
  Refer to https://docs.microsoft.com/azure/databricks/delta/concurrency-control for more details.

Stack Trace: 
  Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
  Task`1.get_Result()
  HiveServer2Statement.ExecuteUpdate() line 39
  AdbcCommand.ExecuteNonQuery() line 149
  ClientTests.CanClientExecuteUpdate(AdbcConnection adbcConnection, TestConfiguration testConfiguration, String[] queries, List`1 expectedResults) line 64
  ClientTests.CanClientExecuteUpdate() line 73
  RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
  MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)
  ----- Inner Stack Trace -----
  HiveServer2Statement.ExecuteStatementAsync() line 119
  HiveServer2Statement.ExecuteQueryAsync() line 43
  HiveServer2Statement.ExecuteUpdateAsync() line 61

How can we reproduce the bug?

Using VS, run all the tests at the Apache.Arrow.Adbc.Tests.Drivers.Apache.Spark level.

Environment/Setup

No response

@birschick-bq birschick-bq added the Type: bug Something isn't working label Oct 28, 2024
CurtHagenlocher pushed a commit that referenced this issue Oct 29, 2024
…2282)

Provides an interim work-around for the concurrency issue identified in
#2280.

* Removes the SQL `DELETE` statements from the SQL table scripts.
* Uses the XUnit.Collection to serialize the execution of ClientTests
and DriverTests.
* Fixes the missing application of `HttpRequestTimeout` due to an
incomplete implementation of the `ValidateOptions` in
`SparkDatabricksConnection`.
* Changes the table creation syntax to `CREATE OR REPLACE TABLE` to
reduce the probability of an inconsistent state.
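
As a rough illustration of the collection-based serialization mentioned in the list above, the sketch below places two test classes in the same xUnit collection. xUnit does not run test classes from the same collection in parallel with each other. The collection name `SparkTableTests` and the empty test bodies are assumptions made for illustration; the real test classes in the repository are more involved.

```csharp
using Xunit;

// Hypothetical collection name; any string shared by both classes works.
[Collection("SparkTableTests")]
public class ClientTests
{
    [Fact]
    public void CanClientExecuteUpdate()
    {
        // ... creates and updates the shared test table ...
    }
}

// Same collection name, so xUnit runs this class sequentially
// with ClientTests instead of in parallel.
[Collection("SparkTableTests")]
public class DriverTests
{
    [Fact]
    public void CanExecuteUpdate()
    {
        // ... uses the same test table, but never at the same time as ClientTests ...
    }
}
```

This only serializes tests within a single test run. Two separate runs against the same server (for example, two developers or two CI jobs) could still collide on the table.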

Note: this is not the final solution. Table creation needs more robust
isolation to prevent concurrent access.
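
One possible direction for that isolation, sketched here purely as an assumption and not as the fix the maintainers chose, is to give each test its own uniquely named table so that no two tests ever write to the same one. The helper `TestTableNames.NewUnique`, the base name `adbc_update_test`, and the column list are all hypothetical.

```csharp
using System;

public static class TestTableNames
{
    // Hypothetical helper: appends a GUID (32 hex digits, no hyphens)
    // so that concurrent tests never share a table name.
    public static string NewUnique(string baseName)
    {
        return $"{baseName}_{Guid.NewGuid():N}";
    }
}

public static class Program
{
    public static void Main()
    {
        // Each test would call this once and use the result in all of its SQL.
        string table = TestTableNames.NewUnique("adbc_update_test");

        // Illustrative statements only; the schema is an assumption.
        Console.WriteLine($"CREATE OR REPLACE TABLE {table} (id INT, name STRING)");
        // ... run the test's INSERT/UPDATE statements against this table ...
        Console.WriteLine($"DROP TABLE IF EXISTS {table}");
    }
}
```

The trade-off is that every test must drop its table when it finishes, including on failure, or the unique tables accumulate on the server.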