Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GraphQL] A concurrent update was performed on this collection and corrupted its state. #15731

Open
lampersky opened this issue Apr 11, 2024 · 19 comments

Comments

@lampersky
Copy link
Contributor

When using GraphiQL editor, sometimes I'm getting this exception, when querying content items with nested lists.
It's not consistently repeatable for me (approx once every five queries).

image

Stacktrace:

GraphQL.Execution.UnhandledError: Error trying to resolve field 'contentItems'.
 ---> System.InvalidOperationException: Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.
   at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
   at YesSql.Session.Get[T](IList`1 documents, String collection)
   at YesSql.Services.DefaultQuery.Query`1.ListImpl()
   at YesSql.Services.DefaultQuery.Query`1.ListImpl()
   at OrchardCore.ContentManagement.GraphQL.DataLoaderExtensions.LoadPublishedContentItemsAsync(IEnumerable`1 contentItemIds, ISession session)
   at GraphQL.DataLoader.CollectionBatchDataLoader`2.FetchAsync(IEnumerable`1 list, CancellationToken cancellationToken)
   at GraphQL.DataLoader.DataLoaderExtensions.<>c__DisplayClass5_0`2.<<Then>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at GraphQL.DataLoader.SimpleDataLoader`1.GraphQL.DataLoader.IDataLoaderResult.GetResultAsync(CancellationToken cancellationToken)
   at GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync(ExecutionContext context, ExecutionNode node) in /_/src/GraphQL/Execution/ExecutionStrategy.cs:line 515
   at GraphQL.DataLoader.SimpleDataLoader`1.GraphQL.DataLoader.IDataLoaderResult.GetResultAsync(CancellationToken cancellationToken)
   at GraphQL.DataLoader.SimpleDataLoader`1.GraphQL.DataLoader.IDataLoaderResult.GetResultAsync(CancellationToken cancellationToken)
   at GraphQL.DataLoader.SimpleDataLoader`1.GraphQL.DataLoader.IDataLoaderResult.GetResultAsync(CancellationToken cancellationToken)
   at GraphQL.DataLoader.DataLoaderPair`2.GetResultAsync(CancellationToken cancellationToken) in /_/src/GraphQL.DataLoader/DataLoaderPair.cs:line 81
   at GraphQL.DataLoader.SimpleDataLoader`1.GraphQL.DataLoader.IDataLoaderResult.GetResultAsync(CancellationToken cancellationToken)
   at GraphQL.DataLoader.SimpleDataLoader`1.GraphQL.DataLoader.IDataLoaderResult.GetResultAsync(CancellationToken cancellationToken)
   at GraphQL.DataLoader.DataLoaderPair`2.GetResultAsync(CancellationToken cancellationToken) in /_/src/GraphQL.DataLoader/DataLoaderPair.cs:line 81
   at GraphQL.DataLoader.DataLoaderPair`2.GetResultAsync(CancellationToken cancellationToken) in /_/src/GraphQL.DataLoader/DataLoaderPair.cs:line 81
   at GraphQL.DataLoader.SimpleDataLoader`1.GraphQL.DataLoader.IDataLoaderResult.GetResultAsync(CancellationToken cancellationToken)
   at GraphQL.DataLoader.DataLoaderExtensions.<>c__DisplayClass5_0`2.<<Then>b__0>d.MoveNext()
   at GraphQL.DataLoader.DataLoaderExtensions.<>c__DisplayClass5_0`2.<<Then>b__0>d.MoveNext()
   at GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync(ExecutionContext context, ExecutionNode node) in /_/src/GraphQL/Execution/ExecutionStrategy.cs:line 515
   at GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync(ExecutionContext context, ExecutionNode node) in /_/src/GraphQL/Execution/ExecutionStrategy.cs:line 515
   at GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync(ExecutionContext context, ExecutionNode node) in /_/src/GraphQL/Execution/ExecutionStrategy.cs:line 515
   at GraphQL.DataLoader.DataLoaderExtensions.<>c__DisplayClass5_0`2.<<Then>b__0>d.MoveNext()
   at GraphQL.DataLoader.SimpleDataLoader`1.GraphQL.DataLoader.IDataLoaderResult.GetResultAsync(CancellationToken cancellationToken)
   at GraphQL.DataLoader.DataLoaderExtensions.<>c__DisplayClass5_0`2.<<Then>b__0>d.MoveNext()
   at GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync(ExecutionContext context, ExecutionNode node) in /_/src/GraphQL/Execution/ExecutionStrategy.cs:line 515
   at GraphQL.DataLoader.SimpleDataLoader`1.GraphQL.DataLoader.IDataLoaderResult.GetResultAsync(CancellationToken cancellationToken)
   at GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync(ExecutionContext context, ExecutionNode node) in /_/src/GraphQL/Execution/ExecutionStrategy.cs:line 515
   at GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync(ExecutionContext context, ExecutionNode node) in /_/src/GraphQL/Execution/ExecutionStrategy.cs:line 515
   at GraphQL.DataLoader.SimpleDataLoader`1.GraphQL.DataLoader.IDataLoaderResult.GetResultAsync(CancellationToken cancellationToken)
   at GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync(ExecutionContext context, ExecutionNode node) in /_/src/GraphQL/Execution/ExecutionStrategy.cs:line 515
   at GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync(ExecutionContext context, ExecutionNode node) in /_/src/GraphQL/Execution/ExecutionStrategy.cs:line 515
   at GraphQL.DataLoader.DataLoaderExtensions.<>c__DisplayClass5_0`2.<<Then>b__0>d.MoveNext()
   at GraphQL.DataLoader.SimpleDataLoader`1.GraphQL.DataLoader.IDataLoaderResult.GetResultAsync(CancellationToken cancellationToken)
   at GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync(ExecutionContext context, ExecutionNode node) in /_/src/GraphQL/Execution/ExecutionStrategy.cs:line 515
   at GraphQL.DataLoader.SimpleDataLoader`1.GraphQL.DataLoader.IDataLoaderResult.GetResultAsync(CancellationToken cancellationToken)
   at GraphQL.DataLoader.SimpleDataLoader`1.GraphQL.DataLoader.IDataLoaderResult.GetResultAsync(CancellationToken cancellationToken)
   at GraphQL.DataLoader.DataLoaderExtensions.<>c__DisplayClass5_0`2.<<Then>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync(ExecutionContext context, ExecutionNode node) in /_/src/GraphQL/Execution/ExecutionStrategy.cs:line 515
   at GraphQL.DataLoader.SimpleDataLoader`1.GraphQL.DataLoader.IDataLoaderResult.GetResultAsync(CancellationToken cancellationToken)
   at GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync(ExecutionContext context, ExecutionNode node) in /_/src/GraphQL/Execution/ExecutionStrategy.cs:line 515
   at GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync(ExecutionContext context, ExecutionNode node) in /_/src/GraphQL/Execution/ExecutionStrategy.cs:line 515
   at GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync(ExecutionContext context, ExecutionNode node) in /_/src/GraphQL/Execution/ExecutionStrategy.cs:line 515
   at GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync(ExecutionContext context, ExecutionNode node) in /_/src/GraphQL/Execution/ExecutionStrategy.cs:line 515
   at GraphQL.DataLoader.SimpleDataLoader`1.GraphQL.DataLoader.IDataLoaderResult.GetResultAsync(CancellationToken cancellationToken)
   at GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync(ExecutionContext context, ExecutionNode node) in /_/src/GraphQL/Execution/ExecutionStrategy.cs:line 515
   --- End of inner exception stack trace ---

@sebastienros I've just found that you fixed similar issue 3 years ago.

@aaronamm
Copy link
Contributor

Following.

@sebastienros
Copy link
Member

Probably not the same fix but same root cause. If there is a query that returns multiple objects (in our case content items), or on object has multiple children to load (content items again) then if GraphQL.NET uses async operations that are not awaited, or if it uses threads (Parallel.ForEach for instance) at that point it's possible that two concurrent accesses to the same session (the one from the scoped DI) will conflict.

The solution is to find from the query how GraphQL.NET is handling these. With the previous version we had changed the default behavior which was to do thinks asynchronously to synchronously, I even filed a bug in their repos to tell them it's shouldn't be a default, too risky. Now it could just be our code that is not awaiting a task (like the fix you linked).

I will try to follow the breadcrumbs from the trace you shared:
GraphQL.Execution.ExecutionStrategy.CompleteDataLoaderNodeAsync
GraphQL.DataLoader.SimpleDataLoader1.GraphQL.DataLoader.IDataLoaderResult.GetResultAsync
OrchardCore.ContentManagement.GraphQL.DataLoaderExtensions.LoadPublishedContentItemsAsync
GraphQL.DataLoader.CollectionBatchDataLoader.FetchAsync

@sebastienros
Copy link
Member

@lampersky Is your query using content items containing or contained in a ContentPickerField?

@lampersky
Copy link
Contributor Author

@sebastienros yes

@lampersky
Copy link
Contributor Author

I wasn't able to repro on sqlite. But will try tomorrow. The reported problem is on SQL Server.

Query looks sth like this, it is not the original one.
So, there are nested lists, there are content item pickers and also bag parts.

query MyQuery {
  event(where: { contentItemId : "40ke33ne4k931544be1hrk57cb" }) {
    displayText
    contentItemId
    organizers {
      contentItems(first: 30) {
        ... on Speaker {
                displayText
                modifiedUtc
                contentType
                createdUtc
                latest
                owner
                published
                publishedUtc
        }
      }
    }
    list {
      contentItems(first: 300) {
        ... on Track {
          displayText
          contentItemId
          moderators {
            contentItems(first: 30) {
              ... on Speaker {
                displayText
                modifiedUtc
                contentType
                createdUtc
                latest
                owner
                published
                publishedUtc
              }
            }
          }
          list {
            contentItems(first: 300) {
              ... on Session {
                displayText
                modifiedUtc
                contentType
                createdUtc
                latest
                owner
                published
                publishedUtc
                test {
                  contentItems(first: 30) {
                    ... on Speaker {
                      displayText
                      contentItemId
                    }
                  }
                }
                speakers {
                  contentItems(first: 20) {
                    ... on Speaker {
                      displayText
                      modifiedUtc
                      contentType
                      createdUtc
                      latest
                      owner
                      published
                      publishedUtc
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

@lampersky
Copy link
Contributor Author

@sebastienros this is the smallest possible query, all unnecessary things were removed (ContentPickers, BagParts etc.), so we only have nested list parts

image

GraphQL.Execution.UnhandledError: Error trying to resolve field 'contentItems'.
 ---> System.InvalidOperationException: Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.
   at System.Collections.Generic.Dictionary`2.FindValue(TKey key)
   at YesSql.Session.Get[T](IList`1 documents, String collection)
   at YesSql.Services.DefaultQuery.Query`1.ListImpl()
   at YesSql.Services.DefaultQuery.Query`1.ListImpl()
   at OrchardCore.Lists.GraphQL.ListQueryObjectType.LoadPublishedContentItemsForListAsync(IEnumerable`1 contentItemIds, ISession session)
   at GraphQL.DataLoader.CollectionBatchDataLoader`2.FetchAsync(IEnumerable`1 list, CancellationToken cancellationToken)
   at GraphQL.DataLoader.DataLoaderPair`2.GetResultAsync(CancellationToken cancellationToken) in /_/src/GraphQL.DataLoader/DataLoaderPair.cs:line 81
   at OrchardCore.Lists.GraphQL.ListQueryObjectType.<>c.<<-ctor>b__0_2>d.MoveNext()
--- End of stack trace from previous location ---
   at GraphQL.Resolvers.FuncFieldResolver`2.<>c__DisplayClass3_0.<<-ctor>b__1>d.MoveNext()
--- End of stack trace from previous location ---
   at GraphQL.Execution.ExecutionStrategy.ExecuteNodeAsync(ExecutionContext context, ExecutionNode node) in /_/src/GraphQL/Execution/ExecutionStrategy.cs:line 472
   --- End of inner exception stack trace ---

@lampersky
Copy link
Contributor Author

I've simplified resolve part to the minimum, to exclude loader problems etc, and I'm still facing the issue. It looks like ResolveLockedAsync is not enough to lock querying YesSql.

image

when I changed to ResolveAsync and added semaphore, I can't repro the issue anymore.

image

Other than that, using loader for list part is questionable and could be improved. It looks like all items are fetched and skip/first is applied on materialized collection.

Anyway, it would be good to know your thoughts @sebastienros.

@sebastienros
Copy link
Member

I understand how it's fixing all the issues, but that's also wrong, because it's a static semaphore, so this is actually blocking every single call, even from different db sessions. The lock should be per session, or request.

Looking at the current ResolveLockedAsync it seems better, as it's on the context, and the code is identical I believe. Maybe the graphQlContext is actually cloned, that could be checked with some logging.

@sebastienros sebastienros added this to the 1.x milestone Apr 18, 2024
@hyzx86
Copy link
Contributor

hyzx86 commented Apr 24, 2024

I wasn't able to repro on sqlite. But will try tomorrow. The reported problem is on SQL Server.

@lampersky ,I seem to be running into this problem, but I can't be sure because when I run into concurrency issues, I do this by adding MultipleActiveResultSets=True; To my connection string, and it was fixed

I was surprised to see your description, I was beginning to think it was just on Sqlite

@guanwu
Copy link

guanwu commented May 10, 2024

I noticed that the ResolveLockedAsync method is not effective here. If you set a breakpoint in the LoadPublishedContentItemsForListAsync method, you can see that they are executed in parallel. I solved this problem by setting a synchronization lock in this method, but I am not sure if this is the best solution.

image

YesSQL (5.0.0) has added an EnableThreadSafetyCheck setting. After setting it to true, it is clear that the LoadPublishedContentitemsForListAsync method is not thread safe.

@lampersky
Copy link
Contributor Author

@lampersky ,I seem to be running into this problem, but I can't be sure because when I run into concurrency issues, I do this by adding MultipleActiveResultSets=True; To my connection string, and it was fixed

@hyzx86 sorry for late reply, I've tried to reproduce this issue on sqlite using a smaller sample of data, and that is why I wrote that I'm not able to reproduce on sqlite. I thought I will be able prepare some smaller recipe, but unfortunately it is not as easy, my recipe file has almost 20MB. So, it looks like the problem can be only observed when you have lots of items and you are querying all of them.

@hyzx86
Copy link
Contributor

hyzx86 commented May 14, 2024

I probably understand your situation, since your paging size is 300+, it's possible that the next query was requested while the current request was still pending, hence the concurrency issue.

@tropcicstefan
Copy link
Contributor

I think it's finally time to introduce SerialDocumentExecuter. As was discussed few years ago, new versions of graphql introduced serial execution for dataloader meaning we can discard LockedAsyncFieldResolver and solve all problems by executing in serial, as it was in beginning

@sebastienros
Copy link
Member

@tropcicstefan right, there is this: https://github.com/graphql-dotnet/graphql-dotnet/blob/master/src/GraphQL/Execution/DefaultExecutionStrategySelector.cs#L43

That the DocumentExecuter uses, and the default is Parallel for queries. So there must be a way top configure it to use Serial for queries, or create a custom ExecutionStrategySelector ans use it.

@tropcicstefan
Copy link
Contributor

a custom ExecutionStrategySelector ans use it.

here

(and there is always scoped scope as an option in resolver)

@sebastienros
Copy link
Member

I have the feeling that I already did this some years ago, changing the default execution policy to use serial for queries... So maybe this is not the problem, the Data loader may have a different one.

@tropcicstefan
Copy link
Contributor

I actually think with the changes in gql dotnet we can remove the locks ..
and use the serial data execution strat we used too but removed because it
didn’t support dataloaders

carlwoodhouse's comment

maybe won't solve the issue but code would be cleaner, without locked customizations

@sebastienros
Copy link
Member

Maybe fixed by #17332

@gvkries
Copy link
Contributor

gvkries commented Jan 16, 2025

@lampersky I'm not able to reproduce this error, can you have a look if #17332 fixes the issue? Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants