Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Bulk Update and Delete Operations #802

Open
danny-adu opened this issue Jul 9, 2024 · 1 comment
Open

Optimize Bulk Update and Delete Operations #802

danny-adu opened this issue Jul 9, 2024 · 1 comment
Assignees

Comments

@danny-adu
Copy link

Feature Suggestion: Optimize Bulk Update and Delete Operations

Background

Currently, the BulkUpdateAsync and BulkDeleteAsync methods provided by Z.EntityFramework.Extensions.EFCore are very useful for batch operations. However, there are scenarios where these operations can be further optimized, particularly in terms of selecting columns in subqueries and ensuring efficient locking mechanisms.

Suggestion

I would like to suggest an optimization for the BulkUpdateAsync and BulkDeleteAsync methods. Specifically, when performing bulk updates or deletes based on complex conditions, it would be beneficial to:

  1. Select only necessary columns in subqueries:
    When forming subqueries to identify rows for updating or deleting, only the primary key or unique identifier columns should be selected. This ensures minimal data retrieval and reduces overhead.

  2. Efficient Locking Mechanisms:
    Ensure that the operations are optimized to use row or page locks where possible, instead of escalating to table locks, to improve concurrency and performance.

Example

Here is an example of the current SQL generated and the proposed optimized SQL:

Current SQL:

UPDATE A 
SET A.[Visits] = B.[Visits] + 1
FROM [Blogs] AS A
INNER JOIN 
(
    SELECT [b].[BlogId], [b].[Url], [b].[Visits]
    FROM [Blogs] AS [b]
    WHERE [b].[SomeOtherCondition] = 'Value' AND [b].[AnotherCondition] < 0
) AS B 
ON A.[BlogId] = B.[BlogId]

Proposed Optimized SQL:

UPDATE A 
SET A.[Visits] = B.[Visits] + 1
FROM [Blogs] AS A
INNER JOIN 
(
    SELECT [b].[BlogId]
    FROM [Blogs] AS [b]
    WHERE [b].[SomeOtherCondition] = 'Value' AND [b].[AnotherCondition] < 0
) AS B 
ON A.[BlogId] = B.[BlogId]


Benefits
Performance: Reducing the number of columns selected in subqueries decreases the amount of data processed and transferred, improving query performance.
Concurrency: Using more granular locking mechanisms (row or page locks) enhances concurrency and reduces contention, especially in high-traffic applications.
Conclusion
Implementing these optimizations will make the bulk operations more efficient and better suited for high-performance scenarios. Thank you for considering this suggestion.
@JonathanMagnan JonathanMagnan self-assigned this Jul 9, 2024
@JonathanMagnan
Copy link
Member

Hello @danny-adu

One problem when selecting only the "right" columns is that the sub-query is generated by Entity Framework itself, and we don't really know what the developer will want to do, which might lead to issues.

Even your optimized query currently doesn't work. You perform a SET A.[Visits] = B.[Visits] + 1 but the B table doesn't longer contain the [Visits] column as you only selected the key SELECT [b].[BlogId].

So, currently, that's one of the main reasons everything is selected, even if not in the most optimized way, as we don't always know exactly what the developer will do.

Let me know if that answers your question correctly.

Best Regards,

Jon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants