HTTP stats over counting total_bytes_received? #65

Open
mmaitre314 opened this issue Jul 15, 2024 · 2 comments

mmaitre314 commented Jul 15, 2024

I am trying to optimize a query and noticed that the HTTP stats in EXPLAIN ANALYZE output seem to be off. I query a single 10.79 GiB Parquet file, yet the HTTP stats report 32.7 GiB received. I am wondering whether http_state_policy.cpp could be over-counting total_bytes_received, in particular by adding the content-length header value of HEAD responses, which carry no body.

SET azure_transport_option_type = curl;
SET azure_http_stats = True;
SET threads = 1;
SET azure_read_transfer_concurrency = 1;
SET azure_read_transfer_chunk_size = 1024 * 1024;
SET azure_read_buffer_size = 1024 * 1024;

EXPLAIN ANALYZE SELECT col1 FROM 'az://<snip>.blob.core.windows.net/<snip>.parquet' LIMIT 1;
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││            HTTP Stats:            ││
││                                   ││
││            in: 32.7 GiB           ││
││            out: 0 bytes           ││
││              #HEAD: 3             ││
││             #GET: 354             ││
││              #PUT: 0              ││
││              #POST: 0             ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘

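In the Azure SDK logs I see 3 HEAD requests with content-length: 11583653237 and 349 GET requests with content-length: 1048576. So the data actually received should be around 0.34 GiB instead of 32.7 GiB. Adding the three HEAD content-length values (about 32.36 GiB) to that 0.34 GiB gives 32.7 GiB, which matches the reported figure.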
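
The arithmetic behind this can be checked with a short calculation. The sketch below is only a sanity check on the numbers quoted from the logs, not the extension's accounting code. The variable names are made up for illustration. The stats show 354 GET requests, but only 349 are described in the logs, so the other five are left out here.

```python
GIB = 1024 ** 3

# Values quoted from the Azure SDK logs in this issue.
head_count, head_content_length = 3, 11_583_653_237
get_count, get_content_length = 349, 1_048_576

body_bytes = get_count * get_content_length    # bytes carried in GET response bodies
head_bytes = head_count * head_content_length  # header values only; HEAD responses have no body

print(f"file size:       {head_content_length / GIB:.2f} GiB")        # 10.79 GiB
print(f"GET bodies only: {body_bytes / GIB:.2f} GiB")                 # 0.34 GiB
print(f"GET + HEAD sum:  {(body_bytes + head_bytes) / GIB:.1f} GiB")  # 32.7 GiB
```

The last line reproduces the 32.7 GiB shown in the HTTP stats, which is consistent with the HEAD content-length values being counted as received bytes.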

If this analysis is correct, I can send a small PR to fix.

mmaitre314 commented

@quentingodeau - does this analysis sound correct? Happy to send a fix if it does.

quentingodeau commented

Hello, sorry for the late reply. I will double-check and make the fix. Thanks a lot for the analysis!