tl;dr
Running ClickBench query 2 against CSUP data produces an incorrect avg() aggregation result. The incorrect result appears in both the sequential and vector runtimes, so the problem is presumably in the conversion to the CSUP format. I've confirmed that the problem first appears at commit 66b20d0, which is associated with the changes in #5577.
Details
Repro is with super commit 04b7efc using the hits.parquet test data from ClickBench.
As a baseline, the presumed correct avg() result is 1513.4879349030107. DuckDB and SuperDB (in both the sequential and vector runtimes) all return this value when querying the original Parquet data.
After converting the Parquet file to CSUP, the avg result is very different (7224.820559888614) in both the sequential and vector runtimes, while sum and count are unchanged.
$ super -f csup -o hits.csup hits.parquet
$ SUPER_VAM=1 super -c "SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits.csup;"
{sum:7280088,count:99997497(uint64),avg:7224.820559888614}
$ super -c "SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits.csup;"
{sum:7280088,count:99997497(uint64),avg:7224.820559888614}
Tests indicate this started happening at commit 66b20d0, which is associated with the changes in #5577. The result was correct at the commit just before:
$ super -version && super -f csup -o hits.csup hits.parquet && SUPER_VAM=1 super -c "SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits.csup;"
Version: v1.18.0-226-g02d23319
{sum:7280088,count:99997497(uint64),avg:1513.4879349030107}
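The good/bad comparison above could be scripted when checking several commits. The following is a minimal sketch, assuming the query prints a single line in the format shown above. The function name `avg_matches` and the variable `EXPECTED_AVG` are hypothetical helpers for illustration and are not part of super.

```shell
# Expected value taken from the Parquet baseline reported in this issue.
EXPECTED_AVG="1513.4879349030107"

# avg_matches LINE: succeed if the avg field in LINE equals EXPECTED_AVG.
# LINE is assumed to look like {sum:...,count:...,avg:<number>}.
# The comparison is an exact string match, not a numeric one.
avg_matches() {
  actual=$(printf '%s\n' "$1" | sed -n 's/.*avg:\([0-9.eE+-]*\).*/\1/p')
  [ "$actual" = "$EXPECTED_AVG" ]
}

# The two outputs reported above: the first matches, the second does not.
avg_matches '{sum:7280088,count:99997497(uint64),avg:1513.4879349030107}' && echo "ok"
avg_matches '{sum:7280088,count:99997497(uint64),avg:7224.820559888614}' || echo "mismatch"
```

In a real run, the argument would be the captured output of the `super -c` query shown above rather than a literal string.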
philrz changed the title from "Incorrect avg aggregation result when querying CSUP data" to "Incorrect avg aggregation result when querying CSUP data (regression at 66b20d0)" on Jan 22, 2025
The result is then incorrect at 66b20d0.