
All-fields as an argument of aggregator such as count() can be resolved after other field #814

Open
Wants to merge 1 commit into base: main


LantaoJin (Member) commented on Oct 25, 2024

Description

count(), the PPL equivalent of count(*) in SQL, is not resolved correctly when it appears after other aggregate fields in a stats command.
For example:

source = $testTable | eval a = 1 | stats sum(a) as sum, avg(a) as avg, count() as cnt by country

fails with:

[NESTED_AGGREGATE_FUNCTION] It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.;
Project [sum#31L, cnt#33L, country#34]
+- Aggregate [country#38], [sum(a#30) AS sum#31L, count(avg(a#30)) AS cnt#33L, country#38 AS country#34]
   +- Project [name#35, age#36, state#37, country#38, year#39, month#40, 1 AS a#30]
      +- SubqueryAlias spark_catalog.default.flint_ppl_test
         +- Relation spark_catalog.default.flint_ppl_test[name#35,age#36,state#37,country#38,year#39,month#40] csv

By contrast, with count() listed first, the query

source = $testTable | eval a = 1 | stats count() as cnt, sum(a) as sum, avg(a) as avg by country

works as expected.
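To make the failure mode concrete, here is a simplified Java model. It assumes, as the count(avg(a#30)) in the failing plan suggests, that already-resolved aggregate arguments sit on a shared stack of named expressions and that all-fields resolution returns the top of that stack, pushing * only when the stack is empty (the guard shown in the diff below). Context, visitAllFieldsBuggy, visitAllFieldsFixed and resolveCount are hypothetical stand-ins, not the Flint classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class AllFieldsResolutionSketch {
    // Hypothetical stand-in for the plan context: a stack of already-resolved named expressions.
    static final class Context {
        final Deque<String> namedParseExpressions = new ArrayDeque<>();
    }

    // Pre-fix behaviour (modelled): "*" is pushed only when the stack is empty,
    // so an earlier aggregate left on the stack is returned instead.
    static String visitAllFieldsBuggy(Context ctx) {
        if (ctx.namedParseExpressions.isEmpty()) {
            ctx.namedParseExpressions.push("*");
        }
        return ctx.namedParseExpressions.peek();
    }

    // Post-fix behaviour (modelled): always push "*" and return it.
    static String visitAllFieldsFixed(Context ctx) {
        ctx.namedParseExpressions.push("*");
        return ctx.namedParseExpressions.peek();
    }

    // Resolves the argument of count() with either variant.
    static String resolveCount(Context ctx, boolean fixed) {
        String arg = fixed ? visitAllFieldsFixed(ctx) : visitAllFieldsBuggy(ctx);
        return "count(" + arg + ")";
    }

    // Simulates "stats sum(a), avg(a), count()": the earlier aggregates are already on the stack.
    static Context afterSumAndAvg() {
        Context ctx = new Context();
        ctx.namedParseExpressions.push("sum(a)");
        ctx.namedParseExpressions.push("avg(a)");
        return ctx;
    }

    public static void main(String[] args) {
        System.out.println(resolveCount(new Context(), false));    // count first: count(*)
        System.out.println(resolveCount(afterSumAndAvg(), false)); // count last, pre-fix: count(avg(a))
        System.out.println(resolveCount(afterSumAndAvg(), true));  // count last, post-fix: count(*)
    }
}
```

The first two calls mirror the two queries above: with an empty stack both variants yield count(*), but once sum(a) and avg(a) have been resolved the guarded version hands back avg(a), which is exactly the nested aggregate Spark rejects.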

Related Issues

Resolves #811

Check List

  • Updated documentation (docs/ppl-lang/README.md)
  • Implemented unit tests
  • Implemented tests for combination with other commands
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@@ -662,11 +662,7 @@ public Expression visitCorrelationMapping(FieldsMapping node, CatalystPlanContex

    @Override
    public Expression visitAllFields(AllFields node, CatalystPlanContext context) {
        // Case of aggregation step - no start projection can be added
        if (context.getNamedParseExpressions().isEmpty()) {
Review comment by LantaoJin (Member, Author):

@YANG-DB could you double-check this fix? IMO it's safe to remove the condition. Is there any case I'm missing? I don't fully understand the comment on L665. There is a similar case in visitEval(): the same condition could be removed there, since each eval command must be converted to a project list that starts with *.
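The argument about visitEval() can be illustrated the same way. This is a minimal sketch, assuming that an eval command always becomes a projection whose list starts with * followed by one alias per assignment, consistent with the Project [..., 1 AS a#30] node in the plan above where * has already been expanded. evalProjectList is a hypothetical helper, not the Flint API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EvalProjectListSketch {
    // Hypothetical helper: builds the project list for "eval k1 = e1, k2 = e2, ...".
    // The list always starts with "*", independent of any state left by earlier commands.
    static List<String> evalProjectList(Map<String, String> assignments) {
        List<String> projectList = new ArrayList<>();
        projectList.add("*"); // keep every existing column
        for (Map.Entry<String, String> e : assignments.entrySet()) {
            projectList.add(e.getValue() + " AS " + e.getKey()); // append each new field
        }
        return projectList;
    }

    public static void main(String[] args) {
        Map<String, String> assignments = new LinkedHashMap<>(); // preserves assignment order
        assignments.put("a", "1");
        System.out.println(evalProjectList(assignments)); // [*, 1 AS a]
    }
}
```

Because the leading * never depends on what is already on the context stack, a guard like the one in visitAllFields() would add nothing here in this model.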

LantaoJin changed the title All-fields as an arg of aggregator count() can be resolved after other fields All-fields as an argument of aggregator such as count() can be resolved after other fields Oct 25, 2024
LantaoJin changed the title All-fields as an argument of aggregator such as count() can be resolved after other fields All-fields as an argument of aggregator such as count() can be resolved after other field Oct 25, 2024
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Count() cannot be the last aggregator in Stats command
1 participant