CNDB-12075: Allow UDFs within GROUP BY clause (#1494)
adelapena authored and djatnieks committed Jan 17, 2025
1 parent ca1ec17 commit 9421d23
Showing 32 changed files with 622 additions and 118 deletions.
9 changes: 9 additions & 0 deletions doc/cql3/CQL.textile
Original file line number Diff line number Diff line change
@@ -721,6 +721,8 @@ bc(syntax)..
'(' <arg-name> <arg-type> ( ',' <arg-name> <arg-type> )* ')'
( CALLED | RETURNS NULL ) ON NULL INPUT
RETURNS <type>
( DETERMINISTIC )?
( MONOTONIC ( ON <arg-name> )? )?
LANGUAGE <language>
AS <body>
p.
@@ -766,6 +768,10 @@ If the optional @IF NOT EXISTS@ keywords are used, the function will only be cre

@OR REPLACE@ and @IF NOT EXISTS@ cannot be used together.

The optional @DETERMINISTIC@ keyword specifies that the function is deterministic. This means that given a particular input, the function will always produce the same output.

The optional @MONOTONIC@ keyword specifies that the function is monotonic. This means that it is either entirely nonincreasing or nondecreasing. Even if the function is not monotonic on all its arguments, it is possible to specify that it is monotonic @ON@ one of its arguments, meaning that partial applications of the function over that argument will be monotonic. Monotonicity is required to use the function in a @GROUP BY@ clause.

Functions belong to a keyspace. If no keyspace is specified in @<function-name>@, the current keyspace is used (i.e. the keyspace specified using the "@USE@":#useStmt statement). It is not possible to create a user-defined function in one of the system keyspaces.

See the section on "user-defined functions":#udfs for more information.
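
As an illustration of the @MONOTONIC@ paragraph above (not part of this commit), here is a minimal Java sketch of what "monotonic @ON@ one argument" means. The names @MonotonicSketch@, @looksMonotonic@ and @floorTo@ are hypothetical, and checking a handful of samples can only suggest monotonicity, not prove it. The two-argument @floorTo(ts, bucket)@ is nondecreasing in @ts@ once @bucket@ is fixed, but it is not monotonic in @bucket@, so under the rule described above it would presumably be declared @MONOTONIC ON ts@ rather than plain @MONOTONIC@.

```java
import java.util.function.LongUnaryOperator;

final class MonotonicSketch {
    /**
     * Returns true if, over the given ascending inputs, the outputs never
     * decrease or never increase. A sample-based check only, not a proof.
     */
    static boolean looksMonotonic(LongUnaryOperator f, long[] ascendingInputs) {
        boolean nonDecreasing = true;
        boolean nonIncreasing = true;
        for (int i = 1; i < ascendingInputs.length; i++) {
            long prev = f.applyAsLong(ascendingInputs[i - 1]);
            long curr = f.applyAsLong(ascendingInputs[i]);
            if (curr < prev) nonDecreasing = false;
            if (curr > prev) nonIncreasing = false;
        }
        return nonDecreasing || nonIncreasing;
    }

    /** Hypothetical UDF body: rounds ts down to a multiple of bucket (bucket > 0). */
    static long floorTo(long ts, long bucket) {
        return ts - Math.floorMod(ts, bucket);
    }

    public static void main(String[] args) {
        long[] timestamps = {0, 5, 10, 15, 20, 25};
        long[] buckets = {7, 8, 9};
        // Partial application over ts (bucket fixed at 10): 0, 0, 10, 10, 20, 20
        System.out.println(looksMonotonic(t -> floorTo(t, 10), timestamps)); // true
        // Partial application over bucket (ts fixed at 100): 98, 96, 99
        System.out.println(looksMonotonic(b -> floorTo(100, b), buckets));   // false
    }
}
```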
@@ -806,6 +812,7 @@ bc(syntax)..
STYPE <state-type>
( FINALFUNC <final-functionname> )?
( INITCOND <init-cond> )?
( DETERMINISTIC )?
p.
__Sample:__

@@ -826,6 +833,8 @@ See the section on "user-defined aggregates":#udas for a complete example.

@OR REPLACE@ and @IF NOT EXISTS@ cannot be used together.

The optional @DETERMINISTIC@ keyword specifies that the aggregate function is deterministic. This means that given a particular input, the function will always produce the same output.

Aggregates belong to a keyspace. If no keyspace is specified in @<aggregate-name>@, the current keyspace is used (i.e. the keyspace specified using the "@USE@":#useStmt statement). It is not possible to create a user-defined aggregate in one of the system keyspaces.

Signatures for user-defined aggregates follow the "same rules":#functionSignature as for user-defined functions.
Original file line number Diff line number Diff line change
@@ -4,3 +4,4 @@ create_aggregate_statement ::= CREATE [ OR REPLACE ] AGGREGATE [ IF NOT EXISTS ]
STYPE cql_type:
[ FINALFUNC function_name]
[ INITCOND term ]
[ DETERMINISTIC ]
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
create_function_statement::= CREATE [ OR REPLACE ] FUNCTION [ IF NOT EXISTS]
function_name '(' arguments_declaration ')'
[ CALLED | RETURNS NULL ] ON NULL INPUT
RETURNS cql_type
RETURNS cql_type
[ DETERMINISTIC ]
[ MONOTONIC [ ON arg_name ] ]
LANGUAGE identifier
AS string arguments_declaration: identifier cql_type ( ',' identifier cql_type )*
18 changes: 18 additions & 0 deletions doc/modules/cassandra/pages/developing/cql/cql_singlefile.adoc
Original file line number Diff line number Diff line change
@@ -1155,6 +1155,8 @@ CREATE FUNCTION akeyspace.fname IF NOT EXISTS +
( someArg int ) +
CALLED ON NULL INPUT +
RETURNS text +
( DETERMINISTIC )? +
( MONOTONIC ( ON <arg-name> )? )? +
LANGUAGE java +
AS $$ +
// some Java code +
@@ -1194,6 +1196,17 @@ exist.

`OR REPLACE` and `IF NOT EXISTS` cannot be used together.

The optional `DETERMINISTIC` keyword specifies that the function is
deterministic. This means that given a particular input, the function
will always produce the same output.

The optional `MONOTONIC` keyword specifies that the function is monotonic.
This means that it is either entirely nonincreasing or nondecreasing.
Even if the function is not monotonic on all its arguments, it is possible
to specify that it is monotonic `ON` one of its arguments, meaning that
partial applications of the function over that argument will be monotonic.
Monotonicity is required to use the function in a `GROUP BY` clause.

Functions belong to a keyspace. If no keyspace is specified in
`<function-name>`, the current keyspace is used (i.e. the keyspace
specified using the link:#useStmt[`USE`] statement). It is not possible
@@ -1243,6 +1256,7 @@ SFUNC +
STYPE +
( FINALFUNC )? +
( INITCOND )? +
( DETERMINISTIC )? +
p. +
_Sample:_

@@ -1268,6 +1282,10 @@ creates an aggregate if it does not already exist.

`OR REPLACE` and `IF NOT EXISTS` cannot be used together.

The optional `DETERMINISTIC` keyword specifies that the aggregate
function is deterministic. This means that given a particular input,
the function will always produce the same output.

Aggregates belong to a keyspace. If no keyspace is specified in
`<aggregate-name>`, the current keyspace is used (i.e. the keyspace
specified using the link:#useStmt[`USE`] statement). It is not possible
13 changes: 13 additions & 0 deletions doc/modules/cassandra/pages/developing/cql/functions.adoc
Original file line number Diff line number Diff line change
@@ -415,6 +415,16 @@ If the optional `IF NOT EXISTS` keywords are used, the function will only be cre
exist.
`OR REPLACE` and `IF NOT EXISTS` cannot be used together.

The optional `DETERMINISTIC` keyword specifies that the function is deterministic.
This means that given a particular input, the function will always produce the same output.

The optional `MONOTONIC` keyword specifies that the function is monotonic.
This means that it is either entirely nonincreasing or nondecreasing.
Even if the function is not monotonic on all its arguments, it is possible
to specify that it is monotonic `ON` one of its arguments, meaning that
partial applications of the function over that argument will be monotonic.
Monotonicity is required to use the function in a `GROUP BY` clause.

Behavior for `null` input values must be defined for each function:

* `RETURNS NULL ON NULL INPUT` declares that the function will always return `null` if any of the input arguments is `null`.
@@ -577,6 +587,9 @@ A `CREATE AGGREGATE` without `OR REPLACE` fails if an aggregate with the same si
The `CREATE AGGREGATE` command with the optional `IF NOT EXISTS` keywords creates an aggregate if it does not already exist.
The `OR REPLACE` and `IF NOT EXISTS` phrases cannot be used together.

The optional `DETERMINISTIC` keyword specifies that the aggregate function is deterministic.
This means that given a particular input, the function will always produce the same output.

The `STYPE` value defines the type of the state value and must be specified.
The optional `INITCOND` defines the initial state value for the aggregate; the default value is `null`.
A non-null `INITCOND` must be specified for state functions that are declared with `RETURNS NULL ON NULL INPUT`.
3 changes: 3 additions & 0 deletions pylib/cqlshlib/cql3handling.py
Original file line number Diff line number Diff line change
@@ -1338,6 +1338,8 @@ def create_cf_composite_primary_key_comma_completer(ctxt, cass):
")" )?
("RETURNS" "NULL" | "CALLED") "ON" "NULL" "INPUT"
"RETURNS" <storageType>
( "DETERMINISTIC" )?
( "MONOTONIC" ( "ON" <cident> )? )?
"LANGUAGE" <cident> "AS" <stringLiteral>
;
@@ -1351,6 +1353,7 @@ def create_cf_composite_primary_key_comma_completer(ctxt, cass):
"STYPE" <storageType>
( "FINALFUNC" <refUserFunctionName> )?
( "INITCOND" <term> )?
( "DETERMINISTIC" )?
;
'''
2 changes: 2 additions & 0 deletions src/antlr/Lexer.g
Original file line number Diff line number Diff line change
@@ -216,6 +216,8 @@ K_INPUT: I N P U T;
K_LANGUAGE: L A N G U A G E;
K_OR: O R;
K_REPLACE: R E P L A C E;
K_DETERMINISTIC: D E T E R M I N I S T I C;
K_MONOTONIC: M O N O T O N I C;

K_JSON: J S O N;
K_DEFAULT: D E F A U L T;
18 changes: 16 additions & 2 deletions src/antlr/Parser.g
Original file line number Diff line number Diff line change
@@ -663,6 +663,7 @@ createAggregateStatement returns [CreateAggregateStatement.Raw stmt]
@init {
boolean orReplace = false;
boolean ifNotExists = false;
boolean deterministic = false;

List<CQL3Type.Raw> argTypes = new ArrayList<>();
}
@@ -684,7 +685,8 @@ createAggregateStatement returns [CreateAggregateStatement.Raw stmt]
(
K_INITCOND ival = term
)?
{ $stmt = new CreateAggregateStatement.Raw(fn, argTypes, stype, sfunc, ffunc, ival, orReplace, ifNotExists); }
( K_DETERMINISTIC { deterministic = true; } )?
{ $stmt = new CreateAggregateStatement.Raw(fn, argTypes, stype, sfunc, ffunc, ival, orReplace, ifNotExists, deterministic); }
;

dropAggregateStatement returns [DropAggregateStatement.Raw stmt]
@@ -716,6 +718,10 @@ createFunctionStatement returns [CreateFunctionStatement.Raw stmt]
List<ColumnIdentifier> argNames = new ArrayList<>();
List<CQL3Type.Raw> argTypes = new ArrayList<>();
boolean calledOnNullInput = false;

boolean deterministic = false;
boolean monotonic = false;
List<ColumnIdentifier> monotonicOn = new ArrayList<>();
}
: K_CREATE (K_OR K_REPLACE { orReplace = true; })?
K_FUNCTION
@@ -729,10 +735,16 @@ createFunctionStatement returns [CreateFunctionStatement.Raw stmt]
')'
( (K_RETURNS K_NULL) | (K_CALLED { calledOnNullInput=true; })) K_ON K_NULL K_INPUT
K_RETURNS returnType = comparatorType
( K_DETERMINISTIC { deterministic = true; } )?
(
K_MONOTONIC { monotonic=true; monotonicOn.addAll(argNames); }
| K_MONOTONIC K_ON k=noncol_ident { monotonicOn.add(k); monotonic=monotonicOn.containsAll(argNames); }
)?
K_LANGUAGE language = IDENT
K_AS body = STRING_LITERAL
{ $stmt = new CreateFunctionStatement.Raw(
fn, argNames, argTypes, returnType, calledOnNullInput, $language.text.toLowerCase(), $body.text, orReplace, ifNotExists);
fn, argNames, argTypes, returnType, calledOnNullInput, $language.text.toLowerCase(), $body.text, orReplace,
ifNotExists, deterministic, monotonic, monotonicOn);
}
;
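
The two @MONOTONIC@ alternatives in the rule above set the flags differently, which is easy to miss when reading the grammar actions. The following is a minimal Java sketch of that flag resolution, using plain strings in place of `ColumnIdentifier`; the class `MonotonicClause` and its factory methods are hypothetical and only mirror the parser actions shown in this hunk. Any validation that may happen later in `CreateFunctionStatement` is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

final class MonotonicClause {
    final boolean monotonic;          // monotonic on all declared arguments
    final List<String> monotonicOn;   // arguments the function is monotonic on

    private MonotonicClause(boolean monotonic, List<String> monotonicOn) {
        this.monotonic = monotonic;
        this.monotonicOn = monotonicOn;
    }

    /** No MONOTONIC keyword: both values keep their initial state. */
    static MonotonicClause none() {
        return new MonotonicClause(false, new ArrayList<>());
    }

    /** Plain MONOTONIC: monotonic on every declared argument. */
    static MonotonicClause onAll(List<String> argNames) {
        return new MonotonicClause(true, new ArrayList<>(argNames));
    }

    /**
     * MONOTONIC ON arg: records the single argument; the function counts as
     * fully monotonic only if that argument covers all declared arguments.
     */
    static MonotonicClause on(String arg, List<String> argNames) {
        List<String> on = new ArrayList<>();
        on.add(arg);
        return new MonotonicClause(on.containsAll(argNames), on);
    }

    public static void main(String[] args) {
        List<String> twoArgs = List.of("ts", "bucket");
        System.out.println(onAll(twoArgs).monotonic);             // true
        System.out.println(on("ts", twoArgs).monotonic);          // false
        System.out.println(on("ts", twoArgs).monotonicOn);        // [ts]
        System.out.println(on("ts", List.of("ts")).monotonic);    // true
    }
}
```

With a single-argument function, `MONOTONIC ON arg` and plain `MONOTONIC` end up with the same flags in this sketch, since the one named argument already covers the whole argument list.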

@@ -2075,5 +2087,7 @@ basic_unreserved_keyword returns [String str]
| K_RECORD
| K_ANN_OF
| K_OFFSET
| K_DETERMINISTIC
| K_MONOTONIC
) { $str = $k.text; }
;
5 changes: 5 additions & 0 deletions src/java/org/apache/cassandra/config/DatabaseDescriptor.java
Original file line number Diff line number Diff line change
@@ -4216,6 +4216,11 @@ public static boolean enableUserDefinedFunctionsThreads()
return conf.user_defined_functions_threads_enabled;
}

public static void enableUserDefinedFunctionsThreads(boolean enabled)
{
conf.user_defined_functions_threads_enabled = enabled;
}

public static long getUserDefinedFunctionWarnTimeout()
{
return conf.user_defined_functions_warn_timeout.toMilliseconds();
4 changes: 2 additions & 2 deletions src/java/org/apache/cassandra/cql3/Lists.java
Original file line number Diff line number Diff line change
@@ -285,9 +285,9 @@ public List<ByteBuffer> getElements()
}

/**
* Basically similar to a Value, but with some non-pure function (that need
* Basically similar to a Value, but with some non-deterministic function (that need
* to be evaluated at execution time) in it.
*
* </p>
* Note: this would also work for a list with bind markers, but we don't support
* that because 1) it's not excessively useful and 2) we wouldn't have a good
* column name to return in the ColumnSpecification for those markers (not a
16 changes: 8 additions & 8 deletions src/java/org/apache/cassandra/cql3/Term.java
Original file line number Diff line number Diff line change
@@ -65,7 +65,7 @@ public interface Term
* Whether or not that term contains at least one bind marker.
*
* Note that this is slightly different from being or not a NonTerminal,
* because calls to non pure functions will be NonTerminal (see #5616)
* because calls to non-deterministic functions will be NonTerminal (see #5616)
* even if they don't have bind markers.
*/
public abstract boolean containsBindMarker();
@@ -151,15 +151,15 @@ public abstract class MultiColumnRaw extends Term.Raw

/**
* A terminal term, one that can be reduced to a byte buffer directly.
*
* </p>
* This includes most terms that don't have a bind marker (an exception
* being delayed call for non pure function that are NonTerminal even
* being delayed call for non-deterministic function that are NonTerminal even
* if they don't have bind markers).
*
* </p>
* This can be only one of:
* - a constant value
* - a collection value
*
* </p>
* Note that a terminal term will always have been type checked, and thus
* consumer can (and should) assume so.
*/
@@ -212,14 +212,14 @@ public abstract class MultiItemTerminal extends Terminal
}

/**
* A non terminal term, i.e. a term that can only be reduce to a byte buffer
* A non-terminal term, i.e. a term that can only be reduce to a byte buffer
* at execution time.
*
* </p>
* We have the following type of NonTerminal:
* - marker for a constant value
* - marker for a collection value (list, set, map)
* - a function having bind marker
* - a non pure function (even if it doesn't have bind marker - see #5616)
* - a non-deterministic function (even if it doesn't have bind marker - see #5616)
*/
public abstract class NonTerminal implements Term
{
2 changes: 1 addition & 1 deletion src/java/org/apache/cassandra/cql3/Tuples.java
Original file line number Diff line number Diff line change
@@ -176,7 +176,7 @@ public List<ByteBuffer> getElements()
}

/**
* Similar to Value, but contains at least one NonTerminal, such as a non-pure functions or bind marker.
* Similar to Value, but contains at least one NonTerminal, such as a non-deterministic functions or bind marker.
*/
public static class DelayedValue extends Term.NonTerminal
{
2 changes: 1 addition & 1 deletion src/java/org/apache/cassandra/cql3/Vectors.java
Original file line number Diff line number Diff line change
@@ -182,7 +182,7 @@ public List<ByteBuffer> getElements()
}

/**
* Basically similar to a Value, but with some non-pure function (that need
* Basically similar to a Value, but with some non-deterministic function (that need
* to be evaluated at execution time) in it.
*/
public static class DelayedValue<T> extends Term.NonTerminal
27 changes: 14 additions & 13 deletions src/java/org/apache/cassandra/cql3/functions/Function.java
Original file line number Diff line number Diff line change
@@ -34,44 +34,45 @@ public interface Function extends AssignmentTestable
* A marker buffer used to represent function parameters that cannot be resolved at some stage of CQL processing.
* This is used for partial function application in particular.
*/
public static final ByteBuffer UNRESOLVED = ByteBuffer.allocate(0);
ByteBuffer UNRESOLVED = ByteBuffer.allocate(0);

public FunctionName name();
public List<AbstractType<?>> argTypes();
public AbstractType<?> returnType();
FunctionName name();
List<AbstractType<?>> argTypes();
AbstractType<?> returnType();

/**
* Checks whether the function is a native/hard coded one or not.
*
* @return {@code true} if the function is a native/hard coded one, {@code false} otherwise.
*/
public boolean isNative();
boolean isNative();

/**
* Checks whether the function is a pure function (as in doesn't depend on, nor produces side effects) or not.
* Checks whether the function is a deterministic function (as in given a particular input, will always produce the
* same output) or not.
*
* @return {@code true} if the function is a pure function, {@code false} otherwise.
* @return {@code true} if the function is a deterministic function, {@code false} otherwise.
*/
public boolean isPure();
boolean isDeterministic();

/**
* Checks whether the function is an aggregate function or not.
*
* @return {@code true} if the function is an aggregate function, {@code false} otherwise.
*/
public boolean isAggregate();
boolean isAggregate();

public void addFunctionsTo(List<Function> functions);
void addFunctionsTo(List<Function> functions);

public boolean referencesUserType(ByteBuffer name);
boolean referencesUserType(ByteBuffer name);

/**
* Returns the name of the function to use within a ResultSet.
*
* @param columnNames the names of the columns used to call the function
* @return the name of the function to use within a ResultSet
*/
public String columnName(List<String> columnNames);
String columnName(List<String> columnNames);

/**
* Creates some new input arguments for this function.
@@ -81,7 +82,7 @@ public interface Function extends AssignmentTestable
*/
Arguments newArguments(ProtocolVersion version);

public default Optional<Difference> compare(Function other)
default Optional<Difference> compare(Function other)
{
throw new UnsupportedOperationException();
}
Original file line number Diff line number Diff line change
@@ -186,10 +186,17 @@ protected URLConnection openConnection(URL u)

private static final Pattern patternJavaDriver = Pattern.compile("com\\.datastax\\.driver\\.core\\.");

JavaBasedUDFunction(FunctionName name, List<ColumnIdentifier> argNames, List<AbstractType<?>> argTypes,
AbstractType<?> returnType, boolean calledOnNullInput, String body)
JavaBasedUDFunction(FunctionName name,
List<ColumnIdentifier> argNames,
List<AbstractType<?>> argTypes,
AbstractType<?> returnType,
boolean calledOnNullInput,
String body,
boolean deterministic,
boolean monotonic,
List<ColumnIdentifier> monotonicOn)
{
super(name, argNames, argTypes, returnType, calledOnNullInput, "java", body);
super(name, argNames, argTypes, returnType, calledOnNullInput, "java", body, deterministic, monotonic, monotonicOn);

// put each UDF in a separate package to prevent cross-UDF code access
String pkgName = BASE_PACKAGE + '.' + generateClassName(name, 'p');
Original file line number Diff line number Diff line change
@@ -41,9 +41,9 @@ public final boolean isNative()
}

@Override
public boolean isPure()
public boolean isDeterministic()
{
// Most of our functions are pure, the other ones should override this
// Most of our functions are deterministic, the other ones should override this
return true;
}
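
The rename from `isPure()` to `isDeterministic()` ties in with the `Term` comments earlier in this commit, which say that calls to non-deterministic functions stay `NonTerminal` and are evaluated at execution time. A minimal Java sketch of that distinction follows. It assumes, for illustration only, that a deterministic call with no bind markers may be evaluated once up front; `DeterminismSketch`, `Fn` and `prepare` are hypothetical names and not Cassandra's API.

```java
final class DeterminismSketch {
    /** Hypothetical stand-in for a zero-argument function call. */
    interface Fn {
        long call();

        // Deterministic by default; non-deterministic functions override this,
        // mirroring the default shown in the hunk above.
        default boolean isDeterministic() {
            return true;
        }
    }

    /**
     * Deterministic calls are evaluated once and the result is reused;
     * non-deterministic calls are returned as-is so that they are
     * re-evaluated on every execution.
     */
    static Fn prepare(Fn fn) {
        if (!fn.isDeterministic())
            return fn;
        long value = fn.call();
        return () -> value;
    }

    public static void main(String[] args) {
        Fn constant = () -> 42L;
        Fn clock = new Fn() {
            @Override
            public long call() { return System.nanoTime(); }

            @Override
            public boolean isDeterministic() { return false; }
        };
        System.out.println(prepare(constant).call());   // 42
        System.out.println(prepare(clock) == clock);    // true: left for execution time
    }
}
```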
