Create a user-defined aggregate function (UDA)

DataStax Enterprise allows users to define aggregate functions that can be applied to data stored in a table as part of a query result. The aggregate function must be created prior to its use in a SELECT statement and the query must only include the aggregate function itself, but no columns. The state function is called once for each row, and the value returned by the state function becomes the new state. After all rows are processed, the optional final function is executed with the last state value as its argument. Aggregation is performed by the coordinator.

The example shown computes the team average for race time for all the cyclists stored in the table. The race time is computed in seconds.

Prerequisites

  • Keyspace exists

  • Function exists

    1. Create a state function, as a user-defined function (UDF), if needed. This function adds all the race times together and counts the number of entries.

      CREATE OR REPLACE FUNCTION cycling.average_state 
        (state tuple<int, bigint>, val int) 
        CALLED ON NULL INPUT 
        RETURNS tuple<int, bigint> 
        LANGUAGE java AS 
          $$
            if (val != null) { 
              state.setInt(0, state.getInt(0) + 1); 
              state.setLong(1, state.getLong(1) + val.intValue()); 
            } 
            return state; 
          $$
      ;
CREATE FUNCTION cycling.average_state(state tuple<int, bigint>, val int)
    CALLED ON NULL INPUT
    RETURNS tuple<int, bigint>
    LANGUAGE java
    AS $$
      if (val != null) { 
        state.setInt(0, state.getInt(0) + 1); 
        state.setLong(1, state.getLong(1) + val.intValue()); 
      } 
      return state; 
    $$;


CREATE FUNCTION cycling.average_state(state tuple<int, bigint>, val int)
    CALLED ON NULL INPUT
    RETURNS tuple<int, bigint>
    LANGUAGE java
    AS $$
      if (val != null) { 
        state.setInt(0, state.getInt(0) + 1); 
        state.setLong(1, state.getLong(1) + val.intValue()); 
      } 
      return state; 
    $$;
  1. Create a final function, as a user-defined function (UDF), if needed. This function computes the average of the values passed to it from the state function.

    CREATE OR REPLACE FUNCTION cycling.average_final (state tuple<int,bigint>) 
      CALLED ON NULL INPUT 
      RETURNS double 
      LANGUAGE java AS 
        $$
          double r = 0; 
          if (state.getInt(0) == 0) return null; 
          r = state.getLong(1); 
          r /= state.getInt(0); 
          return Double.valueOf(r);
        $$ ;
CREATE FUNCTION cycling.average_final(state tuple<int, bigint>)
    CALLED ON NULL INPUT
    RETURNS double
    LANGUAGE java
    AS $$
      double r = 0; 
      if (state.getInt(0) == 0) return null; 
      r = state.getLong(1); 
      r /= state.getInt(0); 
      return Double.valueOf(r);
    $$;


CREATE FUNCTION cycling.average_final(state tuple<int, bigint>)
    CALLED ON NULL INPUT
    RETURNS double
    LANGUAGE java
    AS $$
      double r = 0; 
      if (state.getInt(0) == 0) return null; 
      r = state.getLong(1); 
      r /= state.getInt(0); 
      return Double.valueOf(r);
    $$;
  1. Create the aggregate function using the state and final functions, and add an STYPE to define the data type for the function. Different STYPEs distinguishes between functions with the same name. An aggregate can be replaced with a different aggregate if OR REPLACE is used as shown in the examples above. Optionally, the IF NOT EXISTS keywords can be used to create the aggregate only if another aggregate with the same signature does not exist in the keyspace. OR REPLACE and IF NOT EXISTS cannot be used in the same command.

    CREATE OR REPLACE AGGREGATE cycling.average (int) 
      SFUNC average_state 
      STYPE tuple<int,bigint> 
      FINALFUNC average_final 
      INITCOND (0, 0);
CREATE AGGREGATE cycling.average(int)
    SFUNC average_state
    STYPE tuple<int, bigint>
    FINALFUNC average_final
    INITCOND (0, 0);


CREATE AGGREGATE cycling.average(int)
    SFUNC average_state
    STYPE tuple<int, bigint>
    FINALFUNC average_final
    INITCOND (0, 0);

See also:

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com