Aggregate functions
This page describes the available functions to assist with performing aggregate calculations.
#
avgavg(value)
calculates simple average of values ignoring missing data (e.g
null
values).
#
Argumentsvalue
is any numeric value.
#
Return valueReturn value type is double
.
#
Examplesavg |
---|
22.4 |
payment_type | avg |
---|---|
cash | 22.1 |
card | 27.4 |
null | 18.02 |
#
countcount()
orcount(*)
- counts the number of rows irrespective of underlying data.count(column_name)
- counts the number of non-null values in a given column.
#
Argumentscount()
does not require arguments.count(column_name)
- supports the following data types:double
float
integer
character
short
byte
timestamp
date
long
long256
geohash
string
symbol
#
Return valueReturn value type is long
.
#
ExamplesCount of rows in the transactions
table:
count |
---|
100 |
Count of rows in the transactions
table aggregated by the payment_type
value:
payment_type | count |
---|---|
cash | 25 |
card | 70 |
null | 5 |
Count non-null transaction amounts:
count |
---|
95 |
Count non-null transaction amounts by payment_type
:
payment_type | count |
---|---|
cash | 24 |
card | 67 |
null | 4 |
note
null
values are aggregated with count()
, but not with count(column_name)
#
count_distinctcount_distinct(column_name)
- counts distinct values in string
, symbol
,
long256
, long
, or int
columns.
#
Return valueReturn value type is long
.
#
Examples- Count of distinct sides in the transactions table. Side column can either be
BUY
orSELL
ornull
count_distinct |
---|
2 |
- Count of distinct counterparties in the transactions table aggregated by
payment_type
value.
payment_type | count_distinct |
---|---|
cash | 3 |
card | 23 |
null | 5 |
note
null
values are not counted in the count_distinct
function.
#
first/lastfirst(column_name)
- returns the first value of a column.last(column_name)
- returns the last value of a column.
Supported column datatype: double
, float
, integer
, character
, short
,
byte
, timestamp
, date
, long
, geohash
.
If a table has a designated timestamp,
then the first row is always the row with the lowest timestamp (oldest) and the
last row is always the one with the highest (latest) timestamp. For a table
without a designated timestamp column, first
returns the first row and last
returns the last inserted row, regardless of any timestamp column.
#
Return valueReturn value type is string
.
#
ExamplesGiven a table sensors
, which has a designated timestamp column:
device_id | temperature | ts |
---|---|---|
arduino-01 | 12 | 2021-06-02T14:33:19.970258Z |
arduino-02 | 10 | 2021-06-02T14:33:21.703934Z |
arduino-03 | 18 | 2021-06-02T14:33:23.707013Z |
The following query returns oldest value for the device_id
column:
first |
---|
arduino-01 |
The following query returns the latest symbol value for the device_id
column:
last |
---|
arduino-03 |
Without selecting a designated timestamp column, the table may be unordered and
the query may return different result. Given an unordered table
sensors_unordered
:
device_id | temperature | ts |
---|---|---|
arduino-01 | 12 | 2021-06-02T14:33:19.970258Z |
arduino-03 | 18 | 2021-06-02T14:33:23.707013Z |
arduino-02 | 10 | 2021-06-02T14:33:21.703934Z |
The following query returns the first record for the device_id
column:
first |
---|
arduino-01 |
The following query returns the last record for the device_id
column:
last |
---|
arduino-02 |
#
haversine_dist_deghaversine_dist_deg(lat, lon, ts)
- calculates the traveled distance for a
series of latitude and longitude points.
#
Argumentslat
is the latitude expressed as degrees in decimal format (double
)lon
is the longitude expressed as degrees in decimal format (double
)ts
is thetimestamp
for the data point
#
Return valueReturn value type is double
.
#
Examples#
ksumksum(value)
- adds values ignoring missing data (e.g null
values). Values
are added using the
Kahan compensated sum algorithm.
This is only beneficial for floating-point values such as float
or double
.
#
Argumentsvalue
is any numeric value.
#
Return valueReturn value type is the same as the type of the argument.
#
Examplesksum |
---|
52.79143968514029 |
#
maxmax(value)
- returns the highest value ignoring missing data (e.g null
values).
#
Argumentsvalue
is any numeric value
#
Return valueReturn value type is the same as the type of the argument.
#
Examplesmax |
---|
55.3 |
payment_type | amount |
---|---|
cash | 31.5 |
card | 55.3 |
null | 29.2 |
#
minmin(value)
- returns the lowest value ignoring missing data (e.g null
values).
#
Argumentsvalue
is any numeric value
#
Return valueReturn value type is the same as the type of the argument.
#
Examplesmin |
---|
12.5 |
payment_type | min |
---|---|
cash | 12.5 |
card | 15.3 |
null | 22.2 |
#
nsumnsum(value)
- adds values ignoring missing data (e.g null
values). Values
are added using the
Neumaier sum algorithm.
This is only beneficial for floating-point values such as float
or double
.
#
Argumentsvalue
is any numeric value.
#
Return valueReturn value type is double
.
#
Examplesnsum |
---|
49.5442334742831 |
#
stddev_sampstddev_samp(value)
- calculates the sample standard deviation of values
ignoring missing data (e.g null
values).
#
Argumentsvalue
is any numeric value.
#
Return valueReturn value type is double
.
#
Examplesstddev_samp |
---|
29.011491975882 |
#
sumsum(value)
- adds values ignoring missing data (e.g null
values).
#
Argumentsvalue
is any numeric value.
#
Return valueReturn value type is the same as the type of the argument.
#
Examplessum |
---|
100 |
item | count |
---|---|
apple | 53 |
orange | 47 |
#
Overflowsum
does not perform overflow check. To avoid overflow, you can cast the
argument to wider type.