ILP Overview

QuestDB implements the InfluxDB line protocol to ingest data. QuestDB can listen for line protocol packets over TCP.

This page aims to provide examples for QuestDB experts setting up TCP without any client libraries, or those looking to implement a new client library yourself.

tip

For general QuestDB users, client libraries are available for a number of languages: ILP client libraries.

TCP receiver overview#

The TCP receiver is a high-throughput ingestion-only API for QuestDB. Here are some key facts about the service:

  • ingestion only, there is no query capability
  • accepts plain text input in a form of InfluxDB Line Protocol
  • implicit transactions, batching
  • supports automatic table and column creation
  • multi-threaded, non-blocking
  • supports authentication
  • encryption requires an optional external reverse-proxy

By default, QuestDB listens over TCP on 0.0.0.0:9009. The receiver consists of two thread pools, which is an important design feature to be aware of to configure the receiver for maximum performance. The io worker threads are responsible for parsing text input. The writer threads are responsible for persisting data in tables. We will talk more about these in capacity planning section.

Authentication#

Although the original protocol does not support it, we have added authentication over TCP. This works by using an elliptic curve P-256 JSON Web Token (JWT) to sign a server challenge. Details of authentication over ILP can be found in the authentication documentation.

Configuration reference#

The TCP receiver configuration can be completely customized using configuration keys. You can use this to configure the thread pools, buffer and queue sizes, receiver IP address and port, load balancing, etc.

Usage#

This section provides usage information and details for data ingestion via ILP.

We provide examples in a number of programming languages. See our

Syntax#

table_name,symbolset columnset timestamp\n
ElementDefinition
table_nameName of the table where QuestDB will write data.
symbolsetA set of comma-separated name=value pairs that will be parsed as symbol columns.
columnsetA set of comma-separated name=value pairs that will be parsed as non-symbol columns.
timestampUNIX timestamp. The default unit is nanosecond and is configurable via line.tcp.timestamp. The value will be truncated to microsecond resolution when parsed by QuestDB.

name in the name=value pair always corresponds to column name in the table.

note

Each ILP message has to end with a new line \n character.

Behavior#

  • When the table_name does not correspond to an existing table, QuestDB will create the table on the fly using the name provided. Column types will be automatically recognized and assigned based on the data.
  • The timestamp column is automatically created as designated timestamp with the partition strategy set to DAY. Alternatively, use CREATE TABLE to create the table with a different partition strategy before ingestion.
  • When the timestamp is empty, QuestDB will use the server timestamp.

Difference from InfluxDB#

QuestDB TCP Receiver uses ILP as both serialization and the transport format. InfluxDB on other hand uses HTTP as the transport and ILP as serialization format. For this reason the existing InfluxDB client libraries will not work with QuestDB.

Generic example#

Let's assume the following data:

timestampcitytemperaturehumiditymake
1465839830100400000London23.50.343Omron
1465839830100600000Bristol23.20.443Honeywell
1465839830100700000London23.60.358Omron

The line protocol syntax for that table is:

readings,city=London,make=Omron temperature=23.5,humidity=0.343 1465839830100400000\n
readings,city=Bristol,make=Honeywell temperature=23.2,humidity=0.443 1465839830100600000\n
readings,city=London,make=Omron temperature=23.6,humidity=0.348 1465839830100700000\n

This would create table similar to this SQL statement and populate it.

CREATE TABLE readings (
timestamp TIMESTAMP,
city SYMBOL,
temperature DOUBLE,
humidity DOUBLE,
make SYMBOL
) TIMESTAMP(timestamp) PARTITION BY DAY;

Designated timestamp#

Timestamps#

Designated timestamp is the trailing value of an ILP message. It is optional, and when present, is a timestamp in Epoch nanoseconds. When the timestamp is omitted, the server will insert each message using the system clock as the row timestamp. See cairo.timestamp.locale and line.tcp.timestamp configuration options.

caution
  • While columnset timestamp type units are microseconds, the designated timestamp units are nanoseconds by default, and can be overridden via the line.tcp.timestamp configuration property.

  • The native timestamp format used by QuestDB is a Unix timestamp in microsecond resolution; timestamps in nanoseconds will be parsed and truncated to microseconds.

Example of ILP message with desginated timestamp value
tracking,loc=north val=200i 1000000000\n
Example of ILP message sans timestamp
tracking,loc=north val=200i\n
note

We recommend populating designated timestamp via trailing value syntax above.

It is also possible to populate designated timestamp via columnset. Please see mixed timestamp reference.

Irregularly-structured data#

InfluxDB line protocol makes it possible to send data under different shapes. Each new entry may contain certain tags or fields, and others not. QuestDB supports on-the-fly data structure changes with minimal overhead. Whilst the example just above highlights structured data, it is possible for InfluxDB line protocol users to send data as follows:

readings,city=London temperature=23.2 1465839830100400000\n
readings,city=London temperature=23.6 1465839830100700000\n
readings,make=Honeywell temperature=23.2,humidity=0.443 1465839830100800000\n

This would result in the following table:

timestampcitytemperaturehumiditymake
1465839830100400000London23.5NULLNULL
1465839830100700000London23.6NULLNULL
1465839830100800000NULL23.20.358Honeywell
tip

Whilst we offer this function for flexibility, we recommend that users try to minimize structural changes to maintain operational simplicity.

Duplicate column names#

If line contains duplicate column names, the value stored in the table will be that from the first name=value pair on each line. For example:

trade,ticker=USD price=30,price=60 1638202821000000000\n

Price 30 is stored, 60 is ignored.

Name restrictions#

Both table name and column names are allowed to have spaces . These spaces have to be escaped with \. For example both of these are valid lines.

trade\ table,ticker=USD price=30,details="Latest price" 1638202821000000000\n
trade,symbol\ ticker=USD price=30,details="Latest price" 1638202821000000000\n

Table and column names must not contain any of the forbidden characters: \n,\r,?,,,:,",',\,/,\0,),(,+,*,~ and %.

Additionally, table name must not start or end with the . character. Column name must not contain . and -.

Symbolset#

Area of the message that contains comma-separated set of name=value pairs for symbol columns. For example in a message like this:

trade,ticker=BTCUSD,venue=coinbase price=30,price=60 1638202821000000000\n

symbolset is ticker=BTCUSD,venue=coinbase. Please note the mandatory space between symbolset and columnset. Naming rules for columns are subject to duplicate rules and name restrictions.

Symbolset values#

symbolset values are always interpreted as SYMBOL. Parser takes values literally so please beware of accidentally using high cardinality types such as 9092i or 1.245667. This will result in a significant performance loss due to large mapping tables.

symbolset values are not quoted. They are allowed to have special characters, such as (space), =, ,, \n, \r and \, which must be escaped with a \. Example:

trade,ticker=BTC\\USD\,All,venue=coin\ base price=30 1638202821000000000\n

Whenever symbolset column does not exist, it will be added on-the-fly with type SYMBOL. On other hand when the column does exist, it is expected to be of SYMBOL type, otherwise the line is rejected.

Columnset#

Area of the message that contains comma-separated set of name=value pairs for non-symbol columns. For example in a message like this:

trade,ticker=BTCUSD priceLow=30,priceHigh=60 1638202821000000000\n

columnset is priceLow=30,priceHigh=60. Naming rules for columns are subject to duplicate rules and name restrictions.

Columnset values#

columnset supports several values types, which are used to either derive type of new column or mapping strategy when column already exists. These types are limited by existing Influx Line Protocol specification. Wider QuestDB type system is available by creating table via SQL upfront. The following are supported value types: Integer, Long256, Float, String and Timestamp

Inserting NULL values#

To insert a NULL value, skip the column (or symbol) for that row.

For example:

table1 a=10.5 1647357688714369403
table1 b=1.25 1647357698714369403

Will insert as:

abtimestamp
10.5NULL2022-03-15T15:21:28.714369Z
NULL1.252022-03-15T15:21:38.714369Z

ILP Datatypes and Casts#

Strings vs Symbols#

Strings may be recorded as either the STRING type or the SYMBOL type.

Inspecting a sample ILP we can see how a space ' ' separator splits SYMBOL columns to the left from the rest of the columns.

table_name,col1=symbol_val1,col2=symbol_val2 col3="string val",col4=10.5
╰───────── separator

In this example, columns col1 and col2 are strings written to the database as SYMBOLs, whilst col3 is written out as a STRING.

SYMBOLs are strings which are automatically interned by the database on a per-column basis. You should use this type if you expect the string to be re-used over and over, such as is common with identifiers.

For one-off strings use STRING columns which aren't interned.

Casts#

QuestDB types are a superset of those supported by ILP. This means that when sending data you should be aware of the performed conversions.

See:

Constructing well-formed messages#

Different library implementations will perform different degrees of content validation upfront before sending messages out. To avoid encountering issues, follow these guidelines:

  • All strings must be UTF-8 encoded.

  • Columns should only appear once per row.

  • Symbol columns must be written out before other columns.

  • Table and column names can't have invalid characters. These should not contain ?, .,,, ', ", \, /, :, (, ), +, -, *, %, ~,' ' (space), \0 (nul terminator), ZERO WIDTH NO-BREAK SPACE.

  • Write timestamp column via designated API, or at the end of the message if you are using raw sockets. If you have multiple timestamp columns write additional ones as column values.

  • Don't change column type between rows.

  • Supply timestamps in order. These need to be at least equal to previous ones in the same table, unless using the out of order feature. This is not necessary if you use the out-of-order feature.

Error handling#

QuestDB will always log any ILP errors in its server logs.

It is recommended that sending applications reuse TCP connections. If QuestDB receives an invalid message, it will discard invalid lines, produce an error message in the logs and forcibly disconnect the sender to prevent further data loss.

Data may be discarded because of:

  • missing new line characters at the end of messages
  • an invalid data format such as unescaped special characters
  • invalid column / table name characters
  • schema mismatch with existing tables
  • message size overflows on the input buffer
  • system errors such as no space left on the disk

Detecting malformed input can be achieved through QuestDB logs by searching for LineTcpMeasurementScheduler and LineTcpConnectionContext, for example:

2022-02-03T11:01:51.007235Z I i.q.c.l.t.LineTcpMeasurementScheduler could not create table [tableName=trades, ex=`column name contains invalid characters [colName=trade_%]`, errno=0]

The following input is tolerated by QuestDB:

  • a column is specified twice or more on the same line, QuestDB will pick the first occurrence and ignore the rest
  • missing columns, their value will be defaulted to null/0.0/false depending on the type of the column
  • missing designated timestamp, the current server time will be used to generate the timestamp
  • the timestamp is specified as a column instead of appending it to the end of the line
  • timestamp appears as a column and is also present at the end of the line, the value sent as a field will be used

With sufficient client-side validation, the lack of errors to the client and confirmation isn't necessarily a concern: QuestDB will log out any issues and disconnect on error. The database will process any valid lines up to that point and insert rows.

To resume WAL table ingestion after recovery from errors, see ALTER TABLE RESUME WAL for more information.

If you don't immediately see data#

If you don't see your inserted data, this is usually down to one of two things:

  • You prepared the messages, but forgot to call .flush() or similar in your client library, so no data was sent.

  • The internal timers and buffers within QuestDB did not commit the data yet. For development (and development only), you may want to tweak configuration settings to commit data more frequently.

    server.conf
    cairo.max.uncommitted.rows=1

    Refer to ILP's commit strategy documentation for more on these configuration settings.