Comparing InfluxDB and QuestDB Databases

Summary of benchmarking results of InfluxDB compared to QuestDB

Introduction to time-series databases

Time-series data is emerging as the dominant type of data produced in IoT, financial services, manufacturing, DevOps, monitoring, and machine learning. Time-series databases are specialized for storing and analyzing this kind of data efficiently and quickly. This article compares QuestDB with InfluxDB to help you understand the key differences and choose a time-series database that fits your use case.

Typical uses of time-series databases include:

Application performance, uptime, and response times
Analyzing financial transactions and trades
Monitoring network logs
Asset tracking
E-commerce transaction data, sales insights, BI reports
Sensor data from IoT devices

While the traditional use of databases was to store the last-known state of a system, this only shows a snapshot of a point in time. In contrast, time-series data adds a historical dimension to help understand how information changes.

Understanding how data changes over time helps you forecast future values, gain deeper insights into trends, and be better prepared for future variations. For this reason, developers need a time-series database that's robust, easy to use, and powerful. Importantly, it must be flexible enough to be used across different scenarios, industries, and use cases.

Note: Post edited in May 2023 to reflect the latest updates.

Introduction to InfluxDB and QuestDB

Released in 2013, InfluxDB is an open-source time-series database subject to the MIT license. Both open-source versions of InfluxDB (v1 and v2) are implemented in Go.

QuestDB is an open-source time-series database licensed under Apache License 2.0. It is designed for high throughput ingestion and fast SQL queries. It supports schema-agnostic ingestion using the InfluxDB line protocol, PostgreSQL wire protocol, and a REST API for bulk imports and exports.

A brief comparison based on DB-Engines:

Name	InfluxDB	QuestDB
Primary database model	Time Series DBMS	Time Series DBMS
Implementation language	Go	Java (low latency, zero-GC), C++
SQL	SQL-like query languages (InfluxQL, Flux)	SQL
APIs and other access methods	HTTP API (including InfluxDB Line Protocol), JSON over UDP	InfluxDB Line Protocol, Postgres Wire Protocol, HTTP, JDBC

What are the data models used in InfluxDB and QuestDB?

One of the first places to compare QuestDB and InfluxDB is how data is handled and stored in each database. InfluxDB has a dedicated line protocol message format for ingesting measurements. Each measurement has a timestamp, a set of tags (a tagset), and a set of fields (a fieldset):

measurementName,tagKey=tagValue fieldKey="fieldValue" 1465839830100399000
--------------- --------------- --------------------- -------------------
       |               |                  |                    |
  Measurement         Tags              Fields             Timestamp

In InfluxDB, tagset values are indexed strings, while fieldset values are not indexed. The data types that fields may use are limited to floats, integers, strings, and booleans. The following snippet is an example message in the InfluxDB line protocol:

sensors,location=london,version=REV-2.1 temperature=22,humidity=50 1465839830100399000\n
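To make the structure of such a message concrete, here is a minimal Python sketch that splits a line into its measurement, tagset, fieldset, and timestamp. The function name `parse_ilp_line` is our own, and the sketch assumes a simple message without escaped commas, escaped spaces, or quoted strings containing spaces, so it is not a complete line protocol parser.

```python
def parse_ilp_line(line: str) -> dict:
    """Split a simple line-protocol message into its four parts.

    Simplification (assumption): escaped commas/spaces and quoted string
    fields that contain spaces or commas are not handled.
    """
    head, field_part, timestamp = line.strip().split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = dict(pair.split("=", 1) for pair in field_part.split(","))
    return {
        "measurement": measurement,
        "tags": tags,
        "fields": fields,  # field values are kept as raw strings
        "timestamp_ns": int(timestamp),
    }


parsed = parse_ilp_line(
    "sensors,location=london,version=REV-2.1 temperature=22,humidity=50 1465839830100399000"
)
print(parsed["measurement"], parsed["tags"], parsed["fields"])
```

The tags end up as string key-value pairs, while the fields would still need to be converted to floats, integers, strings, or booleans according to their literal form.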

QuestDB supports the InfluxDB line protocol (ILP) for compatibility purposes, so inserts using ILP support the same data types. QuestDB also supports additional types, such as BYTE for 8-bit integers, FLOAT for 32-bit floats, LONG256 for larger integers, UUID for identifiers, and GEOHASH for geospatial data. These additional types can be used while ingesting over InfluxDB line protocol by creating a table with the desired schema before writing data.

QuestDB also exposes PostgreSQL wire protocol and a REST API for inserts, allowing for more control over the data types that the system can handle, including additional types such as date, char, and binary. In QuestDB, it's also possible to add indexes to existing columns in tables, which can be done directly through SQL:

ALTER TABLE sensors ALTER COLUMN firmware ADD INDEX;

QuestDB has full support for relational queries, whereas InfluxDB is a NoSQL, non-relational database with a custom data model. QuestDB supports both schema-agnostic ingestion over InfluxDB line protocol and a relational data model. Users can leverage both paradigms and perform SQL JOINs to correlate "schemaless" data with relational data by timestamp.

Figure: Data model comparison for InfluxDB and QuestDB, combining schema-agnostic ingestion with relational data stored and queried in QuestDB.
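As a rough illustration of what correlating two datasets by timestamp means, the following Python sketch pairs each reading with the most recent reference row at or before its timestamp. This "as-of" matching rule is an assumption chosen for the example, and the function name and sample data are hypothetical; it is not a description of how QuestDB executes joins.

```python
from bisect import bisect_right


def asof_join(readings, reference):
    """Pair each reading with the latest reference row at or before its timestamp.

    Both inputs are lists of (timestamp, value) tuples sorted by timestamp.
    """
    ref_times = [ts for ts, _ in reference]
    joined = []
    for ts, value in readings:
        idx = bisect_right(ref_times, ts) - 1  # last reference row with time <= ts
        ref_value = reference[idx][1] if idx >= 0 else None
        joined.append((ts, value, ref_value))
    return joined


# Hypothetical data: sensor readings ingested without a predefined schema, and
# a relational table recording which firmware was deployed at which time.
readings = [(5, 21.5), (12, 22.0), (25, 22.4)]
firmware = [(10, "REV-2.0"), (20, "REV-2.1")]
result = asof_join(readings, firmware)
print(result)
```

The first reading has no firmware row before it, so it is paired with `None`; the other two pick up the firmware version that was current at their timestamps.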

Comparing database storage models

For storage, InfluxDB uses Time-Structured Merge (TSM) trees: data is stored in a columnar format, and the storage engine stores the differences (or deltas) between consecutive values in a series. To keep queries fast as cardinality grows, InfluxDB indexes series with its Time Series Index (TSI). Still, the efficiency of this index has its limitations, explored in more detail below.
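The idea of storing deltas can be sketched in a few lines of Python. This is only an illustration of delta encoding in general, with function names of our own choosing; the actual TSM encodings are more elaborate and differ by data type.

```python
def delta_encode(values):
    """Keep the first value, then store each value as the difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values, total = [], 0
    for d in deltas:
        total += d
        values.append(total)
    return values


# Timestamps sampled every 10 seconds turn into one large value and small deltas.
timestamps = [1465839830, 1465839840, 1465839850, 1465839860]
encoded = delta_encode(timestamps)
print(encoded)
```

Regularly spaced timestamps collapse into a run of identical small numbers, which is what makes this representation compress well.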

QuestDB uses columnar data structures but indexes data in vector-based append-only column files. Sorting of out-of-order data occurs in a staging area in memory, while sorted and persisted data is merged later using an append model. The main motivation for merging with persisted data in this way is to keep the storage engine performant on both read and write operations.

InfluxDB has a shard group concept as a partitioning strategy, allowing for grouping data by time. Users can provide a shard group duration which defines how large a shard will be and can enable common operations such as retention periods for data (deleting data older than X days, for example):

Figure: Shard groups in InfluxDB.
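The relationship between a shard group duration and a retention period can be sketched as follows. The helper names are hypothetical, and the rule used here (a shard is dropped once all of its data is older than the retention period) is an assumption made for the sake of the example.

```python
from datetime import datetime, timedelta, timezone


def shard_start(ts: datetime, shard_duration: timedelta) -> datetime:
    """Return the start of the fixed-width time window that contains ts."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    windows = (ts - epoch) // shard_duration
    return epoch + windows * shard_duration


def expired_shards(shard_starts, shard_duration, retention, now):
    """Return the shards whose entire time range is older than the retention period."""
    cutoff = now - retention
    return [s for s in shard_starts if s + shard_duration <= cutoff]


day = timedelta(days=1)
starts = [datetime(2016, 1, d, tzinfo=timezone.utc) for d in range(1, 6)]
now = datetime(2016, 1, 5, 12, tzinfo=timezone.utc)
to_drop = expired_shards(starts, day, retention=timedelta(days=2), now=now)
print([s.date().isoformat() for s in to_drop])
```

Because data is grouped by time window, enforcing retention becomes a matter of dropping whole shards rather than deleting individual rows.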

QuestDB has similar functionality to partition tables by time, and users may specify a partition size. When tables are partitioned by time, table metadata is defined once per table, and column files are partitioned at the filesystem level into directories per partition:

Figure: QuestDB's column-based storage model.
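A simplified way to picture this layout is a mapping from partition directory to one list of values per column. The sketch below assumes daily partitions and uses hypothetical function and column names; it models the idea only and does not reflect QuestDB's actual on-disk file format.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_name(ts_ns: int) -> str:
    """Map a nanosecond epoch timestamp to a daily partition directory name."""
    seconds = ts_ns // 1_000_000_000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")


def layout(rows, columns):
    """Group rows into {partition: {column: [values...]}}, one list per column file."""
    partitions = defaultdict(lambda: {c: [] for c in columns})
    for row in rows:
        part = partitions[partition_name(row["timestamp"])]
        for c in columns:
            part[c].append(row[c])
    return dict(partitions)


one_day_ns = 86_400 * 1_000_000_000
rows = [
    {"timestamp": 1465839830100399000, "location": "london", "temperature": 22.0},
    {"timestamp": 1465839830100399000 + one_day_ns, "location": "london", "temperature": 23.0},
]
tables = layout(rows, ["timestamp", "location", "temperature"])
print(sorted(tables))
```

Each partition holds its own set of column values, while the list of columns (the table metadata) is defined once and shared by all partitions.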

Both QuestDB and InfluxDB have ways to partition data by time and employ a retention strategy. Apart from terminology, the main difference is that QuestDB does not need to create separate TSM files on disk per partition.

Ease of use of SQL compared to custom query languages

The emergence of NoSQL as a popular paradigm has effectively split databases into two categories: SQL and NoSQL.

InfluxDB originally started with a language similar to SQL called InfluxQL, which balanced some aspects of SQL with custom syntax. InfluxDB eventually adopted Flux as a query language to interact with data.

QuestDB embraces SQL as the primary query language so that there is no need for learning custom query languages. SQL is a good choice for time-series databases; it's easy to understand, most developers are familiar with it already, and SQL skills are simple to apply across different systems. As reported by the Stack Overflow developer survey of 2022, it's the third most popular language developers use. It proves to be a long-standing choice for quickly asking questions about the characteristics of your data.

Flux enables you to work with InfluxDB more efficiently, but it is difficult to read, harder to learn, and slows down onboarding of new users. From a user's perspective, having to learn a custom query language is a barrier regardless of engineering background. Here is an example Flux query:

from(bucket:"example-bucket")
  |> range(start:-1h)
  |> filter(fn:(r) =>
    r._measurement == "cpu" and
    r.cpu == "cpu-total"
  )
  |> aggregateWindow(every: 1m, fn: mean)

Consider that the Flux query above can be written in SQL as follows:

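-- Assumes the measurement maps to a table named cpu with a designated
-- timestamp column, a symbol column cpu, and a numeric column usage_user
-- (the column name is illustrative; the Flux query averages every field).
SELECT timestamp, avg(usage_user)
FROM cpu
WHERE cpu = 'cpu-total'
  AND timestamp > dateadd('h', -1, now())
SAMPLE BY 1m;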
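Both queries boil down to averaging values in one-minute time buckets. A minimal Python sketch of that computation is shown below; the function name is our own, and aligning buckets to multiples of the bucket size is an assumption made for simplicity.

```python
from collections import defaultdict


def sample_by(rows, bucket_seconds=60):
    """Average (timestamp_seconds, value) pairs per fixed-width time bucket."""
    buckets = defaultdict(list)
    for ts, value in rows:
        bucket_start = ts - ts % bucket_seconds  # align to the start of the bucket
        buckets[bucket_start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


rows = [(0, 10.0), (30, 20.0), (60, 40.0), (119, 60.0)]
averaged = sample_by(rows)
print(averaged)
```

The first two samples fall into the bucket starting at 0 seconds and the last two into the bucket starting at 60 seconds, so the result contains one averaged row per minute.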

Benchmark

Let's compare how QuestDB and InfluxDB operate in terms of performance for both ingestion and querying. As usual, we use the industry-standard Time Series Benchmark Suite (TSBS) as the benchmark tool. Unfortunately, upstream TSBS does not support InfluxDB v2, so we use our own TSBS fork.

The hardware we use for the benchmark is the following:

c6a.12xlarge EC2 instance with 48 vCPU and 96 GB RAM
500GB gp3 volume configured for the maximum settings (16,000 IOPS and 1,000 MB/s throughput)

The software side of the benchmark is the following:

Ubuntu 22.04
InfluxDB 2.7.1 with the default configuration
QuestDB 7.1.3 with the default configuration

Ingestion

To benchmark ingestion, we used commands like the following one:

$ ./tsbs_generate_data --use-case="cpu-only" --seed=123 --scale=1000 --timestamp-start="2016-01-01T00:00:00Z" --timestamp-end="2016-01-15T00:00:00Z" --log-interval="10s" --format="influx" > /tmp/influx_data

$ ./tsbs_load_influx --db-name=benchmark --file=/tmp/influx_data --urls=http://localhost:8086 --auth-token="<auth_token>" --workers=10

Here, we use a cpu-only scenario with two days of CPU data for various numbers of simulated hosts. The number of hosts is set to 100, 1K, 100K, and 10M to demonstrate the effects of high data cardinality on ingestion performance. We use ten client connections for the benchmark in both cases.
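To get a feel for the data volumes involved, the row count can be estimated from the number of hosts, the time range, and the log interval. The sketch below assumes one row per host per log interval, and the two-day window and host count are illustrative values rather than the exact parameters of every run.

```python
from datetime import datetime, timedelta


def expected_rows(hosts, start, end, log_interval):
    """Approximate row count, assuming one row per host per logging interval."""
    return hosts * ((end - start) // log_interval)


rows = expected_rows(
    hosts=1_000,
    start=datetime(2016, 1, 1),
    end=datetime(2016, 1, 3),  # illustrative two-day window
    log_interval=timedelta(seconds=10),
)
print(f"{rows:,} rows")
```

Under these assumptions, every host contributes 17,280 rows over two days, so the total grows linearly with the number of simulated hosts.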

The chart below summarizes the results for ingestion performance:

Figure: Ingestion rate (rows/sec) comparison between InfluxDB and QuestDB, measured with the Time Series Benchmark Suite.

QuestDB hits maximum ingestion throughput of 1.69M rows/sec versus 524K rows/sec for InfluxDB. For 10M unique devices, QuestDB sustains 663K rows/sec, versus 66K rows/sec for InfluxDB. The latter illustrates the effects of high cardinality on InfluxDB's ingestion performance.
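The relative difference follows directly from these figures. A quick calculation, using the rounded throughput numbers quoted above:

```python
# (QuestDB rows/sec, InfluxDB rows/sec), rounded as quoted in the text
results = {
    "peak throughput": (1_690_000, 524_000),
    "10M hosts": (663_000, 66_000),
}

speedups = {name: questdb / influxdb for name, (questdb, influxdb) in results.items()}
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```

This works out to roughly a 3x difference at peak throughput and roughly a 10x difference at the highest cardinality.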

Cardinality typically refers to the number of elements in a set. In a time-series database context, high cardinality boils down to many indexed columns in a table, each containing many unique values.
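As a small illustration, series cardinality can be computed as the number of distinct combinations of indexed (tag) values, with the product of per-column unique counts as an upper bound. The function names and sample rows below are hypothetical.

```python
from math import prod


def series_cardinality(rows, tag_keys):
    """Count distinct combinations of tag values, i.e. distinct time series."""
    return len({tuple(row[k] for k in tag_keys) for row in rows})


def cardinality_upper_bound(rows, tag_keys):
    """Product of the number of unique values per tag column."""
    return prod(len({row[k] for row in rows}) for k in tag_keys)


rows = [
    {"location": "london", "version": "REV-2.1"},
    {"location": "london", "version": "REV-2.1"},
    {"location": "paris", "version": "REV-2.1"},
    {"location": "paris", "version": "REV-2.0"},
]
tag_keys = ["location", "version"]
print(series_cardinality(rows, tag_keys), cardinality_upper_bound(rows, tag_keys))
```

The upper bound shows why cardinality can explode: each additional indexed column multiplies the number of possible series.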

High cardinality is a known problem area for InfluxDB, and this is likely because of the system architecture and storage engine. InfluxDB stores each time series in its own Time-Structured Merge tree, a log-structured merge (LSM) tree-like persistent data structure, so reading and writing the data becomes more expensive when the total number of time series becomes high. In QuestDB, the storage model is completely different from LSM trees or B-trees and instead uses data stored in densely ordered vectors on disk. As such, QuestDB deals with high cardinality datasets better.

Query performance

While QuestDB outperforms InfluxDB for ingestion, query performance is equally essential for time-series data analysis.

As part of the standard TSBS benchmark, we compare both databases on a few types of queries that are popular for time-series data.

All queries were run for two weeks of 4K emulated host data. Here are some sample commands to generate and run the queries:

$ ./tsbs_generate_queries --use-case="cpu-only" --seed=123 --scale=4000 --timestamp-start="2016-01-01T00:00:00Z" --timestamp-end="2016-01-15T00:00:00Z" --queries=1000 --query-type="single-groupby-1-1-1" --format="influx" > /tmp/influx_query_single-groupby-1-1-1

$ ./tsbs_run_queries_influx --file=/tmp/influx_query_single-groupby-1-1-1 --auth-token="<auth_token>" --workers=10

Again, we use ten concurrent client connections for the comparison. We measure how much time each query takes in milliseconds:
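Conceptually, this kind of measurement amounts to timing each query while a fixed pool of workers issues queries concurrently. The Python sketch below illustrates the idea with a stand-in query function; it is not how TSBS is implemented, and all names in it are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def timed(run_query, query):
    """Return the wall-clock latency of one query in milliseconds."""
    started = time.perf_counter()
    run_query(query)
    return (time.perf_counter() - started) * 1000.0


def benchmark(run_query, queries, workers=10):
    """Run all queries on a pool of workers; return the mean latency and all latencies in ms."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(lambda q: timed(run_query, q), queries))
    return mean(latencies), latencies


def fake_query(query):
    """Stand-in for a real database round trip."""
    time.sleep(0.001)


mean_ms, latencies = benchmark(fake_query, ["q"] * 20)
print(f"{mean_ms:.2f} ms/query over {len(latencies)} queries")
```

Replacing `fake_query` with a function that sends a real query to a database would give a per-query latency figure comparable in spirit to the one reported here.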

Figure: Query performance (milliseconds/query) comparison between InfluxDB and QuestDB, measured with the Time Series Benchmark Suite.

Two conclusions from the benchmark:

InfluxDB performs better for queries that access the data for a single or a few nodes, such as single-groupby-1-8-1 and cpu-max-all-1.
QuestDB outperforms InfluxDB significantly for more complicated queries touching the data for many or all nodes.

QuestDB's underperformance for queries on a single node or a few nodes can be explained by index-based data access. Since measurements for many nodes (think, time series) are stored in the same table, fetching the values that belong to a specific node results in a random disk access pattern. Of course, this holds true only for so-called cold query execution, i.e. when the data is not in the OS page cache. Once the data is in the cache, disk I/O no longer matters. We plan to improve cold execution for such queries significantly with the sub-partitioning feature. This way, multiple symbol column values (think, nodes) will be stored in smaller column files, leading to a sequential access pattern and, hence, better performance for queries accessing only a single symbol or a few symbols.

For more complicated queries, QuestDB's better performance lies in its query engine: it takes advantage of a columnar data layout, SIMD instructions, and multi-threaded processing.

Ecosystem and support

QuestDB is a younger database. However, QuestDB builds on the immensely popular SQL language and is compatible with the Postgres wire protocol. It also borrows from InfluxDB by implementing InfluxDB line protocol over TCP, a convenient option for fast data ingestion. In addition, QuestDB has a vibrant and supportive community: users get to interact with the engineering team building QuestDB.

InfluxDB has an impressive ecosystem and has been around for a long time. As an older product, InfluxDB is the most popular time-series database.

Conclusion

We have compared InfluxDB and QuestDB, assessing performance and user experience. In our benchmark, QuestDB ingests data 3-10 times faster than InfluxDB and performs better on many of the benchmarked queries.

We believe in the simplicity of SQL to unlock insights from time-series data. Some of the SQL extensions that QuestDB offers, such as SAMPLE BY, can significantly reduce the complexity of queries and improve the developer experience.

If you are interested in how QuestDB performs compared to other OLAP and time-series databases, check out our benchmark articles.

Andrey Pechkurov
