QuestDB vs TimescaleDB

This post comes from Kovid Rathee, who has put together a tutorial to show how to measure the ingestion rates and query performance of TimescaleDB and QuestDB databases using the time series benchmark suite. Thanks for the submission, Kovid!

What are time-series databases used for?#

In a connected world, billions of users are generating more data than ever. From human communication to the digital footprint we create, IoT sensors have become ubiquitous, and financial transactions are completely digitized. We have an explosion of the volume of time-centric data, and we are struggling to keep up with it all. Times-series databases are on the rise and OSS projects like QuestDB, InfluxDB, TimescaleDB, and cloud-based solutions like Amazon Timestream, etc., are in higher demand than ever. Time-series databases have officially come of age.

All these products are competing for more space in the time-series domain, and in doing that, they're making each other better. This article will look at two major times-eries databases and compare them using an open-source benchmarking tool called TSBS — Time Series Benchmarking Suite. This benchmarking suite is based on the testing scripts originally developed at InfluxDB, later enhanced by other major time-series databases, and currently maintained by TimescaleDB.

What is the Time Series Benchmark Suite (TSBS)?#

For traditional databases like MySQL and PostgreSQL, many popular options like HammerDB and sysbench are standard tools to measure database read and write performance. Similar tools exist for different types of databases. Performance testing makes sense when the benchmarking tool simulates real-life scenarios by creating realistic bursts and reading streams.

The access and usage pattern for time-series databases is very different from a traditional database — that is why we need a tool like TSBS. TSBS currently supports two kinds of loads:

IoT emulates the IoT data generated from the sensors of a trucking company. Imagine tracking a trucking fleet with real-time diagnostic data from every truck in your fleet.
DevOps simulates data usage generated by a server that tracks memory usage, CPU usage, disk usage, and so on. Imagine looking at the Grafana dashboard with these metrics and getting alerts on breaches.

Tutorial prerequisites#

For this tutorial, you'll need to have the following tools installed and available:

Docker should be available on the machine to run QuestDB and TimescaleDB
Go must be installed to run the benchmark tool

Note: This benchmark run was completed on a 16-core Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz with 128 GB RAM on AWS EC2.

Using the TSBS to test time series database performance#

To get started with the TSBS suite, clone the repository and prepare the tool:

# TSBS - create a temporary directory for the Go binaries
mkdir -p ~/tmp/go/src/github.com/timescale/
cd ~/tmp/go/src/github.com/timescale/

# Clone the TSBS repository, build test and install Go binaries:
git clone git@github.com:questdb/tsbs.git
cd ~/tmp/go/src/github.com/timescale/tsbs/ && git checkout questdb-tsbs-load-new
GOPATH=~/tmp/go go build -v ./...
GOPATH=~/tmp/go go test -v github.com/timescale/tsbs/cmd/tsbs_load_questdb
GOPATH=~/tmp/go go install -v ./...

Once all the suite is installed, QuestDB and TimescaleDB can be started:

docker run -p 9000:9000 -p 9009:9009 questdb/questdb:6.0.4

docker run -d --name timescaledb -p 5432:5432 -e POSTGRES_PASSWORD=password \
  timescale/timescaledb:latest-pg12

We will test the performance of these two databases in four phases:

Generate DevOps data for one day, where nine different metrics are collected every 10 seconds for 200 devices. The data will be generated separately for QuestDB and TimescaleDB based on their respective formats. Use the tsbs_generate_data utility for this.
Load the generated data using the tsbs_load_questdb and tsbs_load utilities to load data into QuestDB and TimescaleDB, respectively. This allows us to test the ingestion and write speeds of each system.
Generate queries to run on the loaded data for QuestDB and TimescaleDB. Use the tsbs_generate_queries utility for this.
Execute generated queries on QuestDB and TimescaleDB using tsbs_run_queries_questdb and tsbs_run_queries_timescaledb respectively.

Let's go through the scripts for each of these steps, one by one.

Generate test data for benchmarking#

For the scope of this tutorial, we'll limit the scale of the benchmark run to 200 devices. As mentioned above, the data will be generated for one day, tracking nine metrics for every one of the hundred devices every 10 seconds. Using the tsbs_generate_data command, you can generate test data for any of the supported databases and use cases.

The data generated can occupy a lot of space. You can scale up or down based on your benchmarking requirements and availability of disk space.

# Navigate to the script library
cd ~/tmp/go/bin/

# Run the following command to generate data for QuestDB
./tsbs_generate_data \
--use-case="devops" --seed=123 --scale=200 \
--timestamp-start="2016-01-01T00:00:00Z" \
--timestamp-end="2016-01-02T00:00:00Z" \
--log-interval="10s" --format="influx" > /tmp/questdb-data

# Run the following command to generate data for TimescaleDB
./tsbs_generate_data \
   --use-case="devops" --seed=123 --scale=200 \
   --timestamp-start="2016-01-01T00:00:00Z" \
   --timestamp-end="2016-01-02T00:00:00Z" \
   --log-interval="10s" --format="timescaledb" > /tmp/timescaledb-data

The files generated for the data will be of different sizes because of the different formats used by TimescaleDB and QuestDB. QuestDB uses Influx Line Protocol format which is much lighter than any other format out there.

Load generated data to test write performance#

Loading the data is even simpler than generating it. For TimescaleDB, you can use the common utility tsbs_load. For QuestDB, you can use the tsbs_load_questdb utility as it supports some QuestDB-specific flags like --ilp-bind-to for InfluxDB Line Protocol binding port and --url signifying QuestDB's REST endpoint. You can use the following commands to load the data into TimescaleDB and QuestDB, respectively:

Please follow the instructions for TimescaleDB's config.yaml file.

# Go to the following directory
cd ~/tmp/go/bin

# Create the a config file
./tsbs_load config --target=timescaledb --data-source=FILE

# Edit the config.yaml file
# 1. Change the source file name to /tmp/timescaledb-data
# 2. Change the password for TimescaleDB to password
# 3. Change the number of workers to 8
./tsbs_load load timescaledb --config=./config.yaml

# Load data into QuestDB
./tsbs_load_questdb --file /tmp/questdb-data --workers 8

To get a better idea of load performance, you can try changing the --workers parameter. Please ensure that the benchmarking parameters and conditions for both databases are the same so that you get a fair comparison.

TimescaleDB Load/Write Performance:

./tsbs_load load timescaledb --config=./config.yaml
#...
Summary:
loaded 174528000 metrics in 59.376sec with 8 workers (mean rate 2939386.10 metrics/sec)
loaded 15552000 rows in 59.376sec with 8 workers (mean rate 261925.49 rows/sec)

QuestDB Load/Write Performance:

./tsbs_load_questdb --file /tmp/questdb-data --workers 8
#...
Summary:
loaded 174528000 metrics in 18.223sec with 8 workers (mean rate 9577373.06 metrics/sec)
loaded 15552000 rows in 18.223sec with 8 workers (mean rate 853429.28 rows/sec)

The write performance of QuestDB with eight workers, in this case, is ~3.2x faster than TimescaleDB. For the complete output of this benchmark run, see the accompanying GitHub repository.

Generate queries for reads#

The data set TSBS has generated for both QuestDB and TimescaleDB contains metrics for 200 hosts. To query all the readings where one metric is above a threshold across all hosts, we will use the query type high-cpu-all. To generate 1000 queries with different time ranges during that one day, you need to run the following commands:

# Go to the following directory
cd ~/tmp/go/bin

# Generate QuestDB queries for query-type high-cpu-all
./tsbs_generate_queries --use-case="devops" --seed=123 --scale=200 \
    --timestamp-start="2016-01-01T00:00:00Z" \
    --timestamp-end="2016-01-02T00:00:01Z" \
    --queries=1000 --query-type="high-cpu-all" --format="questdb" \
    > /tmp/queries_questdb-high-cpu-all

# Generate TimescaleDB queries for query-type high-cpu-all
./tsbs_generate_queries --use-case="devops" --seed=123 --scale=200 \
    --timestamp-start="2016-01-01T00:00:00Z" \
    --timestamp-end="2016-01-02T00:00:01Z" \
    --queries=1000 --query-type="high-cpu-all" --format="timescaledb" \
    > /tmp/timescaledb-queries-high-cpu-all

In this tutorial, we're running just one type of read query. You can choose from the different types of queries you can run to test the read performance:

Example queries

# TimescaleDB Query
SELECT * FROM cpu WHERE usage_user > 90.0
   AND time >= '2016-01-01 07:47:52.646325 +0000'
   AND time < '2016-01-01 19:47:52.646325 +0000';

# QuestDB Query
SELECT * FROM cpu WHERE usage_user > 90.0
   AND timestamp >= '2016-01-01T07:47:52Z'
   AND timestamp < '2016-01-01T19:47:52Z';

Execute queries to benchmark read performance#

Now that we've generated the data, loaded it into QuestDB and TimescaleDB, and also generated the benchmarking queries that we want to run, we can finally perform the read performance benchmark using the following commands:

Ensure that the queries have been generated properly before running the commands. To do that, you can run less /tmp/timescaledb-queries-high-cpu-all or less /tmp/queries_questdb-high-cpu-all.

# Go to the following directory
cd ~/tmp/go/bin/

# Run the read benchmarking queries for TimescaleDB
cat /tmp/timescaledb-queries-high-cpu-all | ./tsbs_run_queries_timescaledb --workers=8 \
  --postgres="host=localhost user=postgres password=password sslmode=disable" --print-interval 500

# Run the read benchmarking queries for QuestDB
./tsbs_run_queries_questdb --file /tmp/queries_questdb-high-cpu-all --workers=8 --print-interval 500

TimescaleDB read performance:

Run complete after 1000 queries with 8 workers (Overall query rate 25.78 queries/sec):
TimescaleDB CPU over threshold, all hosts:
min: 222.49ms, med: 274.94ms, mean: 308.60ms, max: 580.13ms, stddev: 73.70ms, sum: 308.6sec, count: 1000
all queries :
min: 222.49ms, med: 274.94ms, mean: 308.60ms, max: 580.13ms, stddev: 73.70ms, sum: 308.6sec, count: 1000
wall clock time: 38.827242sec

QuestDB read performance:

Run complete after 1000 queries with 8 workers (Overall query rate 70.18 queries/sec):
QuestDB CPU over threshold, all hosts:
min: 92.71ms, med: 109.10ms, mean: 113.32ms, max: 382.57ms, stddev: 19.34ms, sum: 113.3sec, count: 1000
all queries :
min: 92.71ms, med: 109.10ms, mean: 113.32ms, max: 382.57ms, stddev: 19.34ms, sum: 113.3sec, count: 1000
wall clock time: 14.275811sec

QuestDB executed the queries ~2.7x faster than TimescaleDB. QuestDB vs. TimescaleDB

To summarize the read and write benchmarks results, we can say that QuestDB is significantly faster in writes than TimescaleDB and considerably faster in reads. When we talk about read performance, concluding with just one type of query is probably not fair, which is why you can try running the TSBS suite on your own for different types of queries for both these databases. Here's the summary:

QuestDB performed ~320 % faster than TimescaleDB for write/load workloads.
QuestDB performed ~270 % faster than TimescaleDB for read/analytical workloads.

Conclusion#

TSBS is the de facto standard for benchmarking time-series databases. In this tutorial, you learned how to use TSBS to compare two time-series databases by easily generating test data and emulating realistic read and write loads. As mentioned before, TSBS currently supports test data generation for DevOps and IoT (vehicle diagnostics). You can create your test data generation scripts to create more use cases for, say, real-time weather tracking, traffic signals, financial markets, and so on.

If you like this content, we'd love to know your thoughts! Feel free to share your feedback or come and say hello in the QuestDB Community Slack.