
This post comes from Kovid Rathee, who has put together a tutorial showing how to measure the ingestion rate and query performance of TimescaleDB and QuestDB using the Time Series Benchmark Suite (TSBS). Thanks for the submission, Kovid!
# What are time-series databases used for?

In a connected world, billions of users are generating more data than ever. IoT sensors have become ubiquitous, financial transactions are fully digitized, and everything from human communication to our digital footprints produces timestamped records. The volume of time-centric data is exploding, and we are struggling to keep up with it all. Time-series databases are on the rise: open-source projects like QuestDB, InfluxDB, and TimescaleDB, and cloud-based services like Amazon Timestream, are in higher demand than ever. Time-series databases have officially come of age.
All these products are competing for more space in the time-series domain, and in doing so, they're making each other better. This article looks at two major time-series databases and compares them using an open-source benchmarking tool called TSBS, the Time Series Benchmark Suite. TSBS is based on testing scripts originally developed by InfluxData, later enhanced by other major time-series database vendors, and currently maintained by Timescale.
# What is the Time Series Benchmark Suite (TSBS)?

For traditional databases like MySQL and PostgreSQL, tools such as HammerDB and sysbench are the standard way to measure read and write performance, and similar tools exist for other types of databases. A performance test is only meaningful when the benchmarking tool simulates real-life scenarios by producing realistic write bursts and read streams.
Time-series workloads have very different access and usage patterns from those of a traditional database, which is why we need a dedicated tool like TSBS. TSBS currently supports two kinds of loads:
- IoT emulates data generated by the sensors of a trucking company. Imagine tracking a fleet with real-time diagnostic data from every truck.
- DevOps simulates the data generated by servers tracking memory usage, CPU usage, disk usage, and so on. Imagine watching these metrics on a Grafana dashboard and getting alerts when thresholds are breached.
# Tutorial prerequisites

For this tutorial, you'll need the following tools installed and available:
- Docker should be available on the machine to run QuestDB and TimescaleDB
- Go must be installed to run the benchmark tool
Note: This benchmark run was completed on a 16-core Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz with 128 GB RAM on AWS EC2.
# Using TSBS to test time-series database performance

To get started, clone the TSBS repository and build the tools:
Once the suite is installed, start QuestDB and TimescaleDB:
We will test the performance of these two databases in four phases:
- Generate DevOps data for one day, where nine different metrics are collected every 10 seconds for 200 devices. The data is generated separately for QuestDB and TimescaleDB in their respective formats. Use the `tsbs_generate_data` utility for this.
- Load the generated data into QuestDB and TimescaleDB using the `tsbs_load_questdb` and `tsbs_load` utilities, respectively. This tests the ingestion and write speed of each system.
- Generate queries to run on the loaded data for QuestDB and TimescaleDB. Use the `tsbs_generate_queries` utility for this.
- Execute the generated queries on QuestDB and TimescaleDB using `tsbs_run_queries_questdb` and `tsbs_run_queries_timescaledb`, respectively.
Let's go through the scripts for each of these steps, one by one.
# Generate test data for benchmarking

For the scope of this tutorial, we'll limit the scale of the benchmark run to 200 devices. As mentioned above, the data will be generated for one day, tracking nine metrics for each of the 200 devices every 10 seconds. With the `tsbs_generate_data` command, you can generate test data for any of the supported databases and use cases.
The generated data can take up a lot of disk space. You can scale up or down based on your benchmarking requirements and the disk space available.
The generated files differ in size because TimescaleDB and QuestDB use different formats. QuestDB uses the InfluxDB Line Protocol (ILP), a lightweight, text-based format.
# Load generated data to test write performance

Loading the data is even simpler than generating it. For TimescaleDB, you can use the common utility `tsbs_load`. For QuestDB, use the `tsbs_load_questdb` utility, as it supports QuestDB-specific flags such as `--ilp-bind-to` for the InfluxDB Line Protocol binding port and `--url` for QuestDB's REST endpoint. Use the following commands to load the data into TimescaleDB and QuestDB, respectively:
Follow the TSBS instructions for setting up TimescaleDB's `config.yaml` file.
To get a better sense of load performance, try varying the `--workers` parameter. Make sure the benchmarking parameters and conditions are identical for both databases so that the comparison is fair.
TimescaleDB Load/Write Performance:
QuestDB Load/Write Performance:
In this run, with eight workers, QuestDB loaded data ~3.2x faster than TimescaleDB. For the complete output of this benchmark run, see the accompanying GitHub repository.
# Generate queries for reads

The data set TSBS generated for both QuestDB and TimescaleDB contains metrics for 200 hosts. To query all readings where one metric is above a threshold across all hosts, we will use the query type `high-cpu-all`. To generate 1000 queries with different time ranges within that one day, run the following commands:
In this tutorial, we're running just one type of read query. The TSBS documentation lists the other query types you can run to test read performance.
# Execute queries to benchmark read performance

Now that we've generated the data, loaded it into QuestDB and TimescaleDB, and generated the benchmarking queries, we can finally run the read performance benchmark using the following commands:
Make sure the queries have been generated properly before running the commands. To check, you can run `less /tmp/timescaledb-queries-high-cpu-all` or `less /tmp/queries_questdb-high-cpu-all`.
TimescaleDB read performance:
QuestDB read performance:
QuestDB executed the queries ~2.7x faster than TimescaleDB.

# QuestDB vs. TimescaleDB
To summarize the read and write benchmark results, QuestDB was significantly faster than TimescaleDB for writes and considerably faster for reads. For read performance, drawing conclusions from a single query type is probably not fair, so we encourage you to run TSBS yourself with different query types against both databases. Here's the summary:
- QuestDB was ~3.2x faster than TimescaleDB for write/load workloads.
- QuestDB was ~2.7x faster than TimescaleDB for read/analytical workloads.
# Conclusion

TSBS is the de facto standard for benchmarking time-series databases. In this tutorial, you learned how to use TSBS to compare two time-series databases by generating test data and emulating realistic read and write loads. As mentioned before, TSBS currently supports test data generation for DevOps and IoT (vehicle diagnostics). You can write your own data generation scripts to cover more use cases, such as real-time weather tracking, traffic signals, or financial markets.
If you like this content, we'd love to know your thoughts! Feel free to share your feedback or come and say hello in the QuestDB Community Slack.