# Introduction: Choosing Apache Kafka and QuestDB to stream time-series data

Apache Kafka is a battle-tested distributed stream-processing platform popular in the financial industry for handling mission-critical transactional workloads. Kafka's ability to handle large volumes of real-time market data makes it a core infrastructure component for trading, risk management, and fraud detection. Financial institutions use Kafka to stream data from market data feeds, transaction systems, and other external sources to drive decisions.
A common data pipeline to ingest and store financial data involves publishing real-time data to Kafka and utilizing Kafka Connect to stream that to databases. For example, the market data team may continuously update real-time quotes for a security to Kafka, and the trading team may consume that data to make buy/sell orders. Processed market data and orders may then be saved to a time series database for further analysis.
In this article, we'll create a sample data pipeline to illustrate how this could work in practice. We will poll an external data source (Finnhub) for real-time quotes of stocks and ETFs and publish that information to Kafka. Kafka Connect will then grab those records and publish them to a time-series database (QuestDB) for analysis.

# Prerequisites

- Git
- Docker Engine: 20.10+
- Golang 1.19+
- Finnhub API token
# Setup

To run the example locally, first clone the repo:
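For example (the repository URL below is a placeholder; substitute the actual repo for this article):

```bash
# Placeholder URL -- replace with the repo that accompanies this article
git clone <repo-url>
cd <repo-name>
```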
The codebase is organized into three parts:
- Golang code is located at the root of the repo
- The Dockerfile for the Kafka Connect QuestDB image and the Docker Compose YAML file are under `docker`
- JSON files for the Kafka Connect sinks are under `kafka-connect-sinks`
# Building the Kafka Connect QuestDB Image

We first need to build the Kafka Connect Docker image with the QuestDB Sink connector. Navigate to the `docker` directory and run `docker-compose build`.
The Dockerfile simply installs the Kafka QuestDB Connector via Confluent Hub on top of the Confluent Kafka Connect base image:
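A minimal sketch of what such a Dockerfile might look like (the base image tag and connector version are assumptions; the file in the repo is authoritative):

```dockerfile
# Confluent Kafka Connect base image (tag is an assumption)
FROM confluentinc/cp-kafka-connect:7.3.0

# Install the QuestDB Kafka connector from Confluent Hub (version is an assumption)
RUN confluent-hub install --no-prompt questdb/kafka-questdb-connector:0.6
```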
# Start Kafka, Kafka Connect, QuestDB

Next, we will set up the infrastructure via Docker Compose. From the same `docker` directory, run Docker Compose in the background:
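With the Compose v1 CLI used elsewhere in this article (`docker-compose`), that is:

```bash
docker-compose up -d
```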
This will start Kafka + Zookeeper, our custom Kafka Connect image with the QuestDB Connector installed, as well as QuestDB. The full content of the Docker Compose file is as follows:
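The `docker-compose.yml` in the repo is authoritative; an abridged sketch of what it could look like is below (image tags, listener settings, and Connect environment variables are assumptions):

```yaml
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.3.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:7.3.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  kafka-connect:
    build: .                      # custom image with the QuestDB connector installed
    depends_on:
      - kafka
    ports:
      - "8083:8083"
    environment:
      CONNECT_BOOTSTRAP_SERVERS: kafka:29092
      CONNECT_REST_ADVERTISED_HOST_NAME: kafka-connect
      CONNECT_GROUP_ID: questdb-connect
      CONNECT_CONFIG_STORAGE_TOPIC: connect-configs
      CONNECT_OFFSET_STORAGE_TOPIC: connect-offsets
      CONNECT_STATUS_STORAGE_TOPIC: connect-status
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_KEY_CONVERTER: org.apache.kafka.connect.storage.StringConverter
      CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter

  questdb:
    image: questdb/questdb:7.0.1
    ports:
      - "9000:9000"               # web console / REST API
      - "9009:9009"               # InfluxDB line protocol (used by the connector)
```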
# Start the QuestDB Kafka Connect Sink

Wait for the Docker containers to become healthy (the kafka-connect container will log a "Finished starting connectors and tasks" message), then we can create our Kafka Connect sinks. We will create two sinks: one for Tesla and one for SPY (the SPDR S&P 500 ETF) to compare the price trends of a volatile stock and the overall market.
From the `kafka-connect-sinks` directory, issue the following curl command to create the Tesla sink:
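The file name used here is an assumption; if the Tesla sink config is saved as `questdb-sink-tsla.json`, the request could look like:

```bash
curl -X POST -H "Content-Type: application/json" \
  --data @questdb-sink-tsla.json \
  http://localhost:8083/connectors
```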
The JSON file it posts contains the following configuration:
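A sketch of what that configuration could contain, based on the QuestDB Kafka connector documentation (exact property values in the repo may differ):

```json
{
  "name": "questdb-sink-tsla",
  "config": {
    "connector.class": "io.questdb.kafka.QuestDBSinkConnector",
    "topics": "topic_TSLA",
    "host": "questdb",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "timestamp.field.name": "timestamp"
  }
}
```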
Create the sink for SPY as well:
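Again assuming a hypothetical file name for the SPY sink config:

```bash
curl -X POST -H "Content-Type: application/json" \
  --data @questdb-sink-spy.json \
  http://localhost:8083/connectors
```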
# Streaming real-time stock quotes with Apache Kafka and QuestDB

Now that we have our data pipeline set up, we are ready to stream real-time stock quotes to Kafka and store them in QuestDB.
First, we need to get a free API token from the Finnhub Stock API. Create a free account online and copy the API key.
Export that key to our shell as `FINNHUB_TOKEN`:
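For example:

```bash
export FINNHUB_TOKEN=<your-api-key>
```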
The real-time quote endpoint returns various attributes such as the current price, high/low/open quotes, and the previous close price. Since we are just interested in the current price, we only grab the price and add the ticker symbol and timestamp to the Kafka JSON message.
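For reference, a `/quote` response has roughly this shape, where `c` is the current price, `h`/`l`/`o` the day's high/low/open, `pc` the previous close, and `t` a timestamp (the values shown are illustrative):

```json
{
  "c": 174.48,
  "h": 176.30,
  "l": 172.51,
  "o": 173.80,
  "pc": 173.44,
  "t": 1678743000
}
```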
The code below will grab the quote every 30 seconds and publish it to the Kafka topic `topic_TSLA`.
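Below is a minimal sketch of such a producer. It assumes the segmentio/kafka-go client, a broker reachable at localhost:9092, and a file named main.go at the repo root; the actual code in the repo may differ in structure and library choice.

```go
// main.go: poll Finnhub for a quote every 30 seconds and publish it to Kafka.
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
	"time"

	"github.com/segmentio/kafka-go"
)

const symbol = "TSLA"

// finnhubQuote holds the only field we care about from the /quote endpoint.
type finnhubQuote struct {
	CurrentPrice float64 `json:"c"`
}

// quoteMessage is the payload published to Kafka.
type quoteMessage struct {
	Symbol    string  `json:"symbol"`
	Price     float64 `json:"price"`
	Timestamp int64   `json:"timestamp"`
}

// fetchQuote calls the Finnhub /quote endpoint for the configured symbol.
func fetchQuote(token string) (*finnhubQuote, error) {
	url := fmt.Sprintf("https://finnhub.io/api/v1/quote?symbol=%s&token=%s", symbol, token)
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var q finnhubQuote
	if err := json.NewDecoder(resp.Body).Decode(&q); err != nil {
		return nil, err
	}
	return &q, nil
}

func main() {
	token := os.Getenv("FINNHUB_TOKEN")
	if token == "" {
		log.Fatal("FINNHUB_TOKEN must be set")
	}

	// Writer targeting the per-symbol topic, e.g. topic_TSLA.
	writer := &kafka.Writer{
		Addr:  kafka.TCP("localhost:9092"),
		Topic: "topic_" + symbol,
	}
	defer writer.Close()

	for {
		quote, err := fetchQuote(token)
		if err != nil {
			log.Printf("failed to fetch quote: %v", err)
		} else {
			msg := quoteMessage{
				Symbol:    symbol,
				Price:     quote.CurrentPrice,
				Timestamp: time.Now().UnixMilli(),
			}
			payload, _ := json.Marshal(msg)
			if err := writer.WriteMessages(context.Background(), kafka.Message{Value: payload}); err != nil {
				log.Printf("failed to publish: %v", err)
			} else {
				log.Printf("Message published to Kafka: %s", payload)
			}
		}
		time.Sleep(30 * time.Second)
	}
}
```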
To start streaming the data, run the code:
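Assuming the producer lives in main.go at the repo root:

```bash
go run main.go
```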
To also get data for SPY, open another terminal window, change the symbol in the code to SPY, and run it as well with the token value set.
# Result

After running the producer code, it will print out the messages it sends to Kafka, such as `Message published to Kafka: {"symbol":"TSLA","price":174.48,"timestamp":1678743220215}`.
This data is published to the Kafka topic `topic_TSLA` and forwarded to QuestDB via the Kafka Connect sink.
We can then navigate to `localhost:9000` to access the QuestDB console.
Searching for all records in the `topic_TSLA` table, we can see our real-time market quotes:
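For example, run from the console:

```sql
SELECT * FROM topic_TSLA;
```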
We can also look at SPY data from `topic_SPY`:
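Similarly, for the SPY table:

```sql
SELECT * FROM topic_SPY;
```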
With the data now in QuestDB, we can query for aggregate information, such as the average price over a 2m window:
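One way to express this in QuestDB SQL is with SAMPLE BY; this sketch assumes the designated timestamp column is named `timestamp`, per the sink configuration above:

```sql
SELECT timestamp, avg(price) AS avg_price
FROM topic_TSLA
SAMPLE BY 2m;
```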
# Conclusion

Kafka is a trusted component of data pipelines that handle large amounts of time-series data, such as financial data. It can stream mission-critical source data to multiple destinations, including time-series databases suited for real-time analytics.
In this article, we created a reference implementation that polls real-time market data and uses Kafka to stream it to QuestDB via Kafka Connect. For more information on the QuestDB Kafka connector, check out the overview page on the QuestDB website, which covers configuration details and FAQs on setting it up. The GitHub repo for the connector also has sample projects, including a Node.js and a Java example, for those looking to extend this reference architecture.
# Additional resources

- Apache Kafka Connector for QuestDB
- Realtime crypto tracker with QuestDB Kafka Connector
- Real-time analytics and anomaly detection with Apache Kafka, Apache Flink, Grafana & QuestDB