Connecting Apache Kafka to Azure Event Hubs
Recently, I worked on an integration with Azure Event Hubs. A colleague of mine faced challenges while trying to export messages from an existing Kafka topic and import them into Event Hubs. To assist, I’ve documented the steps below, which you may find useful.
Step 1: Download and Extract Apache Kafka
Apache Kafka is an open-source, distributed event streaming platform built for high-throughput data pipelines. You can download Apache Kafka from the following link: Apache Kafka Download. Then extract the archive and change into the new directory:
$ tar -xzf kafka_2.13-3.1.0.tgz
$ cd kafka_2.13-3.1.0
Step 2: Start the Kafka Environment
Ensure that Java 8 or higher is already installed in your local environment. If not, download and install it from Oracle’s website.
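If you are unsure which Java version is installed, a quick check along these lines may help. This is only a sketch: java_major is a hypothetical helper name introduced here, and it assumes the version string appears in quotes in the output of java -version, which is the usual format.

```shell
# Print the major version for a Java version string.
# Handles the legacy "1.8.0_292" scheme as well as the newer "17.0.2" scheme.
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;  # legacy scheme: strip the leading "1."
  esac
  echo "${v%%[._-]*}"    # keep everything before the first ".", "_" or "-"
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; the version is the first quoted token.
  raw=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  major=$(java_major "$raw")
  if [ "${major:-0}" -ge 8 ] 2>/dev/null; then
    echo "Java $major detected, good to go"
  else
    echo "Java 8 or higher is required (found: ${raw:-unknown})"
  fi
else
  echo "java not found on PATH"
fi
```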
To start all services, run the following commands in order, each in its own terminal session, since both run in the foreground:
Start the ZooKeeper service:
$ bin/zookeeper-server-start.sh config/zookeeper.properties
Start the Kafka broker:
$ bin/kafka-server-start.sh config/server.properties
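Because the broker depends on ZooKeeper, it can help to wait until each service accepts connections before moving on. The following is a minimal sketch, not part of the original setup: retry is a hypothetical helper, the ports are the defaults from the stock config files (2181 for ZooKeeper, 9092 for the broker), and nc -z is assumed to be available on your system.

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or ATTEMPTS run out.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    # Sleep only if another attempt is coming.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Usage (assumes default ports and a netcat that supports -z):
#   retry 30 1 nc -z localhost 2181 && echo "ZooKeeper is up"
#   retry 30 1 nc -z localhost 9092 && echo "Kafka broker is up"
```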
Step 3: Create and Set Up Configuration Files
Create a new file named connector.properties with the values below:
... (The content is mostly fine and technical, no changes)
Replace the placeholder values with those from your Azure endpoint. If you haven’t already, create a new namespace and deploy Event Hubs resources from the Azure portal. Note that you might need to select the Standard pricing tier or higher to successfully create Kafka topics in the next step.
The required password can be found in the Shared access policies settings of the Event Hub namespace, under the SAS Policy labeled RootManageSharedAccessKey.
Step 4: Create Three Kafka Topics
To create the topics manually, use the kafka-topics commands:
Create the configs topic:
... (Commands are mostly fine and technical, no changes)
Create the offsets topic:
... (Commands are mostly fine and technical, no changes)
Create the status topic:
... (Commands are mostly fine and technical, no changes)
Step 5: Run Kafka Connect
Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. In this setup, the worker talks to the Event Hubs endpoint configured in Step 3. To continuously import and export your data, start the worker locally in distributed mode, passing it the configuration file you created in Step 3:
$ bin/connect-distributed.sh path/to/connector.properties
With everything set up, you can proceed to test import and export functions.
Step 6: Create Input and Output Files
Create a directory and two files: one for seed data to be read by the FileStreamSource connector and another to be written to by the FileStreamSink connector.
$ mkdir ~/connect-demo
$ seq 1000 > ~/connect-demo/input.txt
$ touch ~/connect-demo/output.txt
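Before starting the connectors, you may want to confirm the seed file looks as expected. The sketch below uses a hypothetical helper, check_seed, which only checks the line count and the first and last lines rather than every value.

```shell
# check_seed FILE: succeed if FILE has 1000 lines, starts with 1 and ends with 1000.
check_seed() {
  f="$1"
  # tr strips the padding that some wc implementations add.
  [ "$(wc -l < "$f" | tr -d ' ')" = "1000" ] || return 1
  [ "$(head -n 1 "$f")" = "1" ] || return 1
  [ "$(tail -n 1 "$f")" = "1000" ] || return 1
}

seed="$HOME/connect-demo/input.txt"
if [ -f "$seed" ]; then
  if check_seed "$seed"; then
    echo "input.txt looks good"
  else
    echo "input.txt does not match the expected seq 1000 output"
  fi
fi
```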
Step 7: Create FileStreamSource Connector
Next, let me guide you through launching the FileStreamSource connector:
... (Commands are mostly fine and technical, no changes)
Step 8: Create FileStreamSink Connector
Similarly, let’s proceed to launch the FileStreamSink connector:
... (Commands are mostly fine and technical, no changes)
Finally, confirm that the data has been replicated between files and is identical.
$ cat ~/connect-demo/output.txt
You should see that the output.txt file contains numbers from 1 to 1000, just like the input.txt file. That’s it! If you update input.txt, output.txt will sync accordingly.
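Rather than eyeballing a thousand lines, you could compare the two files directly. This is a small sketch with a hypothetical helper, files_match; it assumes the sink writes each record back out unchanged, as in the demo above.

```shell
# files_match A B: succeed when both files exist and are byte-for-byte identical.
files_match() {
  [ -f "$1" ] && [ -f "$2" ] && cmp -s "$1" "$2"
}

demo="$HOME/connect-demo"
if files_match "$demo/input.txt" "$demo/output.txt"; then
  echo "output.txt matches input.txt"
else
  echo "files differ or are missing; the sink may still be catching up"
fi
```

If the files differ right after you append to input.txt, give the connectors a few seconds and run the check again.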
Please note that Azure Event Hubs’ support for the Kafka Connect API is still in public preview. The FileStreamSource and FileStreamSink connectors deployed are not intended for production use and should only be used for demonstration purposes.