Exporting data from Apache Kafka (Red Hat AMQ Streams) topics to S3 using the Apache Camel Kafka Connector
At times, your use case requires long-term persistence of your Apache Kafka data, whether to ingest Kafka messages into your S3 data lake or simply to store messages for long-term audit and compliance purposes.
In this blog post, we will learn how to move Apache Kafka (Strimzi) messages to AWS S3 using the Apache Camel Kafka connector.
Prerequisites
- Running instance of OpenShift Container Platform
- Running Strimzi Cluster Operator
- Running instance of Red Hat AMQ Streams (Apache Kafka) deployed via Strimzi operator
Step: 1 Set S3 credentials as k8s secrets
- Create a file `aws-credentials.properties` with the following contents:

```properties
aws_access_key_id=<aws_s3_access_key>
aws_secret_access_key=<aws_s3_secret_key>
```
- Create a k8s secret from that file:

```shell
oc create secret generic aws-credentials --from-file=aws-credentials.properties -n <kafka_namespace>
oc get secret
```
Step: 2 Deploying Kafka Connect
Kafka Connect is a tool for streaming data between Apache Kafka and external systems. Using the concept of connectors, Kafka Connect provides a framework for moving large amounts of data into and out of your Kafka cluster while maintaining scalability and reliability. Kafka Connect is typically used to integrate Kafka with external databases and storage and messaging systems.
Procedure
Once we have the Kafka cluster and Strimzi operator up and running in our OpenShift cluster, we need to deploy a `KafkaConnect` resource. In the following YAML file, make sure to change `name` and the bootstrap server address as per your environment.
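For reference, here is a minimal sketch of what `01_kafka_connect.yaml` could look like. The resource name, connect-cluster name, and bootstrap address are placeholders, the `apiVersion` depends on your Strimzi version, and the Connect image must contain the Camel Kafka Connector plugins (e.g. via a custom image):

```yaml
apiVersion: kafka.strimzi.io/v1beta2        # v1beta1 on older Strimzi versions
kind: KafkaConnect
metadata:
  name: my-connect-cluster                  # change to your preferred name
  annotations:
    strimzi.io/use-connector-resources: "true"  # let KafkaConnector CRs manage this cluster
spec:
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9092  # change to your cluster's bootstrap address
  config:
    group.id: connect-cluster
    offset.storage.topic: connect-cluster-offsets
    config.storage.topic: connect-cluster-configs
    status.storage.topic: connect-cluster-status
    # Expose the mounted credentials file to connector configs via ${file:...}
    config.providers: file
    config.providers.file.class: org.apache.kafka.common.config.provider.FileConfigProvider
  externalConfiguration:
    volumes:
      - name: aws-credentials
        secret:
          secretName: aws-credentials       # the secret created in Step 1
```

Strimzi mounts each `externalConfiguration` volume under `/opt/kafka/external-configuration/<volume-name>` in the Connect pods, which we will use in Step 3 to reference the credentials.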
```shell
oc project <kafka_namespace>
oc apply -f 01_kafka_connect.yaml
oc get kc
oc get deployments
```
Step: 3 Deploying Kafka Connector
`KafkaConnectors` allow you to create and manage connector instances for Kafka Connect in a Kubernetes-native way.
Prerequisites
- A Kafka Connect deployment in which `KafkaConnectors` are enabled
Procedure
```shell
oc create -f 02_kafka_connector.yaml -n <kafka_namespace>
```
Make sure to change `topics`, `camel.sink.url`, and other details as per your environment; a sketch of the file follows below.
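For orientation, here is a sketch of what `02_kafka_connector.yaml` might contain, assuming the AWS S3 sink connector class from camel-kafka-connector. The connector class and the `camel.component.aws-s3.*` option names vary between camel-kafka-connector versions, so check your connector's documentation; the bucket, region, topic, and resource names are placeholders:

```yaml
apiVersion: kafka.strimzi.io/v1beta2        # match the version used for KafkaConnect
kind: KafkaConnector
metadata:
  name: s3-sink-connector                   # change to your preferred name
  labels:
    strimzi.io/cluster: my-connect-cluster  # must match the KafkaConnect name from Step 2
spec:
  class: org.apache.camel.kafkaconnector.awss3.CamelAwss3SinkConnector
  tasksMax: 1
  config:
    topics: my-topic                        # the Kafka topic(s) to export
    key.converter: org.apache.kafka.connect.storage.StringConverter
    value.converter: org.apache.kafka.connect.storage.StringConverter
    camel.sink.url: aws-s3://<bucket_name>?keyName=${exchangeId}  # target bucket and object key pattern
    # Credentials read from the secret mounted in Step 2 via FileConfigProvider
    camel.component.aws-s3.access-key: ${file:/opt/kafka/external-configuration/aws-credentials/aws-credentials.properties:aws_access_key_id}
    camel.component.aws-s3.secret-key: ${file:/opt/kafka/external-configuration/aws-credentials/aws-credentials.properties:aws_secret_access_key}
    camel.component.aws-s3.region: <aws_region>
```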
```shell
oc get kctr
```
Step: 4 Final Showdown
- Generate some messages in the Kafka topic your connector reads from (see the producer example after this list)
- Head to your S3 web console and select the right bucket; you should see new objects that represent your Kafka messages. Download a few objects and verify their contents.
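For a quick smoke test, you can run a throwaway console producer pod inside the cluster; the image tag, bootstrap address, and topic name below are assumptions, so adjust them to your environment:

```shell
oc run kafka-producer -ti --rm=true --restart=Never \
  -n <kafka_namespace> \
  --image=quay.io/strimzi/kafka:latest-kafka-3.5.1 \
  -- bin/kafka-console-producer.sh \
  --bootstrap-server my-cluster-kafka-bootstrap:9092 \
  --topic my-topic
```

Each line you type becomes a Kafka message, which the connector should shortly land in the bucket as an object.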
For more details on:
- Camel Kafka Connector, visit the project home
- Strimzi, visit the project home
# To Do
- IMO this connector should work well with Ceph S3 object storage, as it is S3-compatible; however, this needs to be functionally tested.
- Investigate how batching could be done on the camel-kafka-connector with an S3 sink target.
- Test the connector as an S3 source to Kafka (would be interesting).
That’s all folks, happy data exporting!!!