Exporting data from Apache Kafka (Red Hat AMQ Streams) topics to S3 using the Apache Camel Kafka Connector
At times, your use case requires long-term persistence of your Apache Kafka data, whether to ingest Kafka messages into your S3 data lake or simply to store messages for long-term audit and compliance purposes.
In this blog post, we will learn how to move Apache Kafka (Strimzi) messages to AWS S3 using the Apache Camel Kafka connector.
Prerequisites
- Running instance of OpenShift Container Platform
- Running Strimzi Cluster Operator
- Running instance of Red Hat AMQ Streams (Apache Kafka) deployed via Strimzi operator
Step: 1 Set S3 credentials as k8s secrets
- Create a file `aws-credentials.properties` with the following contents:

```properties
aws_access_key_id=<aws_s3_access_key>
aws_secret_access_key=<aws_s3_secret_key>
```
- Create a k8s secret from that file:

```shell
oc create secret generic aws-credentials --from-file=aws-credentials.properties -n <kafka_namespace>
oc get secret
```
Step: 2 Deploying Kafka Connect
Kafka Connect is a tool for streaming data between Apache Kafka and external systems. Using the concept of connectors, Kafka Connect provides a framework for moving large amounts of data into and out of your Kafka cluster while maintaining scalability and reliability. Kafka Connect is typically used to integrate Kafka with external databases and storage and messaging systems.
Procedure
Once we have the Kafka cluster and Strimzi operator up and running in our OpenShift cluster, we need to deploy a `KafkaConnect` resource. In the following YAML file, make sure to change `name` and the bootstrap server address as per your environment.
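For reference, here is a minimal sketch of what `01_kafka_connect.yaml` could look like. The resource name, connect-cluster name, and bootstrap address are placeholders, the `apiVersion` depends on your Strimzi version, and the Connect image must contain the Camel Kafka Connector plugins (e.g. via a custom image):

```yaml
apiVersion: kafka.strimzi.io/v1beta2        # v1beta1 on older Strimzi versions
kind: KafkaConnect
metadata:
  name: my-connect-cluster                  # change to your preferred name
  annotations:
    strimzi.io/use-connector-resources: "true"  # let KafkaConnector CRs manage this cluster
spec:
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9092  # change to your cluster's bootstrap address
  config:
    group.id: connect-cluster
    offset.storage.topic: connect-cluster-offsets
    config.storage.topic: connect-cluster-configs
    status.storage.topic: connect-cluster-status
    # Expose the mounted credentials file to connector configs via ${file:...}
    config.providers: file
    config.providers.file.class: org.apache.kafka.common.config.provider.FileConfigProvider
  externalConfiguration:
    volumes:
      - name: aws-credentials
        secret:
          secretName: aws-credentials       # the secret created in Step 1
```

Strimzi mounts each `externalConfiguration` volume under `/opt/kafka/external-configuration/<volume-name>` in the Connect pods, which we will use in Step 3 to reference the credentials.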
```shell
oc project <kafka_namespace>
oc apply -f 01_kafka_connect.yaml
oc get kc
oc get deployments
```
Step: 3 Deploying Kafka Connector
`KafkaConnectors` allow you to create and manage connector instances for Kafka Connect in a Kubernetes-native way.
Prerequisites
- A Kafka Connect deployment in which `KafkaConnectors` are enabled
Procedure
```shell
oc create -f 02_kafka_connector.yaml -n <kafka_namespace>
```
Make sure to change `topics`, `camel.sink.url`, and other details as per your environment; a sketch of the file follows below.
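For orientation, here is a sketch of what `02_kafka_connector.yaml` might contain, assuming the AWS S3 sink connector class from camel-kafka-connector. The connector class and the `camel.component.aws-s3.*` option names vary between camel-kafka-connector versions, so check your connector's documentation; the bucket, region, topic, and resource names are placeholders:

```yaml
apiVersion: kafka.strimzi.io/v1beta2        # match the version used for KafkaConnect
kind: KafkaConnector
metadata:
  name: s3-sink-connector                   # change to your preferred name
  labels:
    strimzi.io/cluster: my-connect-cluster  # must match the KafkaConnect name from Step 2
spec:
  class: org.apache.camel.kafkaconnector.awss3.CamelAwss3SinkConnector
  tasksMax: 1
  config:
    topics: my-topic                        # the Kafka topic(s) to export
    key.converter: org.apache.kafka.connect.storage.StringConverter
    value.converter: org.apache.kafka.connect.storage.StringConverter
    camel.sink.url: aws-s3://<bucket_name>?keyName=${exchangeId}  # target bucket and object key pattern
    # Credentials read from the secret mounted in Step 2 via FileConfigProvider
    camel.component.aws-s3.access-key: ${file:/opt/kafka/external-configuration/aws-credentials/aws-credentials.properties:aws_access_key_id}
    camel.component.aws-s3.secret-key: ${file:/opt/kafka/external-configuration/aws-credentials/aws-credentials.properties:aws_secret_access_key}
    camel.component.aws-s3.region: <aws_region>
```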
```shell
oc get kctr
```
Step: 4 Final Showdown
- Generate some messages in the Kafka topic your connector reads from (see the producer example after this list)
- Head to your S3 web console and select the right bucket; you should see new objects that represent your Kafka messages. Download a few objects and verify their contents.
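For a quick smoke test, you can run a throwaway console producer pod inside the cluster; the image tag, bootstrap address, and topic name below are assumptions, so adjust them to your environment:

```shell
oc run kafka-producer -ti --rm=true --restart=Never \
  -n <kafka_namespace> \
  --image=quay.io/strimzi/kafka:latest-kafka-3.5.1 \
  -- bin/kafka-console-producer.sh \
  --bootstrap-server my-cluster-kafka-bootstrap:9092 \
  --topic my-topic
```

Each line you type becomes a Kafka message, which the connector should shortly land in the bucket as an object.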
For more details on:
- Camel Kafka Connector, visit the project home
- Strimzi, visit the project home
# To Do
- IMO this connector should work well with Ceph S3 object storage, as it is S3-compatible; however, this needs to be functionally tested.
- Investigate how batching could be done on the camel-kafka-connector with an S3 sink target.
- Test the connector as an S3 source to Kafka (would be interesting).
That’s all folks, happy data exporting!!!