Exporting data from Apache Kafka (Red Hat AMQ Streams) topics to S3 using Apache Camel Connector

Karan Singh
2 min read · Jun 16, 2020


At times, your use case may require long-term persistence for your Apache Kafka data, whether you want to ingest Kafka messages into your S3 data lake or simply store them for long-term audit and compliance purposes.

In this blog post, we will learn how to move Apache Kafka (Strimzi) messages to AWS S3 using the Apache Camel Kafka connector.

Prerequisites

  • Running instance of OpenShift Container Platform
  • Running Strimzi Cluster Operator
  • Running instance of Red Hat AMQ Streams (Apache Kafka) deployed via Strimzi operator

Step 1: Set S3 credentials as a Kubernetes secret

  • Create a file aws-credentials.properties with the following contents
aws_access_key_id=<aws_s3_access_key>
aws_secret_access_key=<aws_s3_secret_key>
  • Create k8s secret from that file
oc create secret generic aws-credentials --from-file=aws-credentials.properties -n <kafka_namespace> 
oc get secret

Step 2: Deploying Kafka Connect

Kafka Connect is a tool for streaming data between Apache Kafka and external systems. Using the concept of connectors, Kafka Connect provides a framework for moving large amounts of data into and out of your Kafka cluster while maintaining scalability and reliability. Kafka Connect is typically used to integrate Kafka with external databases and storage and messaging systems.

Procedure

Once the Kafka cluster and Strimzi operator are up and running in our OpenShift cluster, we need to deploy Kafka Connect. In the following YAML file, make sure to change the name and bootstrap server address to match your environment.
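
The full 01_kafka_connect.yaml is not reproduced in this post, so here is only a minimal sketch of what such a KafkaConnect resource might look like on a 2020-era Strimzi API (kafka.strimzi.io/v1beta1). The resource name, Kafka version, image, and bootstrap address are placeholders to adapt; the externalConfiguration block mounts the aws-credentials secret from Step 1 into the Connect pods, and the file config provider lets connectors read values from it.

apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaConnect
metadata:
  name: my-connect-cluster                            # placeholder: your Connect cluster name
  annotations:
    strimzi.io/use-connector-resources: "true"        # let Strimzi manage KafkaConnector resources
spec:
  version: 2.5.0                                      # match your AMQ Streams / Kafka version
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9092   # placeholder: your Kafka bootstrap service
  image: <image-with-camel-kafka-connector-plugin>    # a Kafka Connect image that bundles the camel-aws-s3 connector
  config:
    group.id: connect-cluster
    offset.storage.topic: connect-cluster-offsets
    config.storage.topic: connect-cluster-configs
    status.storage.topic: connect-cluster-status
    config.providers: file                            # allow ${file:...} references in connector configs
    config.providers.file.class: org.apache.kafka.common.config.provider.FileConfigProvider
  externalConfiguration:
    volumes:
      - name: aws-credentials
        secret:
          secretName: aws-credentials                 # the secret created in Step 1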

oc project <kafka_namespace>
oc apply -f 01_kafka_connect.yaml
oc get kc
oc get deployments

Step 3: Deploying the Kafka Connector

KafkaConnectors allow you to create and manage connector instances for Kafka Connect in a Kubernetes-native way.

Prerequisites

  • A Kafka Connect deployment in which KafkaConnectors are enabled

Procedure
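
The contents of 02_kafka_connector.yaml are likewise not shown in the post, so the following is a rough sketch of what the KafkaConnector resource might contain. The connector class, camel.sink.url options, and the camel.component.aws-s3.* property names are assumptions that depend on the camel-kafka-connector version bundled in Step 2 (check the connector documentation); the topic, bucket, and region are placeholders.

apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
metadata:
  name: s3-sink-connector
  labels:
    strimzi.io/cluster: my-connect-cluster            # must match the KafkaConnect name from Step 2
spec:
  class: org.apache.camel.kafkaconnector.CamelSinkConnector   # generic Camel sink class; newer releases ship a dedicated aws-s3 sink class
  tasksMax: 1
  config:
    key.converter: org.apache.kafka.connect.storage.StringConverter
    value.converter: org.apache.kafka.connect.storage.StringConverter
    topics: my-topic                                  # placeholder: the topic(s) to export
    camel.sink.url: aws-s3://my-bucket?keyName=${exchangeId}   # placeholder bucket; keyName names each S3 object
    camel.component.aws-s3.configuration.region: US_EAST_1                    # assumed property name; placeholder region
    camel.component.aws-s3.configuration.access-key: ${file:/opt/kafka/external-configuration/aws-credentials/aws-credentials.properties:aws_access_key_id}
    camel.component.aws-s3.configuration.secret-key: ${file:/opt/kafka/external-configuration/aws-credentials/aws-credentials.properties:aws_secret_access_key}

The ${file:...} references only resolve because the file config provider was enabled and the aws-credentials secret was mounted in the KafkaConnect resource from Step 2.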

oc create -f 02_kafka_connector.yaml -n <kafka_namespace>

Make sure to change topics, camel.sink.url, and other details to match your environment.

oc get kctr

Step 4: Final Showdown

  • Generate some messages on the Kafka topic the connector is watching (a sample producer command is shown below)
  • Head to your S3 web console and select the right bucket; you should see new objects arriving, each representing a Kafka message. Download a few objects and check their contents.
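
For example, assuming your broker pod is named my-cluster-kafka-0 and the connector is watching a topic called my-topic (both placeholders), you could produce a few test messages with the console producer that ships inside the Kafka broker image:

oc exec -it my-cluster-kafka-0 -n <kafka_namespace> -- \
  bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-topic

Type a few lines, press Ctrl+C to exit, and then check the S3 bucket for new objects.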

For more details, refer to the Apache Camel Kafka Connector and Strimzi documentation.

To Do

  • IMO this connector should work well with Ceph S3 object storage, as it is S3-compatible; however, this needs to be functionally tested.
  • Investigate how batching could be done on the camel-kafka-connector with an S3 sink target
  • Test the connector as an S3 source to Kafka (would be interesting)

That’s all folks, happy data exporting!!!

Written by Karan Singh

Co-Founder & CTO @ Scogo AI ♦ I Love to solve problems using Tech
