“Look Mumma”: Kafka running on OpenShift4 using Ceph Block Storage
What’s Kafka…?
Apache Kafka is a community distributed event streaming platform capable of handling trillions of events a day. Initially conceived as a messaging queue, Kafka is based on an abstraction of a distributed commit log. Since being created and open sourced by LinkedIn in 2011, Kafka has quickly evolved from a messaging queue to a full-fledged event streaming platform. (Source)
In this weekend blog, we will go full hands-on and deploy Apache Kafka & Apache Zookeeper clusters using the Strimzi operator on OpenShift 4. The persistent storage required by the Kafka and Zookeeper clusters will be provisioned by Ceph, which itself runs on OpenShift 4, orchestrated via the Rook operator.
Prerequisite
- You must have a running OpenShift or Kubernetes cluster. You can follow my previous blog to learn how to deploy an OpenShift 4 cluster.
- The OpenShift client (oc) or Kubernetes client (kubectl) should be configured to talk to your OpenShift/K8s cluster.
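A quick sanity check that the client is wired up before starting (a minimal sketch; any read-only command that reaches the API server will do):
oc whoami
oc version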
Part — 1 of 2
In this part, we will:
- Deploy a Ceph cluster on OpenShift using the Rook operator
- Set up the rook-ceph-block storage class
- Configure the rook-ceph-block storage class as the default (for simplicity)
Step 1: Verify OpenShift cluster
oc get po
oc get ns
Step 2: Install the Ceph cluster using the Rook operator
You can follow my previous blog to set up a Ceph cluster using the Rook operator on OpenShift.
For simplicity, a local development version of Ceph that uses filesystem directories as OSDs can also be set up with the Rook operator. Here are the step-by-step commands to stand up a local rook-ceph environment.
git clone https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
oc login -u system:admin
oc create -f scc.yaml
oc get scc
oc create -f operator.yaml
oc get po -n rook-ceph-system
oc create -f cluster.yaml
oc get po -n rook-ceph
oc create -f toolbox.yaml
oc get po -n rook-ceph
oc -n rook-ceph exec -it rook-ceph-tools bash
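Once you are inside the toolbox pod, the standard Ceph CLI is available to confirm the cluster came up healthy, for example:
ceph status
ceph osd tree
ceph df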
Step 3: Provision Ceph Block Storage
Before the Rook operator can start provisioning Ceph block storage, a StorageClass and its backing storage pool need to be created. Kubernetes uses these to interoperate with Rook when provisioning persistent volumes.
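For reference, storageclass.yaml defines roughly the following: a Ceph block pool plus a StorageClass wired to the Rook block provisioner. Treat this as a sketch, field names and the pool name differ between Rook versions (upstream examples use replicapool, while the ceph df output below shows a pool named rbd), so use the file from the repo you cloned.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  replicated:
    size: 1 # a single replica is fine for a dev cluster, not for production
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: ceph.rook.io/block
parameters:
  blockPool: replicapool # the pool backing the provisioned volumes
  clusterNamespace: rook-ceph
  fstype: xfs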
oc create -f storageclass.yaml
oc get sc
NAME PROVISIONER AGE
gp2 (default) kubernetes.io/aws-ebs 18h
rook-ceph-block ceph.rook.io/block 60m
Here you can see that the rook-ceph-block storage class has been created by the Rook operator and that, in the background, a Ceph RBD pool has been provisioned:
$ oc -n rook-ceph exec -it rook-ceph-tools-5bc8b8f97-jsx4j -- ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
359 GiB 334 GiB 25 GiB 0
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
rbd 1 0 MiB 0.00 334 GiB 0
One last step is to make the rook-ceph-block storage class the default storage class. This takes two steps: first the gp2 storage class (if you are on AWS) needs to be demoted from default, then rook-ceph-block should be promoted to default.
- Option-1: Edit both storage classes and set is-default-class to false for gp2 and true for rook-ceph-block.
oc edit storageclass gp2
oc edit storageclass rook-ceph-block
- Option-2: Patch the annotations directly (both the beta and GA annotation keys are patched, since different Kubernetes versions honor different keys).
## set gp2 to false
oc patch storageclass gp2 -p '{"metadata":{"annotations":{"storageclass.beta.kubernetes.io/is-default-class":"false"}}}'
oc patch storageclass gp2 -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
## set rook-ceph-block to true
oc patch storageclass rook-ceph-block -p '{"metadata":{"annotations":{"storageclass.beta.kubernetes.io/is-default-class":"true"}}}'
- Verify your new default storage class. PS: Making rook-ceph-block the default is optional and done here for simplicity; otherwise the storage class must be specified explicitly every time a PV is provisioned.
$ oc get sc
NAME PROVISIONER AGE
gp2 kubernetes.io/aws-ebs 18h
rook-ceph-block (default) ceph.rook.io/block 63m
$
Part — 2 of 2
In this section, we will:
- Deploy Kafka and Zookeeper clusters using the Strimzi operator; both clusters will consume persistent storage from Ceph block storage
- Deploy Kafka producer and consumer applications to verify the overall deployment
Step 1: Deploy the Strimzi operator
- Create a new OpenShift project
oc login -u system:admin
oc new-project myproject
- Deploy the Strimzi cluster operator
$ oc apply -f https://github.com/strimzi/strimzi-kafka-operator/releases/download/0.11.1/strimzi-cluster-operator-0.11.1.yaml -n myproject
customresourcedefinition.apiextensions.k8s.io/kafkas.kafka.strimzi.io created
rolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-entity-operator-delegation created
clusterrolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator created
rolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-topic-operator-delegation created
customresourcedefinition.apiextensions.k8s.io/kafkausers.kafka.strimzi.io created
clusterrole.rbac.authorization.k8s.io/strimzi-entity-operator created
clusterrole.rbac.authorization.k8s.io/strimzi-cluster-operator-global created
clusterrolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-kafka-broker-delegation created
rolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator created
clusterrole.rbac.authorization.k8s.io/strimzi-cluster-operator-namespaced created
clusterrole.rbac.authorization.k8s.io/strimzi-topic-operator created
serviceaccount/strimzi-cluster-operator created
clusterrole.rbac.authorization.k8s.io/strimzi-kafka-broker created
customresourcedefinition.apiextensions.k8s.io/kafkatopics.kafka.strimzi.io created
deployment.extensions/strimzi-cluster-operator created
customresourcedefinition.apiextensions.k8s.io/kafkaconnects2is.kafka.strimzi.io created
customresourcedefinition.apiextensions.k8s.io/kafkaconnects.kafka.strimzi.io created
customresourcedefinition.apiextensions.k8s.io/kafkamirrormakers.kafka.strimzi.io created
$
Step 2: Deploy Apache Kafka Cluster
$ oc apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/0.11.1/examples/kafka/kafka-persistent-single.yaml -n myproject
kafka.kafka.strimzi.io/my-cluster created
$
$ oc get po
NAME READY STATUS RESTARTS AGE
my-cluster-entity-operator-5bdc9d897f-v5pcc 3/3 Running 0 46s
my-cluster-kafka-0 2/2 Running 0 117s
my-cluster-zookeeper-0 2/2 Running 0 2m56s
strimzi-cluster-operator-6954748598-r87gg 1/1 Running 0 11m
$
This deploys a small persistent Apache Kafka cluster with one node each for Apache Zookeeper and Apache Kafka.
The persistent storage for both Apache Kafka and Apache Zookeeper is provisioned from Ceph block storage.
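For the curious, the applied manifest is a Kafka custom resource along these lines (an abridged sketch; check the linked 0.11.1 example for the exact contents):
apiVersion: kafka.strimzi.io/v1alpha1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 1
    storage:
      type: persistent-claim # one PVC per broker, satisfied by the default storage class
      size: 100Gi
      deleteClaim: false
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
Because rook-ceph-block is now the default storage class, these persistent-claim stanzas land on Ceph without naming it explicitly.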
- Verify the persistent volumes and persistent volume claims; you will notice that the PVs for both the Apache Kafka and Apache Zookeeper clusters were provisioned using the Ceph block storage class.
$ oc get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-2580dd3a-520b-11e9-9641-125cefd10aec 100Gi RWO Delete Bound myproject/data-my-cluster-zookeeper-0 rook-ceph-block 2m58s
pvc-482414fc-520b-11e9-9641-125cefd10aec 100Gi RWO Delete Bound myproject/data-my-cluster-kafka-0 rook-ceph-block 2m
$
$ oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-my-cluster-kafka-0 Bound pvc-482414fc-520b-11e9-9641-125cefd10aec 100Gi RWO rook-ceph-block 2m5s
data-my-cluster-zookeeper-0 Bound pvc-2580dd3a-520b-11e9-9641-125cefd10aec 100Gi RWO rook-ceph-block 3m3s
$
Step 3: Deploying Kafka Producer
- Once the Kafka cluster is running, you can run a simple producer to send messages to a Kafka topic (the topic will be created automatically)
$ oc run kafka-producer -it --image=strimzi/kafka:0.11.1-kafka-2.1.0 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list my-cluster-kafka-bootstrap:9092 --topic my-topic
- (In another shell) verify that the Kafka producer pod has appeared
$ oc get po
NAME READY STATUS RESTARTS AGE
kafka-producer 1/1 Running 0 20s
my-cluster-entity-operator-5bdc9d897f-v5pcc 3/3 Running 0 2m24s
my-cluster-kafka-0 2/2 Running 0 3m35s
my-cluster-zookeeper-0 2/2 Running 0 4m34s
strimzi-cluster-operator-6954748598-r87gg 1/1 Running 0 13m
$
- Once the kafka-producer pod is initialized, you will see the Kafka producer console prompt, where you can submit messages that will be persistently stored in the Kafka topic.
If you don't see a command prompt, try pressing enter.
>
>
>
> Look Maa ! I am Kafka Messages on Ceph Storage
> Maa : Well done my boy !!
Step 4: Deploying Kafka Consumer
- To read messages from the Kafka topic, a consumer application needs to pull them.
- Deploy a Kafka consumer application and read the messages from the beginning
$ oc run kafka-consumer -it --image=strimzi/kafka:0.11.1-kafka-2.1.0 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic my-topic --from-beginning
If you don't see a command prompt, try pressing enter.
Look Maa ! I am Kafka Messages on Ceph Storage
Maa : Well done my boy !!
Step 5: Verify Kafka Persistence
- Destroy both the Kafka producer and Kafka consumer applications that we created previously (just Ctrl+C in each shell) and verify the pods are gone
$ oc get po
NAME READY STATUS RESTARTS AGE
my-cluster-entity-operator-5bdc9d897f-v5pcc 3/3 Running 0 7m40s
my-cluster-kafka-0 2/2 Running 0 8m51s
my-cluster-zookeeper-0 2/2 Running 0 9m50s
strimzi-cluster-operator-6954748598-r87gg 1/1 Running 0 18m
$
- The Kafka cluster stores topic messages on persistent Ceph block storage, so the messages outlive the producer and consumer pods.
- Let’s create another consumer application, subscribe to the same topic, and try to read the messages from the beginning.
$ oc run kafka-consumer -it --image=strimzi/kafka:0.11.1-kafka-2.1.0 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic my-topic --from-beginning
If you don't see a command prompt, try pressing enter.
Look Maa ! I am Kafka Messages on Ceph Storage
Maa : Well done my boy !!
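For belt-and-braces proof that the bytes really live on Ceph, re-run ceph df from the toolbox pod (substitute your own toolbox pod name); the block pool’s USED column should now be non-zero, though exact figures will vary:
oc -n rook-ceph exec -it rook-ceph-tools-5bc8b8f97-jsx4j -- ceph df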
Cool 😎, so we have learned to deploy a Ceph cluster using the Rook operator on OpenShift 4. We then provisioned Ceph block storage via persistent volumes and persistent volume claims using a Kubernetes storage class, deployed Apache Kafka and Apache Zookeeper clusters that consume persistent volumes from that Ceph block storage, and finally created Kafka producer and consumer apps to demonstrate that Kafka messages are persistently stored on the Ceph cluster.
PS: Strimzi is an open-source project that simplifies running Apache Kafka clusters on OpenShift or Kubernetes and builds on the Operator framework. Check out the project documentation here.