“Look Mumma”: Kafka running on OpenShift4 using Ceph Block Storage
What’s Kafka…?
Apache Kafka is a community distributed event streaming platform capable of handling trillions of events a day. Initially conceived as a messaging queue, Kafka is based on an abstraction of a distributed commit log. Since being created and open sourced by LinkedIn in 2011, Kafka has quickly evolved from a messaging queue to a full-fledged event streaming platform. (Source)
In this weekend blog, we will go full hands-on and deploy Apache Kafka & Apache Zookeeper clusters using the Strimzi operator on OpenShift 4. The persistent storage required by the Kafka and Zookeeper clusters will be provisioned by Ceph, which itself runs on OpenShift 4, orchestrated via the Rook operator.
Prerequisite
- You must have a running OpenShift or Kubernetes cluster. You can follow my previous blog to learn how to deploy an OpenShift 4 cluster.
- The OpenShift client (oc) or Kubernetes client (kubectl) should be configured to talk to your OpenShift/K8s cluster.
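A quick sanity check that the client is wired up before starting (a minimal sketch; any read-only command that reaches the API server will do):
oc whoami
oc version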
Part — 1 of 2
In this part, we will:
- Deploy a Ceph cluster on OpenShift using the Rook operator
- Set up the rook-ceph-block storage class
- Configure the rook-ceph-block storage class as the default (for simplicity)
Step 1: Verify OpenShift cluster
oc get po
oc get ns
Step 2: Install the Ceph cluster using the Rook operator
You can follow my previous blog to set up a Ceph cluster using the Rook operator on OpenShift.
For simplicity, a local development version of Ceph that uses filesystem directories as OSDs can also be set up with the Rook operator. Here are the step-by-step commands to stand up a local rook-ceph environment.
git clone https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
oc login -u system:admin
oc create -f scc.yaml
oc get scc
oc create -f operator.yaml
oc get po -n rook-ceph-system
oc create -f cluster.yaml
oc get po -n rook-ceph
oc create -f toolbox.yaml
oc get po -n rook-ceph
oc -n rook-ceph exec -it rook-ceph-tools bash
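Once you are inside the toolbox pod, the standard Ceph CLI is available to confirm the cluster came up healthy, for example:
ceph status
ceph osd tree
ceph df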
Step 3: Provision Ceph Block Storage
Before the Rook operator can start provisioning Ceph block storage, a StorageClass and its backing storage pool need to be created. Kubernetes uses these to interoperate with Rook when provisioning persistent volumes.
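For reference, storageclass.yaml defines roughly the following: a Ceph block pool plus a StorageClass wired to the Rook block provisioner. Treat this as a sketch, field names and the pool name differ between Rook versions (upstream examples use replicapool, while the ceph df output below shows a pool named rbd), so use the file from the repo you cloned.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  replicated:
    size: 1 # a single replica is fine for a dev cluster, not for production
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: ceph.rook.io/block
parameters:
  blockPool: replicapool # the pool backing the provisioned volumes
  clusterNamespace: rook-ceph
  fstype: xfs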
oc create -f storageclass.yaml
oc get sc
NAME PROVISIONER AGE
gp2 (default) kubernetes.io/aws-ebs 18h
rook-ceph-block ceph.rook.io/block 60m
Here you can see that the rook-ceph-block storage class has been created by the Rook operator and that, in the background, a Ceph RBD pool has been provisioned:
$ oc -n rook-ceph exec -it rook-ceph-tools-5bc8b8f97-jsx4j -- ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
359 GiB 334 GiB 25 GiB 0
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
rbd 1 0 MiB 0.00 334 GiB 0
One last step is to make the rook-ceph-block storage class the default storage class. This takes two steps: first the gp2 storage class (if you are on AWS) needs to be demoted from default, then rook-ceph-block should be promoted to default.
- Option-1: Edit both storage classes and set is-default-class to false for gp2 and true for rook-ceph-block.
oc edit storageclass gp2
oc edit storageclass rook-ceph-block
- Option-2: Patch the annotations directly (both the beta and GA annotation keys are patched, since different Kubernetes versions honor different keys).
## set gp2 to false
oc patch storageclass gp2 -p '{"metadata":{"annotations":{"storageclass.beta.kubernetes.io/is-default-class":"false"}}}'
oc patch storageclass gp2 -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
## set rook-ceph-block to true
oc patch storageclass rook-ceph-block -p '{"metadata":{"annotations":{"storageclass.beta.kubernetes.io/is-default-class":"true"}}}'
- Verify your new default storage class. PS: Making rook-ceph-block the default is optional and done here for simplicity; otherwise the storage class must be specified explicitly every time a PV is provisioned.
$ oc get sc
NAME PROVISIONER AGE
gp2 kubernetes.io/aws-ebs 18h
rook-ceph-block (default) ceph.rook.io/block 63m
$
Part — 2 of 2
In this section, we will:
- Deploy Kafka and Zookeeper clusters using the Strimzi operator; both clusters will consume persistent storage from Ceph block storage
- Deploy Kafka producer and consumer applications to verify the overall deployment
Step 1: Deploy the Strimzi operator
- Create a new OpenShift project
oc login -u system:admin
oc new-project myproject
- Deploy the Strimzi cluster operator
$ oc apply -f https://github.com/strimzi/strimzi-kafka-operator/releases/download/0.11.1/strimzi-cluster-operator-0.11.1.yaml -n myproject
customresourcedefinition.apiextensions.k8s.io/kafkas.kafka.strimzi.io created
rolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-entity-operator-delegation created
clusterrolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator created
rolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-topic-operator-delegation created
customresourcedefinition.apiextensions.k8s.io/kafkausers.kafka.strimzi.io created
clusterrole.rbac.authorization.k8s.io/strimzi-entity-operator created
clusterrole.rbac.authorization.k8s.io/strimzi-cluster-operator-global created
clusterrolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator-kafka-broker-delegation created
rolebinding.rbac.authorization.k8s.io/strimzi-cluster-operator created
clusterrole.rbac.authorization.k8s.io/strimzi-cluster-operator-namespaced created
clusterrole.rbac.authorization.k8s.io/strimzi-topic-operator created
serviceaccount/strimzi-cluster-operator created
clusterrole.rbac.authorization.k8s.io/strimzi-kafka-broker created
customresourcedefinition.apiextensions.k8s.io/kafkatopics.kafka.strimzi.io created
deployment.extensions/strimzi-cluster-operator created
customresourcedefinition.apiextensions.k8s.io/kafkaconnects2is.kafka.strimzi.io created
customresourcedefinition.apiextensions.k8s.io/kafkaconnects.kafka.strimzi.io created
customresourcedefinition.apiextensions.k8s.io/kafkamirrormakers.kafka.strimzi.io created
$
Step 2: Deploy Apache Kafka Cluster
$ oc apply -f https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/0.11.1/examples/kafka/kafka-persistent-single.yaml -n myproject
kafka.kafka.strimzi.io/my-cluster created
$
$ oc get po
NAME READY STATUS RESTARTS AGE
my-cluster-entity-operator-5bdc9d897f-v5pcc 3/3 Running 0 46s
my-cluster-kafka-0 2/2 Running 0 117s
my-cluster-zookeeper-0 2/2 Running 0 2m56s
strimzi-cluster-operator-6954748598-r87gg 1/1 Running 0 11m
$
This deploys a small persistent Apache Kafka cluster with one node each for Apache Zookeeper and Apache Kafka.
The persistent storage for both Apache Kafka and Apache Zookeeper is provisioned from Ceph block storage.
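For the curious, the applied manifest is a Kafka custom resource along these lines (an abridged sketch; check the linked 0.11.1 example for the exact contents):
apiVersion: kafka.strimzi.io/v1alpha1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 1
    storage:
      type: persistent-claim # one PVC per broker, satisfied by the default storage class
      size: 100Gi
      deleteClaim: false
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
Because rook-ceph-block is now the default storage class, these persistent-claim stanzas land on Ceph without naming it explicitly.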
- Verify the persistent volumes and persistent volume claims; you will notice that the PVs for both the Apache Kafka and Apache Zookeeper clusters were provisioned using the Ceph block storage class.
$ oc get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-2580dd3a-520b-11e9-9641-125cefd10aec 100Gi RWO Delete Bound myproject/data-my-cluster-zookeeper-0 rook-ceph-block 2m58s
pvc-482414fc-520b-11e9-9641-125cefd10aec 100Gi RWO Delete Bound myproject/data-my-cluster-kafka-0 rook-ceph-block 2m
$
$ oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-my-cluster-kafka-0 Bound pvc-482414fc-520b-11e9-9641-125cefd10aec 100Gi RWO rook-ceph-block 2m5s
data-my-cluster-zookeeper-0 Bound pvc-2580dd3a-520b-11e9-9641-125cefd10aec 100Gi RWO rook-ceph-block 3m3s
$
Step 3: Deploying Kafka Producer
- Once the Kafka cluster is running, you can run a simple producer to send messages to a Kafka topic (the topic will be created automatically)
$ oc run kafka-producer -it --image=strimzi/kafka:0.11.1-kafka-2.1.0 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list my-cluster-kafka-bootstrap:9092 --topic my-topic
- (In another shell) verify that the Kafka producer pod has appeared
$ oc get po
NAME READY STATUS RESTARTS AGE
kafka-producer 1/1 Running 0 20s
my-cluster-entity-operator-5bdc9d897f-v5pcc 3/3 Running 0 2m24s
my-cluster-kafka-0 2/2 Running 0 3m35s
my-cluster-zookeeper-0 2/2 Running 0 4m34s
strimzi-cluster-operator-6954748598-r87gg 1/1 Running 0 13m
$
- Once the kafka-producer pod is initialized, you will see the Kafka producer console prompt, where you can submit messages that will be persistently stored in the Kafka topic.
If you don't see a command prompt, try pressing enter.
>
>
>
> Look Maa ! I am Kafka Messages on Ceph Storage
> Maa : Well done my boy !!
Step 4: Deploying Kafka Consumer
- To read messages from the Kafka topic, a consumer application needs to pull them.
- Deploy a Kafka consumer application and read the messages from the beginning
$ oc run kafka-consumer -it --image=strimzi/kafka:0.11.1-kafka-2.1.0 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic my-topic --from-beginning
If you don't see a command prompt, try pressing enter.
Look Maa ! I am Kafka Messages on Ceph Storage
Maa : Well done my boy !!
Step 5: Verify Kafka Persistence
- Destroy both the Kafka producer and Kafka consumer applications that we created previously (just Ctrl+C in each shell) and verify the pods are gone
$ oc get po
NAME READY STATUS RESTARTS AGE
my-cluster-entity-operator-5bdc9d897f-v5pcc 3/3 Running 0 7m40s
my-cluster-kafka-0 2/2 Running 0 8m51s
my-cluster-zookeeper-0 2/2 Running 0 9m50s
strimzi-cluster-operator-6954748598-r87gg 1/1 Running 0 18m
$
- The Kafka cluster stores topic messages on persistent Ceph block storage, so the messages outlive the producer and consumer pods.
- Let’s create another consumer application, subscribe to the same topic, and try to read the messages from the beginning.
$ oc run kafka-consumer -it --image=strimzi/kafka:0.11.1-kafka-2.1.0 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --topic my-topic --from-beginning
If you don't see a command prompt, try pressing enter.
Look Maa ! I am Kafka Messages on Ceph Storage
Maa : Well done my boy !!
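For belt-and-braces proof that the bytes really live on Ceph, re-run ceph df from the toolbox pod (substitute your own toolbox pod name); the block pool’s USED column should now be non-zero, though exact figures will vary:
oc -n rook-ceph exec -it rook-ceph-tools-5bc8b8f97-jsx4j -- ceph df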
Cool 😎, so we have learned to deploy a Ceph cluster using the Rook operator on OpenShift 4. We then provisioned Ceph block storage via persistent volumes and persistent volume claims using a Kubernetes storage class, deployed Apache Kafka and Apache Zookeeper clusters that consume persistent volumes from that Ceph block storage, and finally created Kafka producer and consumer apps to demonstrate that Kafka messages are persistently stored on the Ceph cluster.
PS: Strimzi is an open-source project that simplifies running Apache Kafka clusters on OpenShift or Kubernetes and builds on the Operator framework. Check out the project documentation here.