Lab: Creating a Strimzi Cluster
Introduction
A basic Kafka cluster consists of two resources in Strimzi:
- A single `kind: Kafka`, which is the definition and configuration of a Kafka cluster
- A minimum of one `kind: KafkaNodePool`, each of which defines a (virtual) pool of pods used to execute Kafka workloads
Both resources belong to a named cluster and together cause the Strimzi Operator to create the actual workloads with the specified configuration.
Resource `kind: KafkaNodePool`
Because Kafka pods are not defined directly, a KafkaNodePool CR is the smallest unit of configuration for Kafka
workloads and represents a virtual pool of worker pods used to run Kafka controllers and/or brokers.
Each pod started by the Strimzi Operator is part of a KafkaNodePool and thus bound to its configuration and limits.
On its own, a KafkaNodePool does not cause any resource allocation; it serves as a template for the resources created
for a specific cluster.
Specifically, a KafkaNodePool allows you to:
- Define the number of replicas
- Set roles (controller, broker or both) which are passed down to each pod
- Define the storage setup used for each pod
- Modify Kubernetes resource configurations via a `podTemplate`
As each Kafka cluster requires brokers as well as controllers, we must provide at least one Kafka node pool
for each role.
Since a KafkaNodePool can be assigned a single role or both (a dual role), we can either create separate node pools
for brokers and controllers or use the same pods for both roles; a sketch of the separate-pool variant follows below.
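For illustration, here is a minimal sketch of the separate-pool variant. All names and sizes are placeholders, assuming a cluster named my-cluster:

```yaml
# Controller-only node pool (illustrative names and sizes)
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: controllers
  labels:
    strimzi.io/cluster: my-cluster
spec:
  replicas: 3
  roles:
    - controller
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 20Gi
        deleteClaim: false
---
# Broker-only node pool
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: brokers
  labels:
    strimzi.io/cluster: my-cluster
spec:
  replicas: 3
  roles:
    - broker
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
```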
Example: a basic Kafka node pool with dual-role configuration:
```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: kraft-dual-role
  labels:
    strimzi.io/cluster: my-cluster
spec:
  replicas: 3
  roles:
    - controller
    - broker
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
  resources:
    requests:
      memory: 2Gi
      cpu: "200m"
    limits:
      memory: 4Gi
      cpu: "500m"
```
In this example, a node pool is defined with:
- three replicas, which will lead to three pods being scheduled
- resource requests and limits, applied to each pod:
  - memory: 2Gi/4Gi (request/limit)
  - CPU: 200m/500m (request/limit)
- storage of type JBOD with a single persistent volume of 100 GiB (see note below)
While other storage configuration types are possible, Strimzi strongly recommends using a JBOD configuration. As in classic storage clusters, JBOD here means an extensible array of volumes (Kubernetes volumes rather than physical disks) which are mounted into each pod and used to store Kafka log data. As with the resource limits, the volumes defined within the JBOD configuration are applied to each pod.
For example: a configuration specifying two 100 GiB volumes within a Kafka node pool of three replicas will lead to three pods being scheduled, each with two unique 100 GiB volumes mounted at their respective data directories.
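As a sketch, the storage section for that two-volume setup would look like this (sizes are placeholders):

```yaml
# Storage block of a KafkaNodePool: every pod receives both volumes
storage:
  type: jbod
  volumes:
    - id: 0
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
    - id: 1
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
```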
Resource `kind: Kafka`
A resource of `kind: Kafka` represents the definition of an abstract Kafka cluster.
While Kafka node pools are the main entity for configuring the workloads of a cluster in terms of Kubernetes resources
(memory, storage, CPU), the cluster resource is the single definition of connectivity, authentication and authorization,
logging and other features which make up an abstract Kafka cluster.
Common definitions include:
- Listeners, defining endpoints to allow clients to connect with options for authentication and encryption
- Authorization mechanisms to restrict permissions of connected clients
- Logging
- Metrics
- Dedicated Strimzi Operators for a cluster
- Additional features of a Strimzi cluster which are not part of a vanilla Kafka deployment (e.g. Cruise Control, jmxExporter)
Because this definition does not include any workload configuration, a cluster needs an appropriate number of
KafkaNodePool resources to schedule workload execution.
Example: a basic Kafka Cluster definition
```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
  namespace: strimzi
  annotations:
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
spec:
  kafka:
    version: 4.0.0
    metadataVersion: 4.0-IV3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
```
In the above example we declare a simple Kafka cluster which requires dedicated Kafka node pools and runs in KRaft mode
instead of using ZooKeeper.
We also declare a single listener on TCP port 9092 of type internal, which will cause a service of type ClusterIP
to be created.
This service exposes the given TCP port and can be used by Kafka clients within the Kubernetes cluster as the Kafka
bootstrap server.
More information regarding listener configurations can be found in the Strimzi documentation.
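As a quick check, the services created for a cluster can be listed via the strimzi.io/cluster label; the bootstrap service follows the naming pattern <cluster-name>-kafka-bootstrap:

```shell
# List all services belonging to the cluster named my-cluster
kubectl get svc -l strimzi.io/cluster=my-cluster
```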
Lab Exercise: Creating a Basic Cluster
We would like to declare and deploy a basic Kafka cluster in a dedicated Kubernetes namespace.
Exercise 1: Define a KafkaNodePool
Declare a new Kafka node pool for a simple Kafka cluster with three replicas, using the same nodes as both brokers and controllers. The pods should have an initial resource request of 512 MiB memory and 100m CPU, limited to a maximum of 1 GiB memory and 500m CPU.
Every node should receive a single persistent volume with a capacity of 20 GiB.
The node pool should be part of a cluster named cluster-1 (which does not exist yet).
Start with this resource stub:
```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: cluster-1-dual-role
spec:
```
Hint 1: Replicas and Node Roles
```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: cluster-1-dual-role
  labels:
    strimzi.io/cluster: cluster-1
spec:
  replicas: 3
  roles:
    - controller
    - broker
```
Hint 2: Resource Limits
```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: cluster-1-dual-role
  labels:
    strimzi.io/cluster: cluster-1
spec:
  replicas: 3
  roles:
    - controller
    - broker
  resources:
    requests:
      memory: 512Mi
      cpu: "100m"
    limits:
      memory: 1Gi
      cpu: "500m"
```
Final Solution
File: `./cluster-1.nodepool.yaml`
```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: cluster-1-dual-role
  labels:
    strimzi.io/cluster: cluster-1
spec:
  replicas: 3
  roles:
    - controller
    - broker
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 20Gi
        deleteClaim: false
  resources:
    requests:
      memory: 512Mi
      cpu: "100m"
    limits:
      memory: 1Gi
      cpu: "500m"
```
Now deploy the node pool in your personal namespace.
Solution
```shell
kubectl apply -n NAMESPACE -f ./cluster-1.nodepool.yaml
```
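To verify that the resource was accepted (a sketch; note that no pods are scheduled until the matching Kafka resource exists):

```shell
kubectl get kafkanodepool -n NAMESPACE
```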
Exercise 2: Define a cluster (`kind: Kafka`)
Declare a new Kafka cluster of version 4.0.0 within your personal namespace, enabling KRaft and using the name cluster-1. The cluster should use the Kafka node pool created in the previous exercise and utilize Strimzi entity operators for managing topics and users. For connectivity, the cluster should have a single internal, non-encrypted listener accepting connections on TCP port 9093 without requiring authentication.
Start with this stub resource:
```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
spec:
```
Hint: Base Cluster Properties
File: `./cluster-1.yaml`
```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: cluster-1
  annotations:
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
spec:
  kafka:
    version: 4.0.0
    metadataVersion: 4.0-IV3
  entityOperator:
    topicOperator: {}
    userOperator: {}
```
Final Solution
File: `./cluster-1.yaml`
```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: cluster-1
  annotations:
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
spec:
  kafka:
    version: 4.0.0
    metadataVersion: 4.0-IV3
    listeners:
      - name: plain
        port: 9093
        type: internal
        tls: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
```
Now deploy the cluster in your personal namespace.
Solution
```shell
kubectl apply -n NAMESPACE -f ./cluster-1.yaml
```
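Cluster creation can take a few minutes. One way to block until the cluster reports readiness (a sketch with an assumed timeout):

```shell
# Wait until the Kafka resource reaches the Ready condition
kubectl wait kafka/cluster-1 --for=condition=Ready --timeout=300s -n NAMESPACE
```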
Exercise 3: Test Access to Your Cluster
Test access to your new cluster using the Debug CLI by executing a simple Kafka admin command.
Please be patient: the official Kafka CLI tools are slow to start, so commands can take several seconds to complete.
Replace SERVICE_NAME with the name of your cluster’s bootstrap service and NAMESPACE_NAME with your namespace:
```shell
kafka-broker-api-versions --bootstrap-server "SERVICE_NAME.NAMESPACE_NAME.svc:9093" | grep '^cluster.*'
```
You should see three brokers listed in the output.
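Assuming the default Strimzi naming pattern <cluster-name>-kafka-bootstrap, the filled-in command would look like this (my-namespace is a placeholder):

```shell
# Query broker API versions via the bootstrap service of cluster-1
kafka-broker-api-versions \
  --bootstrap-server "cluster-1-kafka-bootstrap.my-namespace.svc:9093" \
  | grep '^cluster'
```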
Lab Exercise: Creating Properties Files for Your Clients
With increasing complexity in the configuration of your cluster, supplying the correct parameters on the command line
becomes cumbersome.
To make (future) configuration easier, we will create a Kubernetes `kind: ConfigMap` for storing Java .properties
files, which we can easily extend and mount into our debug-cli pod.
Exercise 1: Create a ConfigMap
Create a ConfigMap in your namespace using the name cluster-1-client-cfg and a single key
producer-plaintext-noauth.properties, which should have the following value (replacing MY_NAMESPACE with the name of
your namespace):
```properties
bootstrap.servers=cluster-1-kafka-bootstrap.MY_NAMESPACE.svc:9093
sasl.mechanism=PLAIN
security.protocol=PLAINTEXT
parse.key=true
key.separator=:
```
Solution
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-1-client-cfg
  namespace: MY_NAMESPACE
data:
  producer-plaintext-noauth.properties: |
    bootstrap.servers=cluster-1-kafka-bootstrap.MY_NAMESPACE.svc:9093
    sasl.mechanism=PLAIN
    security.protocol=PLAINTEXT
    parse.key=true
    key.separator=:
```
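Apply the manifest as usual (the file name is a placeholder):

```shell
kubectl apply -n MY_NAMESPACE -f ./cluster-1-client-cfg.yaml
```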
Exercise 2: Mounting the ConfigMap
Using your existing deployment for the debug-cli, mount the previously created ConfigMap into the pod at the path
/config/cluster-1. Reuse the previous deployment specification.
Hint 1: Referencing the ConfigMap
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-debug-cli
  #...
spec:
  #...
  template:
    #...
    spec:
      containers:
        #...
      volumes:
        - name: cluster-1-config
          configMap:
            name: cluster-1-client-cfg
```
Hint 2: Mounting the ConfigMap
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-debug-cli
  #...
spec:
  #...
  template:
    #...
    spec:
      containers:
        - name: debug-cli
          #...
          volumeMounts:
            - name: cluster-1-config
              mountPath: /config/cluster-1
      volumes:
        - name: cluster-1-config
          configMap:
            name: cluster-1-client-cfg
```
Final Solution
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-1-client-cfg
data:
  producer-plaintext-noauth.properties: |
    bootstrap.servers=cluster-1-kafka-bootstrap.MY_NAMESPACE.svc:9093
    sasl.mechanism=PLAIN
    security.protocol=PLAINTEXT
    parse.key=true
    key.separator=:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: debug-cli-user-conf
spec:
  storageClassName: default
  resources:
    requests:
      storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-debug-cli
  labels:
    app: debug-cli
spec:
  replicas: 1
  selector:
    matchLabels:
      app: debug-cli
  template:
    metadata:
      labels:
        app: debug-cli
    spec:
      initContainers:
        # Fixes ownership of the user configuration volume before the main container starts
        - name: debug-cli-chown
          image: krassestecontainerreistry.azurecr.io/kafka-oauth-client:latest
          securityContext:
            privileged: true
            runAsUser: 0
            runAsGroup: 0
          command: [ "chown", "-R", "user:user", "/opt/user_conf" ]
          volumeMounts:
            - name: user-conf
              mountPath: /opt/user_conf
              readOnly: false
      containers:
        - name: debug-cli
          image: krassestecontainerreistry.azurecr.io/kafka-oauth-client:latest #TODO: replace ACR name
          resources:
            limits:
              cpu: 200m
              memory: 200Mi
            requests:
              cpu: 200m
              memory: 200Mi
          volumeMounts:
            - name: user-conf
              mountPath: /opt/user_conf
              readOnly: false
            - name: cluster-1-config
              mountPath: /config/cluster-1
      volumes:
        - name: user-conf
          persistentVolumeClaim:
            claimName: debug-cli-user-conf
        - name: cluster-1-config
          configMap:
            name: cluster-1-client-cfg
```
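Once the pod restarts with the new volume, clients inside the debug-cli container can reference the mounted file directly, e.g. via --command-config of the Kafka admin tools. A sketch; producer-only keys such as parse.key are not used by the admin client and merely produce a warning:

```shell
# Reuse the mounted properties file for an admin command
kafka-broker-api-versions \
  --bootstrap-server "cluster-1-kafka-bootstrap.MY_NAMESPACE.svc:9093" \
  --command-config /config/cluster-1/producer-plaintext-noauth.properties
```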