In a previous post, I wrote about enabling Kubernetes audit logs in Cluster API, a feature that is essential for security monitoring and one I have been leaning on while experimenting with audit log patterns for the past couple of months.

Here I continue with the networking setup the Incus Cluster API provider offers by default: Flannel, which is what you get when you set the DEPLOY_KUBE_FLANNEL environment variable while generating the cluster manifest.

My choice is still Cilium, as always, so I can play around with it in various scenarios. The question is how to deploy Cilium when the provider doesn't bundle it. Read the generated cluster manifest and you'll see a ConfigMap whose cni.yml key stores the manifests for the resources Flannel needs; replace that value with the Cilium manifests instead.
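
For orientation, the relevant piece of the generated manifest looks roughly like this. The ConfigMap name and metadata come from the provider's template, so take this as an abbreviated sketch rather than the exact resource:

apiVersion: v1
kind: ConfigMap
metadata:
  name: <cluster_name>-cni   # placeholder; use whatever name your generated manifest has
  namespace: default
data:
  cni.yml: |
    # kube-flannel resources: Namespace, ServiceAccount, RBAC, ConfigMap, DaemonSet, ...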

To generate a cluster manifest, use:

clusterctl generate cluster <cluster_name> -i incus --kubernetes-version v1.3x.0 --control-plane-machine-count 1 --worker-machine-count 1 > cluster.yaml
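
The DEPLOY_KUBE_FLANNEL variable mentioned earlier is read at this step, so export it before running the command above if you want the template to include the CNI ConfigMap we are about to edit. I am assuming true is the value the template expects; double-check your provider version:

export DEPLOY_KUBE_FLANNEL=true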

There are a few ways to install Cilium: the Cilium CLI or Helm. Since all we need are manifests to drop into cni.yml, Helm is enough; use helm template to render everything, then paste the output into the ConfigMap value.

helm repo add cilium https://helm.cilium.io/
helm template cilium cilium/cilium \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set ipam.mode=kubernetes \
  --set routingMode=tunnel \
  --set tunnelProtocol=vxlan \
  --set operator.prometheus.enabled=false \
  > cilium.yaml
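
Pasting a few-thousand-line render into a block scalar by hand is easy to get wrong. A small trick, nothing provider-specific, just a habit: pre-indent the rendered file to match the indentation the cni.yml value uses inside the ConfigMap (four spaces if the value sits two levels under data, the usual layout; adjust to match your file), then paste the result under cni.yml: |

sed 's/^/    /' cilium.yaml > cilium-indented.yaml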

Once we have those manifests, replace the ConfigMap value and swap every kube-flannel string in the cluster manifest for cilium so names stay consistent. The Cluster API template knows nothing about Cilium, so you have to make those replacements yourself.
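
Assuming everything lives in cluster.yaml, a blunt but effective way to do the swap:

sed -i 's/kube-flannel/cilium/g' cluster.yaml

A quick grep for kube-flannel afterwards should come back empty.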

Apply the cluster manifest to start provisioning. In my case the cilium-operator pods got stuck in Pending, and the scheduler event looked like this:

  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  37s (x3 over 2m21s)  default-scheduler  0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. no new claims to deallocate, preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.

At first I thought something about pod ports was preventing Cilium from starting. It turned out that cilium-operator defaults to two replicas, both exposing host port 9234/TCP, so on a single-node cluster (control plane only) the second replica cannot bind the same port on that host and scheduling fails. Adding a worker node fixed it: the extra replica scheduled onto another machine and both operators came up healthy.
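
If you would rather keep the cluster single-node, another option is to render the chart with a single operator replica; operator.replicas is a standard value in the upstream Cilium chart, though I went with the extra node instead:

helm template cilium cilium/cilium \
  --namespace kube-system \
  --set operator.replicas=1 \
  --set kubeProxyReplacement=true \
  --set ipam.mode=kubernetes \
  --set routingMode=tunnel \
  --set tunnelProtocol=vxlan \
  --set operator.prometheus.enabled=false \
  > cilium.yaml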

I used the following command to add a worker node to my existing cluster:

kubectl patch cluster <cluster_name> \
  --type=merge \
  -p '{"spec":{"topology":{"workers":{"machineDeployments":[{"name":"md-0","class":"default-worker","replicas":1}]}}}}'

Check status with:

kubectl -n kube-system exec ds/cilium -- cilium status

Example output:

KVStore:                 Disabled
Kubernetes:              Ok         1.35 (v1.35.0) [linux/amd64]
Kubernetes APIs:         ["cilium/v2::CiliumCIDRGroup", "cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Pods", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:    True   [eth0   10.186.119.246 fe80::1266:6aff:fef0:ed0d (Direct Routing)]
Host firewall:           Disabled
SRv6:                    Disabled
CNI Chaining:            none
CNI Config file:         successfully wrote CNI configuration file to /host/etc/cni/net.d/05-cilium.conflist
Cilium:                  Ok   1.19.3 (v1.19.3-f5eb641b)
NodeMonitor:             Listening for events on 16 CPUs with 64x4096 of shared memory
Cilium health daemon:    Ok
IPAM:                    IPv4: 4/254 allocated from 10.244.0.0/24,
IPv4 BIG TCP:            Disabled
IPv6 BIG TCP:            Disabled
BandwidthManager:        Disabled
Routing:                 Network: Tunnel [vxlan]   Host: Legacy
Attach Mode:             TCX
Device Mode:             veth
Masquerading:            IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status:       23/23 healthy
Proxy Status:            OK, ip 10.244.0.50, 0 redirects active on ports 10000-20000, Envoy: external
Global Identity Range:   min 256, max 65535
Hubble:                  Ok              Current/Max Flows: 4095/4095 (100.00%), Flows/s: 3.97   Metrics: Disabled
Encryption:              Disabled
Cluster health:          2/2 reachable   (2026-05-09T13:13:59Z)   (Probe interval: 1m36.566274746s)
Name                     IP              Node                     Endpoints
Modules Health:          Stopped(23) Degraded(0) OK(79)
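
To confirm the original symptom is gone, both operator replicas should now be Running on different nodes; as far as I can tell, the label below is what the upstream chart puts on the operator pods:

kubectl -n kube-system get pods -l name=cilium-operator -o wide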

Finally, we have a lab that comes up with Cilium already installed whenever we provision a cluster through Cluster API.