Persistent data

The original cluster's external storage used an NFS server, which represented a single point of failure; it has been replaced with a Ceph cluster. In addition, there were unresolved issues with Postgres and OpenSearch using NFS as a storage class, specifically failures on chown operations.

We use Canonical's MicroCeph distribution of Ceph, and build a cluster using the hosts sigiriya, james, and bukit.

Before starting, check that the Ubuntu installations have allocated sufficient space to the root filesystem. I found that only 100G of a 2TB drive was assigned to the root logical volume, which caused a Ceph cluster failure because OS disk usage was above 80%.
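
A quick way to spot this condition ahead of time is to compare root filesystem usage with the free space left in the volume group (standard LVM tooling, nothing MicroCeph-specific):

  #!/bin/bash
  # Check root filesystem usage and unallocated LVM space.
  df -h /     # how full is the root filesystem?
  sudo vgs    # VFree column shows unallocated space in each volume group
  sudo lvs    # current size of each logical volume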

Preparation

To resize the root partition to use all of the available space (a consolidated sketch follows the list):

  • To display volume group information: sudo vgdisplay
  • To extend the logical volume for the root partition: sudo lvextend -l +100%FREE /dev/mapper/ubuntu--vg-ubuntu--lv
  • To resize the root filesystem: sudo resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv
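
Put together, a minimal resize pass looks like this (it assumes the default Ubuntu LVM layout with the root volume at /dev/mapper/ubuntu--vg-ubuntu--lv; confirm the path with vgdisplay/lvdisplay first):

  #!/bin/bash
  # Grow the root logical volume into all free extents, then grow
  # the ext4 filesystem to match. Safe to run online on ext4.
  set -euo pipefail
  ROOT_LV=/dev/mapper/ubuntu--vg-ubuntu--lv   # default Ubuntu LVM path
  df -h /                                     # usage before
  sudo lvextend -l +100%FREE "${ROOT_LV}"
  sudo resize2fs "${ROOT_LV}"
  df -h /                                     # confirm the new size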

Installing MicroCeph

  • On all hosts in the cluster, install MicroCeph: sudo snap install microceph
  • On sigiriya (the master), bootstrap the cluster: sudo microceph cluster bootstrap
  • To prepare to add james, on sigiriya run: sudo microceph cluster add james
  • This prints a join token; on james, run: sudo microceph cluster join <token>
  • Similarly, to add bukit, on sigiriya run: sudo microceph cluster add bukit
  • Then on bukit: sudo microceph cluster join <token>
  • Next, create disks (where using loopback volumes) and add them to each node
  • To create a loopback disk:
  #!/bin/bash
  # Create a sparse backing file and attach it as a loop device.
  loop_file="$(sudo mktemp -p /mnt XXXX.img)"
  sudo truncate -s 60G "${loop_file}"
  loop_dev="$(sudo losetup --show -f "${loop_file}")"
  # The snap's block-devices plug cannot access /dev/loopX, so expose
  # the same device under an alternate /dev/sdiX name for MicroCeph.
  minor="${loop_dev##/dev/loop}"
  sudo mknod -m 0660 "/dev/sdi${minor}" b 7 "${minor}"
  • Add a disk to each node in the cluster: sudo microceph disk add --wipe "/dev/sdXX" (an end-to-end sketch of the install follows this list)
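
As a recap, the per-host sequence for this three-node install looks roughly like the following sketch (the /dev/sdi0 device name is illustrative; use the node created by the loopback script above, or a real block device):

  #!/bin/bash
  # On sigiriya: create the cluster, then mint one join token per peer.
  sudo microceph cluster bootstrap
  sudo microceph cluster add james   # prints the token for james
  sudo microceph cluster add bukit   # prints the token for bukit

  # On james and on bukit, join with the matching token:
  #   sudo microceph cluster join <token>

  # On every node, hand a disk to Ceph, then confirm the OSDs:
  sudo microceph disk add --wipe /dev/sdi0   # illustrative device name
  sudo microceph.ceph osd tree               # expect one OSD per node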

Additional basic setup for a test cluster:

  • sudo microceph.ceph config set global osd_pool_default_size 2 (two replicas per object, suitable for a small test cluster)
  • sudo microceph.ceph config set mgr mgr_standby_modules false (do not run manager modules on standby managers)
  • sudo microceph.ceph config set osd osd_crush_chooseleaf_type 0 (place replicas at the OSD level, so they may share a host)
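
To confirm the overrides took effect, read them back (config get/dump are standard Ceph commands):

  #!/bin/bash
  # Read back the three test-cluster overrides.
  sudo microceph.ceph config get mon osd_pool_default_size
  sudo microceph.ceph config get mgr mgr_standby_modules
  sudo microceph.ceph config get osd osd_crush_chooseleaf_type
  # Or dump every non-default setting in one go:
  sudo microceph.ceph config dump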

Installing the Ceph dashboard

  • To expose the dashboard over HTTPS, use the prod k8s cert-manager to get a certificate from Let's Encrypt, and create the secret ceph-cert
  • To extract the key and certificate from k8s:
  • kubectl get secret ceph-cert -o json | jq -r '.data["tls.key"]' | base64 -d > ceph.key
  • kubectl get secret ceph-cert -o json | jq -r '.data["tls.crt"]' | base64 -d > ceph.crt
  • To enable the Ceph dashboard: sudo ceph mgr module enable dashboard
  • sudo ceph dashboard set-ssl-certificate -i - < /home/colleymj/ceph/ceph.crt
  • sudo ceph dashboard set-ssl-certificate-key -i - < /home/colleymj/ceph/ceph.key
  • Add a user: echo <password> | sudo ceph dashboard ac-user-create colleymj -i - administrator
  • Get the dashboard endpoint URL: sudo ceph mgr services
  • If there is an issue with the standard port: sudo ceph config set mgr mgr/dashboard/ssl_server_port 8081
  • Or the host address: sudo ceph config set mgr mgr/dashboard/server_addr 192.168.0.6
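
Assembled in order, the dashboard setup is short (a sketch; it assumes ceph.crt and ceph.key were extracted to the current directory as above, and reuses the colleymj account from the list):

  #!/bin/bash
  # Enable the dashboard module and install the Let's Encrypt cert/key.
  sudo ceph mgr module enable dashboard
  sudo ceph dashboard set-ssl-certificate -i - < ceph.crt
  sudo ceph dashboard set-ssl-certificate-key -i - < ceph.key
  # Create an administrator account (password supplied on stdin).
  echo '<password>' | sudo ceph dashboard ac-user-create colleymj -i - administrator
  # Print the URL the active manager is serving the dashboard on.
  sudo ceph mgr services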

Connecting MicroK8s to the Ceph cluster

  • NOTE: A single-node k8s cluster required 4GB RAM and 4 CPUs; with less, the installation did not complete
  • NOTE: "Before enabling the rook-ceph addon on a strictly confined MicroK8s, make sure the rbd kernel module is loaded with sudo modprobe rbd."
  • Configure the microk8s cluster: sudo microk8s enable rook-ceph
  • We can now connect the k8s cluster to external storage
  • We need ceph.conf and ceph.keyring to attach; both can be found in the MicroCeph snap directory: /var/snap/microceph/current/conf
  • Using config and key: sudo microk8s connect-external-ceph --ceph-conf ceph.conf --keyring ceph.keyring --rbd-pool southcluster_rbd
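
To verify the attachment end to end, a small PVC against the new storage class is a reasonable smoke test (the class name ceph-rbd and the claim name ceph-rbd-test below are assumptions; list the real class first with kubectl get storageclass):

  #!/bin/bash
  # List the storage classes the addon created, then create a small
  # test claim and check that it binds.
  microk8s kubectl get storageclass
  microk8s kubectl apply -f - <<'EOF'
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: ceph-rbd-test
  spec:
    accessModes: ["ReadWriteOnce"]
    storageClassName: ceph-rbd   # assumed name; use the class listed above
    resources:
      requests:
        storage: 1Gi
  EOF
  microk8s kubectl get pvc ceph-rbd-test   # should report Bound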

Some Ceph CLI commands

  • sudo microceph status (overview of nodes, their services, and disks)
  • sudo microceph cluster config get cluster_network
  • sudo microceph cluster config list
  • sudo microceph cluster list (cluster members)
  • sudo microceph disk list (disks and OSDs in the cluster)
  • sudo microceph.ceph status (standard Ceph health summary)
