igloo

Kubernetes based home network 🐧

Overview

This is a mono repository for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools like Kubernetes, Flux, Renovate, and GitHub Actions.


💻  Kubernetes

My Kubernetes cluster is deployed with Talos. This is a semi-hyper-converged cluster: workloads and block storage share the same available resources on my nodes, while a separate NAS server handles NFS/SMB shares, bulk file storage, and backups.

Core Components

  • actions-runner-controller: Self-hosted GitHub Actions runners.
  • cert-manager: Creates SSL certificates for services in my cluster.
  • cilium: Internal Kubernetes container networking interface.
  • cloudflared: Enables Cloudflare secure access to certain ingresses.
  • external-dns: Automatically syncs ingress DNS records to a DNS provider.
  • external-secrets: Manages Kubernetes secrets using 1Password Connect (see the sketch after this list).
  • openebs: Local storage provisioner.
  • rook: Distributed block storage for persistent storage.
  • sops: Manages secrets for Kubernetes and Terraform that are committed to Git.
  • spegel: Stateless cluster-local OCI registry mirror.
  • volsync: Backup and recovery of persistent volume claims.
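
As an illustration of the external-secrets wiring above, here is a minimal sketch of an ExternalSecret backed by a 1Password Connect ClusterSecretStore. The store name onepassword, the item cloudflare, and the namespace are hypothetical placeholders, not taken from this repository.

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: cloudflare
  namespace: network
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: onepassword        # hypothetical store backed by 1Password Connect
  target:
    name: cloudflare-secret  # Kubernetes Secret the operator will create
  data:
    - secretKey: api-token
      remoteRef:
        key: cloudflare      # 1Password item name (hypothetical)
        property: api-token  # field within that item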

GitOps

Flux watches the clusters in my kubernetes folder (see Repository structure below) and makes the changes to my clusters based on the state of my Git repository.

The way Flux works for me here is that it will recursively search the kubernetes/apps folder until it finds the top-most kustomization.yaml per directory, and then apply all of the resources listed in it. That aforementioned kustomization.yaml will generally only have a namespace resource and one or many Flux kustomizations (ks.yaml). Under the control of those Flux kustomizations there will be a HelmRelease or other resources related to the application which will be applied; a sketch of this layout follows below.
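
For concreteness, here is a minimal sketch of that layout for a hypothetical app named echo in a default namespace; the paths, names, and interval are illustrative, not copied from this repository.

# kubernetes/apps/default/kustomization.yaml -- the top-level file Flux finds first
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ./namespace.yaml   # the namespace resource
  - ./echo/ks.yaml     # the Flux Kustomization for the app
---
# kubernetes/apps/default/echo/ks.yaml -- the Flux Kustomization driving the app
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: echo
  namespace: flux-system
spec:
  interval: 30m
  path: ./kubernetes/apps/default/echo/app   # contains the HelmRelease, etc.
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system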

Renovate watches my entire repository looking for dependency updates; when they are found, a PR is automatically created. When PRs are merged, Flux applies the changes to my cluster.

Repository structure

πŸ“ .github         # GH Actions configs, repo reference objects, renovate config
πŸ“ kubernetes      # Kubernetes cluster defined as code
β”œβ”€πŸ“ apps          # Applications deployed into the cluster grouped by namespace
β”œβ”€πŸ“ components    # Re-useable Kustomize components
β””β”€πŸ“ flux          # Flux system configuration

Flux Workflow

This is a high-level look at how Flux deploys my applications with dependencies. In most cases a HelmRelease will depend on other HelmReleases, in other cases a Kustomization will depend on other Kustomizations, and in rare situations an app can depend on both a HelmRelease and a Kustomization. The example below shows that gatus won't be deployed or upgraded until the rook-ceph-cluster Helm release is installed and in a healthy state; a dependsOn sketch follows the diagram.

graph TD
    A>Kustomization: rook-ceph] -->|Creates| B[HelmRelease: rook-ceph]
    A>Kustomization: rook-ceph] -->|Creates| C[HelmRelease: rook-ceph-cluster]
    C>HelmRelease: rook-ceph-cluster] -->|Depends on| B>HelmRelease: rook-ceph]
    D>Kustomization: gatus] -->|Creates| E(HelmRelease: gatus)
    E>HelmRelease: gatus] -->|Depends on| C>HelmRelease: rook-ceph-cluster]
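
In Flux terms, each "Depends on" edge above is expressed with spec.dependsOn on the HelmRelease. A minimal sketch, assuming both releases live in a rook-ceph namespace (dependsOn is real Flux API; the names and interval are illustrative):

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: rook-ceph-cluster
  namespace: rook-ceph
spec:
  interval: 30m
  dependsOn:
    # Flux will not install or upgrade this release until
    # the rook-ceph release is present and Ready
    - name: rook-ceph
      namespace: rook-ceph
  chart:
    spec:
      chart: rook-ceph-cluster
      sourceRef:
        kind: HelmRepository
        name: rook-ceph
        namespace: flux-system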

🌐  Networking

The Kubernetes Gateway API is utilized through Cilium to manage routes. This cluster runs two instances of ExternalDNS: one syncs private DNS records to my UDM Pro using the ExternalDNS webhook provider for UniFi, while the other syncs public DNS records to Cloudflare. This setup is managed by creating ingresses with two specific classes: internal for private DNS and external for public DNS. The external-dns instances then sync the DNS records to their respective platforms accordingly; a sketch follows below.
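
As an illustration of the two-class convention, here is a minimal sketch of a public ingress; the host, service name, namespace, and annotation target are hypothetical, and the class names assume the internal/external setup described above.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo
  namespace: default
  annotations:
    # picked up by the public external-dns instance and synced to Cloudflare
    external-dns.alpha.kubernetes.io/target: external.example.com
spec:
  ingressClassName: external   # use "internal" for private UniFi DNS instead
  rules:
    - host: echo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo
                port:
                  number: 8080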

⚙  Hardware

| Device | Count | OS Disk Size | Data Disk Size | RAM | Purpose | Alias | OS |
|--------|-------|--------------|----------------|-----|---------|-------|----|
| Asus NUC 14 Pro | 1 | 512GB NVMe SSD | 2TB SATA SSD | 64GB | Kubernetes Control-Plane | asus-node-01 | Talos Linux |
| Dell Optiplex 7040 | 1 | 256GB NVMe SSD | 1TB SATA SSD | 16GB | Kubernetes Worker | dell-node-01 | Talos Linux |
| Dell Optiplex 7060 | 1 | 512GB NVMe SSD | 1TB SATA SSD | 32GB | Kubernetes Control-Plane | dell-node-02 | Talos Linux |
| Helios64 NAS | 1 | N/A | 8x4TB RAID6 | 4GB | Media and shared file storage | glacier | Debian GNU/Linux |
| MacBook Pro 2012 | 1 | 250GB SSD | N/A | 8GB | Kubernetes Control-Plane | mbp-node-01 | Talos Linux |
| MacBook Pro 2016 | 1 | 500GB SSD | N/A | 16GB | Kubernetes Worker | mbp-node-02 | Talos Linux |

Software

🔧  Tools

| Tool | Purpose |
|------|---------|
| mise | Set KUBECONFIG environment variable based on present working directory |
| sops | Encrypt secrets |
| go-task | Replacement for make and makefiles |
| talos | Operating System to install on nodes |
| uv | Python package + virtualenv manager |

🛎  Cloud Services

While most of my infrastructure and workloads are self-hosted, I do rely upon the cloud for certain key parts of my setup. This saves me from having to worry about three things: (1) dealing with chicken/egg scenarios, (2) services I critically need whether my cluster is online or not, and (3) the "hit by a bus" factor: what happens to critical apps (e.g. email, password manager, photos) that my family relies on when I am no longer around.

| Service | Use | Cost |
|---------|-----|------|
| 1Password | Secrets with External Secrets | ~$65/yr |
| Cloudflare | Domain and S3 | ~$30/yr |
| GitHub | Hosting this repository and continuous integration/deployments | Free |
| Pushover | Kubernetes alerts and application notifications | $5 OTP |
| Tailscale | Device VPN | Free |

Total: ~$8/mo

Media Stack

The servarr stack supports torrent- and Usenet-based automation and is integrated for high performance, privacy, and seed ratio maximization.

Cluster Notes

🌱 Environment

mise makes it so that any time you cd into your repo's directory, it exports the required environment variables (e.g. KUBECONFIG). To set this up:

  • Install and activate mise
  • Use mise to install the required CLI tools:
    mise trust && mise install && mise run deps
    

🛠️ Talos and Kubernetes Maintenance

⚙️ Updating Talos node configuration

Tip

Ensure you have updated talconfig.yaml and any patches with your updated configuration. In some cases you may need to not only apply the configuration but also upgrade Talos for the new configuration to take effect.

# (Re)generate the Talos config
task talos:generate-config
# Apply the config to the node
task talos:apply-node IP=? MODE=?
# e.g. task talos:apply-node IP=10.10.10.10 MODE=auto

⬆️ Updating Talos and Kubernetes versions

Tip

Ensure the talosVersion and kubernetesVersion in talconfig.yaml are up-to-date with the version you wish to upgrade to.
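
For reference, both of those keys live at the top level of talhelper's talconfig.yaml. A minimal hedged excerpt (the cluster name and version numbers are examples only, not this repository's values):

# talconfig.yaml (excerpt)
clusterName: igloo          # example name
talosVersion: v1.7.0        # Talos version targeted by task talos:upgrade-node (example)
kubernetesVersion: v1.30.0  # Kubernetes version targeted by task talos:upgrade-k8s (example)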

# Upgrade node to a newer Talos version
task talos:upgrade-node IP=?
# e.g. task talos:upgrade-node IP=10.10.10.10
# Upgrade cluster to a newer Kubernetes version
task talos:upgrade-k8s
# e.g. task talos:upgrade-k8s

πŸ› Debugging

Below is a general guide on trying to debug an issue with a resource or application. For example, if a workload/resource is not showing up, or a pod has started but is in a CrashLoopBackOff or Pending state.

  1. Start by checking all Flux Kustomizations, Git Repositories, and OCI Repositories, and verify they are up-to-date and in a ready state.

    • flux get sources oci -A
    • flux get sources git -A
    • flux get ks -A
    • flux get all -A
  2. Force Flux to sync your repository to your cluster:

    flux -n flux-system reconcile ks flux-system --with-source
  3. Verify all the Flux Helm Releases are up-to-date and in a ready state.

    • flux get hr -A
  4. Then check if the pod is present.

    • kubectl -n <namespace> get pods -o wide
  5. Then check the logs of the pod if it's there.

    • kubectl -n <namespace> logs <pod-name> -f

Note: If a resource exists, running kubectl -n <namespace> describe <resource> <name> might give you insight into what the problem(s) could be.

🤝 Thanks

Huge shout out to @onedr0p and the k8s@Home community!
