This is a mono repository for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools like Kubernetes, Flux, Renovate, and GitHub Actions.
My Kubernetes cluster is deployed with Talos. This is a semi-hyper-converged cluster: workloads and block storage share the same available resources on my nodes, while a separate NAS server handles NFS/SMB shares, bulk file storage, and backups.
- actions-runner-controller: Self-hosted GitHub runners.
- cert-manager: Creates SSL certificates for services in my cluster.
- cilium: Internal Kubernetes container networking interface.
- cloudflared: Enables Cloudflare secure access to certain ingresses.
- external-dns: Automatically syncs ingress DNS records to a DNS provider.
- external-secrets: Manages Kubernetes secrets using 1Password Connect (see the sketch after this list).
- openebs: Local storage provisioner.
- rook: Distributed block storage for persistent storage.
- sops: Manages secrets for Kubernetes and Terraform which are committed to Git.
- spegel: Stateless cluster local OCI registry mirror.
- volsync: Backup and recovery of persistent volume claims.
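As a hedged illustration of how the external-secrets piece fits together, an ExternalSecret backed by a 1Password Connect ClusterSecretStore might look roughly like the sketch below; the store, item, and key names are assumptions, not values from this repository.

```yaml
# Hypothetical ExternalSecret sketch; store and item names are assumed
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secret
  namespace: default
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: onepassword-connect     # assumed name of the 1Password-backed store
  target:
    name: app-secret              # Kubernetes Secret that gets created and kept in sync
  data:
    - secretKey: API_KEY          # key inside the generated Secret
      remoteRef:
        key: my-app               # 1Password item (assumed)
        property: api-key         # field within that item (assumed)
```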
Flux watches the clusters in my kubernetes folder (see Directories below) and makes the changes to my clusters based on the state of my Git repository.
The way Flux works for me here is that it recursively searches the kubernetes/apps folder until it finds the top-most kustomization.yaml per directory and then applies all of the resources listed in it. That kustomization.yaml will generally only have a namespace resource and one or more Flux kustomizations (ks.yaml). Under the control of those Flux kustomizations there will be a HelmRelease or other resources related to the application which will be applied.
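For illustration only, a ks.yaml following this pattern might look like the minimal sketch below; the app name, namespace, and path are hypothetical rather than taken from this repository.

```yaml
# Hypothetical kubernetes/apps/default/echo-server/ks.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: echo-server
  namespace: flux-system
spec:
  targetNamespace: default
  path: ./kubernetes/apps/default/echo-server/app
  sourceRef:
    kind: GitRepository
    name: flux-system
  prune: true
  wait: true
  interval: 30m
```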
Renovate watches my entire repository for dependency updates; when one is found, a PR is automatically created. When PRs are merged, Flux applies the changes to my cluster.
📁 .github # GH Actions configs, repo reference objects, renovate config
📁 kubernetes # Kubernetes cluster defined as code
├─📁 apps # Applications deployed into the cluster grouped by namespace
├─📁 components # Re-useable Kustomize components
└─📁 flux # Flux system configuration
This is a high-level look at how Flux deploys my applications with dependencies. In most cases a HelmRelease will depend on other HelmReleases, in other cases a Kustomization will depend on other Kustomizations, and in rare situations an app can depend on both a HelmRelease and a Kustomization. The example below shows that gatus will not be deployed or upgraded until the rook-ceph-cluster Helm release is installed and in a healthy state.
graph TD
A>Kustomization: rook-ceph] -->|Creates| B[HelmRelease: rook-ceph]
A>Kustomization: rook-ceph] -->|Creates| C[HelmRelease: rook-ceph-cluster]
C>HelmRelease: rook-ceph-cluster] -->|Depends on| B>HelmRelease: rook-ceph]
D>Kustomization: gatus] -->|Creates| E(HelmRelease: gatus)
E>HelmRelease: gatus] -->|Depends on| C>HelmRelease: rook-ceph-cluster]
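Expressed in a Flux HelmRelease, that ordering is just a dependsOn entry. A trimmed sketch, with chart and interval values chosen for illustration rather than copied from this repository:

```yaml
# Sketch only: rook-ceph-cluster waits for rook-ceph before installing or upgrading
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: rook-ceph-cluster
  namespace: rook-ceph
spec:
  interval: 1h
  chart:
    spec:
      chart: rook-ceph-cluster
      sourceRef:
        kind: HelmRepository
        name: rook-ceph
  dependsOn:
    - name: rook-ceph
      namespace: rook-ceph
```

The gatus HelmRelease would carry an equivalent dependsOn entry pointing at rook-ceph-cluster.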
The Kubernetes Gateway API, provided by Cilium, is used to manage routes into the cluster.
This cluster runs two instances of ExternalDNS: one syncs private DNS records to my UDM Pro using the ExternalDNS webhook provider for UniFi, while the other syncs public DNS records to Cloudflare. This setup is managed by creating ingresses with two specific classes: internal for private DNS and external for public DNS. Each external-dns instance then syncs DNS records to its respective platform; a sketch of a public-facing ingress follows below.
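As a hedged example of the split, a public-facing ingress might look like the sketch below; the hostnames, annotation target, and backend service are placeholders rather than this cluster's real values. The instance watching the external class would publish the record to Cloudflare, while an internal ingress would instead be picked up by the UniFi webhook instance.

```yaml
# Hypothetical public ingress; swap ingressClassName to "internal" for private DNS
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-server
  namespace: default
  annotations:
    external-dns.alpha.kubernetes.io/target: external.example.com  # assumed CNAME target
spec:
  ingressClassName: external
  rules:
    - host: echo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo-server
                port:
                  number: 8080
```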
| Device | Count | OS Disk Size | Data Disk Size | RAM | Purpose | Alias | OS |
|---|---|---|---|---|---|---|---|
| Asus NUC 14 Pro | 1 | 512GB NVMe SSD | 2TB SATA SSD | 64GB | Kubernetes Control-Plane | asus-node-01 | Talos Linux |
| Dell Optiplex 7040 | 1 | 256GB NVMe SSD | 1TB SATA SSD | 16GB | Kubernetes Worker | dell-node-01 | Talos Linux |
| Dell Optiplex 7060 | 1 | 512GB NVMe SSD | 1TB SATA SSD | 32GB | Kubernetes Control-Plane | dell-node-02 | Talos Linux |
| Helios64 NAS | 1 | N/A | 8x4TB RAID6 | 4GB | Media and shared file storage | glacier | Debian GNU/Linux |
| MacBook Pro 2012 | 1 | 250GB SSD | N/A | 8GB | Kubernetes Control-Plane | mbp-node-01 | Talos Linux |
| MacBook Pro 2016 | 1 | 500GB SSD | N/A | 16GB | Kubernetes Worker | mbp-node-02 | Talos Linux |
| Tool | Purpose |
|---|---|
| mise | Set KUBECONFIG environment variable based on present working directory |
| sops | Encrypt secrets |
| go-task | Replacement for make and makefiles |
| talos | Operating System to install on nodes |
| uv | Python package and virtualenv manager |
While most of my infrastructure and workloads are self-hosted, I do rely upon the cloud for certain key parts of my setup. This saves me from having to worry about three things: (1) dealing with chicken/egg scenarios, (2) services I critically need whether my cluster is online or not, and (3) the "hit by a bus" factor, i.e. what happens to critical apps (e.g. email, password manager, photos) that my family relies on if I am no longer around.
| Service | Use | Cost |
|---|---|---|
| 1Password | Secrets with External Secrets | ~$65/yr |
| Cloudflare | Domain and S3 | ~$30/yr |
| GitHub | Hosting this repository and continuous integration/deployments | Free |
| Pushover | Kubernetes alerts and application notifications | $5 one-time purchase |
| Tailscale | Device VPN | Free |
| Total | | ~$8/mo |
The servarr stack supports torrent- and Usenet-based automation and is tuned for performance, privacy, and seed ratio maximization:
- Indexers:
- Downloaders:
- qBittorrent (via Gluetun with the ProtonVPN provider; see the sketch after this list)
- sabnzbd (for Usenet)
- Organizers:
- Automation:
- Cross-seed – uses hardlink watch and injects back into qBittorrent to boost sharing ratios
- Autobrr – filters and pushes releases to qBittorrent and/or Radarr via custom webhook integration
- Frontends:
- Jellyfin – main media frontend
- Jellyseerr – request management for Jellyfin users
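A rough sketch of the "qBittorrent via Gluetun" pattern mentioned above, assuming a simple sidecar layout; image tags, environment variables, and the credentials secret are assumptions, not this cluster's actual release values.

```yaml
# Sketch: qBittorrent sharing its pod network namespace with a Gluetun VPN sidecar
apiVersion: v1
kind: Pod
metadata:
  name: qbittorrent
spec:
  containers:
    - name: gluetun                     # VPN sidecar; pod egress flows through the tunnel
      image: ghcr.io/qdm12/gluetun:latest
      securityContext:
        capabilities:
          add: ["NET_ADMIN"]
      env:
        - name: VPN_SERVICE_PROVIDER
          value: protonvpn
        - name: VPN_TYPE
          value: wireguard
      envFrom:
        - secretRef:
            name: protonvpn-credentials # assumed Secret holding the WireGuard key
    - name: qbittorrent
      image: ghcr.io/onedr0p/qbittorrent:latest  # assumed image
      ports:
        - containerPort: 8080           # web UI
```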
mise will export the required environment variables (e.g. KUBECONFIG) any time you cd into this repo's directory. To set this up:
- Install and activate mise
- Use mise to install the required CLI tools:
mise trust && mise install && mise run deps
Tip
Ensure you have updated talconfig.yaml and any patches with your desired configuration. In some cases you not only need to apply the configuration but also upgrade Talos for the new configuration to take effect.
# (Re)generate the Talos config
task talos:generate-config
# Apply the config to the node
task talos:apply-node IP=? MODE=?
# e.g. task talos:apply-node IP=10.10.10.10 MODE=auto
Tip
Ensure the talosVersion and kubernetesVersion in talconfig.yaml are up-to-date with the version you wish to upgrade to.
# Upgrade node to a newer Talos version
task talos:upgrade-node IP=?
# e.g. task talos:upgrade-node IP=10.10.10.10
# Upgrade cluster to a newer Kubernetes version
task talos:upgrade-k8s
# e.g. task talos:upgrade-k8s
Below is a general guide to debugging an issue with a resource or application, for example when a workload/resource is not showing up, or a pod has started but is stuck in a CrashLoopBackOff or Pending state.
- Start by checking all Flux Kustomizations, GitRepositories, and OCIRepositories and verify they are up-to-date and in a ready state.
flux get sources oci -A
flux get sources git -A
flux get ks -A
flux get all -A
- Force Flux to sync your repository to your cluster:
flux -n flux-system reconcile ks flux-system --with-source
- Verify all the Flux Helm Releases are up-to-date and in a ready state.
flux get hr -A
- Then check if the pod is present.
kubectl -n <namespace> get pods -o wide
- Then check the logs of the pod if it is there.
kubectl -n <namespace> logs <pod-name> -f
Note: If a resource exists, running kubectl -n <namespace> describe <resource> <name> might give you insight into what the problem(s) could be.
Huge shout out to @onedr0p and the k8s@Home community!