The Ranty Programmer
CV

Building my own private cloud (The DevOps Journey - 1)

Prologue

Well, apparently you are not a real back-end engineer until you've added Kubernetes to your CV. So let's fix that, shall we? Over the next couple of months I will be taking on the very simple task of deploying a production-grade Kubernetes cluster. What can I say? I'm not a very ambitious man.

A couple months? An AI can do it in an afternoon.

Yeah, I don't care. Programming is not a means to an end but an end in itself. Plus, this is a learning series, so time to market is not really a priority here, and using an agent kind of defeats the purpose.

The application

To be honest, what we are deploying is not really important, so I'll just gloss over it. I already spend my days modeling complex business domains; this project is about everything else around them, so I don't want to be dealing with a lot of business rules here. All the complexity must reside in our performance budget. So after a lot of back and forth with every chatbot out there, I've landed on modeling an RTB ad exchange. Yeah, boring. That's the point. But you do need to process a bid and get it back to the browser in less than 80ms, so this should be the perfect problem for modeling high-performance infra.

Preparing the dev environment

My Specs

This project involves juggling a bunch of virtual machines (VMs) most of the time, so you will need a fairly powerful computer to get a decent feedback loop. I will also be building system images many times throughout the series, so #SSDlivesmatter might come after me. When building images, it's important to try to build them in a tmpfs and fit them in RAM. These are the specs of the laptop I will be developing on.

CPU: 12th Gen Intel i9-12900H
GPU: RTX 3070 Ti Laptop GPU
RAM: 32GB
SSD: Samsung SSD 990 PRO 2TB

For running the actual simulation I will be renting some auctioned bare metal servers. More on that in coming chapters.

Virtualization

Obviously a cluster implies multiple nodes, and a real production cluster is constantly breathing nodes in and out as demand requires. Therefore I need to be able to provision nodes dynamically, and this is where QEMU/KVM and libvirt come to the rescue.

There are a few problems with this setup though. Well, more like one big problem really: context switching. Sadly I only have 20 logical cores in my poor CPU, and only 12 of them are P (performance) cores. So keep in mind that in order to get reliable and measurable results I gotta pin my vcpus to some of those cores and isolate them from the rest of the OS and the hardware interrupts [1]. I will not be doing that for now because, well, I do want to keep using my computer; but it is an important thing to take into account when deploying to the actual test server.
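For reference, vcpu pinning lives in the cputune element of the libvirt domain XML. Something like the fragment below does the trick; the core numbers are illustrative, you'd pick your actual P-cores:

```xml
<domain type='kvm'>
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <!-- pin each vcpu to a dedicated host core -->
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='4'/>
    <!-- keep the emulator threads off the pinned cores -->
    <emulatorpin cpuset='0-1'/>
  </cputune>
</domain>
```

Pair this with isolating those cores from the host scheduler (isolcpus or cpusets) and steering IRQ affinity away from them, and the guests stop fighting the host for CPU time.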

IaC

Obviously we're not going to be manually defining and creating resources every time we need to reshape the infra. This is where IaC enters the picture.

Everyone knows Terraform and its open-source fork OpenTofu. Really nice to read and such, but it is kind of like bash: once you need a loop, you're better off using something else.

Enter Pulumi. While Terraform is all declarative and anyone can read it, Pulumi is imperative in its definition though declarative in its execution, and it's geared toward taking infrastructure as code to its logical conclusion: you write normal code, and you test it like normal code.

A big pro of Pulumi is that it can wrap most Terraform providers, which means you don't miss out on the vast Terraform ecosystem that's become the industry standard. Getting it to work outside of their cloud is kind of a pain in the ass, though.

After trying to do the provisioning with Terraform, it didn't take long until the templates became a for_each soup. Given that I'm going to be mutating the infra in real time to simulate a cloud environment, Pulumi was the more logical choice in spite of its quirks.
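To illustrate the point: the kind of logic that turns into for_each gymnastics in HCL is just a plain loop in Go. This is a toy sketch using the same Node type I define below; the counts and sizes are made up:

```go
package main

import "fmt"

type NodeType string

const (
	NodeTypeControl NodeType = "control"
	NodeTypeWorker  NodeType = "worker"
)

type Node struct {
	Type   NodeType
	Memory uint16
	Vcpu   uint8
}

// clusterNodes builds the full node list from a desired worker count:
// one fixed control-plane node plus n workers. Conditionals and loops
// like this are exactly where HCL starts to hurt.
func clusterNodes(workers int) []Node {
	nodes := []Node{{Type: NodeTypeControl, Memory: 4096, Vcpu: 2}}
	for i := 0; i < workers; i++ {
		nodes = append(nodes, Node{Type: NodeTypeWorker, Memory: 2048, Vcpu: 2})
	}
	return nodes
}

func main() {
	for i, n := range clusterNodes(2) {
		fmt.Printf("%d: %s %dMiB\n", i, n.Type, n.Memory)
	}
}
```

And because it's a normal function, scaling the simulated cluster up or down is just calling it with a different argument between pulumi up runs.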

How to create a cluster

Using the libvirt Terraform provider is fairly straightforward; it now maps directly to the XML spec [2]. When importing the package into Pulumi, there was a translation error in one of the generated mappings that broke compilation, and I had to fix the code manually. Luckily it was a one-line change of a return value, so no biggie.

For getting the most basic cluster running we only need two nodes. I used the Debian 13 cloud image as the base image for the VMs. To save space, I created a base volume holding the cloud image and used it as the backing store for each VM's disk. That way only the changes are stored in the overlays, reducing the disk footprint of the system, especially as the node count grows. It's important to note, though, that the performance degradation of overlaid disks gets pretty extreme as the overlay grows. Switching to raw images per node will probably be the first optimization step if/when disk proves to be a bottleneck [3].
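For context, this is roughly what such an overlay volume looks like in libvirt's storage XML: the backingStore element points every node's disk at the one shared base image (names and paths here are illustrative):

```xml
<volume type='file'>
  <name>k3s-worker-0.qcow2</name>
  <target>
    <format type='qcow2'/>
  </target>
  <!-- only blocks that differ from the base image land in this volume -->
  <backingStore>
    <path>/var/lib/libvirt/images/debian-13-base.qcow2</path>
    <format type='qcow2'/>
  </backingStore>
</volume>
```

The base image stays read-only, so it can safely back any number of node disks at once.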

I also created cloud-init images for each node and a network interface. It's important to create a dedicated bridge with NAT for these VMs so that latency and jitter tests don't interfere with other VMs on the system and the network space isn't polluted with foreign nodes.
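As a rough idea of what goes into those cloud-init images, the user-data is a small YAML document per node. This is a generic illustrative fragment, not my actual config; the hostname, user, and key are placeholders:

```yaml
#cloud-config
hostname: k3s-worker-0
users:
  - name: admin
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    ssh_authorized_keys:
      - ssh-ed25519 AAAA_placeholder_public_key
```

Since each node gets its own seed image, per-node values like the hostname are just templated into this document before it's packed into the ISO.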

This is how the initial code for deploying a domain (for some reason that's what VMs are called in libvirt) looks. I'm using mostly Go throughout this project, so I figured I'd use it for Pulumi too. Though, to be fair, Python or TypeScript should get you a more tf-like experience.

type NodeType string

const (
	NodeTypeControl NodeType = "control"
	NodeTypeWorker  NodeType = "worker"
)

// Node describes a cluster member: its role plus the memory (MiB)
// and vcpu count its domain will be given.
type Node struct {
	Type   NodeType
	Memory uint16
	Vcpu   uint8
}

// buildDomain declares a single KVM domain for the given node,
// attaching its overlay disk, its cloud-init seed and the
// dedicated NAT network.
func buildDomain(
	ctx *pulumi.Context,
	node Node,
	nodeIdx int,
	internalMac string,
	mainDisk *libvirt.Volume,
	cloudInitVolume *libvirt.Volume,
	networkInterface *libvirt.Network,
) (*libvirt.Domain, error) {
	domain, err := libvirt.NewDomain(ctx, fmt.Sprintf("linux_%s_%d", node.Type, nodeIdx), &libvirt.DomainArgs{
		Name:       pulumi.String(fmt.Sprintf("k3s-%s-%d", node.Type, nodeIdx)),
		Memory:     pulumi.Float64(node.Memory),
		MemoryUnit: pulumi.String("MiB"),
		Vcpu:       pulumi.Float64(node.Vcpu),
		Type:       pulumi.String("kvm"),
		Os: libvirt.DomainOsArgs{
			Type:        pulumi.String("hvm"),
			TypeArch:    pulumi.String("x86_64"),
			TypeMachine: pulumi.String("pc"),
			BootDevices: libvirt.DomainOsBootDeviceArray{
				libvirt.DomainOsBootDeviceArgs{
					Dev: pulumi.String("hd"),
				},
				libvirt.DomainOsBootDeviceArgs{
					Dev: pulumi.String("network"),
				},
			},
		},
		Cpu: libvirt.DomainCpuArgs{
			Mode:  pulumi.String("host-passthrough"),
			Check: pulumi.String("none"),
		},
		Devices: libvirt.DomainDevicesArgs{
			Disks: libvirt.DomainDevicesDiskArray{
				libvirt.DomainDevicesDiskArgs{
					Target: libvirt.DomainDevicesDiskTargetArgs{
						Dev: pulumi.String("vda"),
						Bus: pulumi.String("virtio"),
					},
					Source: libvirt.DomainDevicesDiskSourceArgs{
						Volume: libvirt.DomainDevicesDiskSourceVolumeArgs{
							Pool:   mainDisk.Pool,
							Volume: mainDisk.Name,
						},
					},
					Driver: libvirt.DomainDevicesDiskDriverArgs{
						Name: pulumi.String("qemu"),
						Type: pulumi.String("qcow2"),
					},
				},
				libvirt.DomainDevicesDiskArgs{
					Target: libvirt.DomainDevicesDiskTargetArgs{
						Dev: pulumi.String("sdb"),
						Bus: pulumi.String("sata"),
					},
					Source: libvirt.DomainDevicesDiskSourceArgs{
						Volume: libvirt.DomainDevicesDiskSourceVolumeArgs{
							Pool:   cloudInitVolume.Pool,
							Volume: cloudInitVolume.Name,
						},
					},
					Driver: libvirt.DomainDevicesDiskDriverArgs{
						Name: pulumi.String("qemu"),
						Type: pulumi.String("raw"),
					},
				},
			},
			Interfaces: libvirt.DomainDevicesInterfaceArray{
				libvirt.DomainDevicesInterfaceArgs{
					Model: libvirt.DomainDevicesInterfaceModelArgs{
						Type: pulumi.String("virtio"),
					},
					Mac: libvirt.DomainDevicesInterfaceMacArgs{
						Address: pulumi.String(
							internalMac,
						),
					},
					Source: libvirt.DomainDevicesInterfaceSourceArgs{
						Network: libvirt.DomainDevicesInterfaceSourceNetworkArgs{
							Network: networkInterface.Name,
						},
					},
				},
			},
			Serials: libvirt.DomainDevicesSerialArray{
				libvirt.DomainDevicesSerialArgs{
					Target: libvirt.DomainDevicesSerialTargetArgs{
						Port: pulumi.Float64(0),
					},
				},
			},
			Consoles: libvirt.DomainDevicesConsoleArray{
				libvirt.DomainDevicesConsoleArgs{
					Target: libvirt.DomainDevicesConsoleTargetArgs{
						Type: pulumi.String("serial"),
						Port: pulumi.Float64(0),
					},
				},
			},
		},
	})

	return domain, err
}
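One small piece worth spelling out: buildDomain takes an internalMac, and each node needs a deterministic, unique one so DHCP leases survive re-provisioning. A simple sketch of my own (not something the provider gives you) is to derive it from the node index under the 52:54:00 prefix conventionally used for KVM/QEMU guests:

```go
package main

import "fmt"

// nodeMac derives a stable MAC address from the node index, using
// the 52:54:00 locally-administered prefix common for QEMU guests.
// The low 16 bits of the index fill the last two octets.
func nodeMac(nodeIdx int) string {
	return fmt.Sprintf("52:54:00:00:%02x:%02x", (nodeIdx>>8)&0xff, nodeIdx&0xff)
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(nodeMac(i))
	}
	// prints:
	// 52:54:00:00:00:00
	// 52:54:00:00:00:01
	// 52:54:00:00:00:02
}
```

With that in place, the DHCP side of the dedicated network can hand each node the same IP on every run, which makes the cluster bootstrapping in the next chapter a lot less painful.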

Conclusion

This is about it for now. I have set up provisioning of a dynamic number of Linux VMs with a minimal amount of code. Now all that's separating me from a working system is a simple pulumi up. The next step will be automating the bootstrapping of the Kubernetes cluster.

References