Virtualization
[!NOTE] This module explores the core principles of Virtualization, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. The Machine in the Machine
Virtualization is the art of deceiving an Operating System into believing it owns the hardware, when in reality, it is just a guest in a hotel managed by the Hypervisor (Virtual Machine Monitor - VMM).
It is the foundation of cloud compute services such as AWS EC2 and Google Compute Engine, which depend on efficient virtualization to share one physical machine safely among many tenants.
The Problem: Privilege
An OS kernel is designed to run in Ring 0 (Kernel Mode) with full control over the CPU.
- It expects to manipulate the Page Tables (CR3 register).
- It expects to handle Interrupts.
- It expects to execute privileged instructions (HLT, LGDT).
If you run multiple OSes on one CPU, they can’t all be in Ring 0. If Guest A were allowed to disable interrupts, it would freeze Guest B and the Host.
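To make the interrupt problem concrete, here is a minimal Go sketch of the idea behind the fix: a guest's attempt to disable interrupts is applied to per-guest virtual state, never to the real CPU flag. The names `Guest`, `Hypervisor`, and `DisableInterrupts` are illustrative assumptions, not part of any real hypervisor API.

```go
package main

import "fmt"

// Guest holds the virtual CPU state the hypervisor keeps for one guest OS.
type Guest struct {
	Name              string
	InterruptsEnabled bool // virtual interrupt flag, private to this guest
}

// Hypervisor owns the one real interrupt flag and never lets a guest change it.
type Hypervisor struct {
	PhysicalInterruptsEnabled bool
	Guests                    []*Guest
}

// DisableInterrupts models a guest trying to disable interrupts: the request
// is applied to the guest's virtual flag only.
func (h *Hypervisor) DisableInterrupts(g *Guest) {
	g.InterruptsEnabled = false
}

func main() {
	a := &Guest{Name: "Guest A", InterruptsEnabled: true}
	b := &Guest{Name: "Guest B", InterruptsEnabled: true}
	h := &Hypervisor{PhysicalInterruptsEnabled: true, Guests: []*Guest{a, b}}

	h.DisableInterrupts(a)

	for _, g := range h.Guests {
		fmt.Printf("%s: virtual interrupts enabled = %v\n", g.Name, g.InterruptsEnabled)
	}
	fmt.Printf("Host: physical interrupts enabled = %v\n", h.PhysicalInterruptsEnabled)
}
```

Guest A sees exactly what it asked for, while Guest B and the Host are unaffected.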
2. Types of Hypervisors
Type 1: Bare Metal (Native)
The Hypervisor is the Operating System. It boots directly on the hardware.
- Examples: VMware ESXi, Xen, Microsoft Hyper-V.
- Performance: High. The VMM has direct access to hardware drivers.
- Use Case: Enterprise Data Centers, Cloud Providers.
Type 2: Hosted
The Hypervisor runs as a software application on top of a standard Host OS.
- Examples: VirtualBox, VMware Workstation, QEMU.
- Performance: Lower. I/O requests must pass through the Guest OS → VMM → Host OS → Hardware.
- Use Case: Developers testing code, running Linux on Windows.
[!NOTE] KVM (Kernel-based Virtual Machine) is a hybrid. It turns the Linux Kernel into a Type 1 Hypervisor using a kernel module, allowing it to spawn VMs as regular processes.
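As a quick check of whether a Linux host exposes KVM, the sketch below looks for the KVM device node. It assumes the conventional path `/dev/kvm`; the helper name `kvmAvailable` is made up for this example, and finding the node does not prove the current user has permission to open it.

```go
package main

import (
	"fmt"
	"os"
)

// kvmAvailable reports whether a character device exists at path.
// On Linux hosts with the kvm module loaded this is normally /dev/kvm.
func kvmAvailable(path string) bool {
	info, err := os.Stat(path)
	if err != nil {
		return false
	}
	// The KVM node is a character device, not a regular file or directory.
	return info.Mode()&os.ModeCharDevice != 0
}

func main() {
	if kvmAvailable("/dev/kvm") {
		fmt.Println("KVM device found: VMs can run as regular host processes")
	} else {
		fmt.Println("KVM device not found (module not loaded, or not a Linux host)")
	}
}
```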
3. Hardware Assist (Intel VT-x / AMD-V)
In the old days (pre-2005), virtualization was done via Binary Translation (rewriting Guest OS code on the fly to replace privileged instructions). This was slow and complex.
Modern CPUs solve this with Hardware Assist.
- VMX Root Mode: Where the Hypervisor runs (Ring 0).
- VMX Non-Root Mode: Where the Guest OS runs.
Crucially, the Guest OS thinks it is in Ring 0, but it is actually in a constrained “Guest Ring 0”.
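On Linux, the CPU advertises these extensions as the `vmx` (Intel VT-x) or `svm` (AMD-V) flag in `/proc/cpuinfo`. The following sketch parses that file; the function name `virtExtension` is an illustrative choice, and the result only reflects what the kernel reports to this machine (the flag can be disabled in firmware or hidden from a guest).

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// virtExtension scans /proc/cpuinfo-style text and returns "VT-x", "AMD-V",
// or "" if neither the vmx nor the svm flag is listed.
func virtExtension(cpuinfo string) string {
	sc := bufio.NewScanner(strings.NewReader(cpuinfo))
	for sc.Scan() {
		key, value, ok := strings.Cut(sc.Text(), ":")
		if !ok || strings.TrimSpace(key) != "flags" {
			continue
		}
		for _, flag := range strings.Fields(value) {
			switch flag {
			case "vmx":
				return "VT-x"
			case "svm":
				return "AMD-V"
			}
		}
	}
	return ""
}

func main() {
	data, err := os.ReadFile("/proc/cpuinfo") // Linux only
	if err != nil {
		fmt.Println("cannot read /proc/cpuinfo:", err)
		return
	}
	if ext := virtExtension(string(data)); ext != "" {
		fmt.Println("Hardware assist available:", ext)
	} else {
		fmt.Println("No vmx/svm flag found")
	}
}
```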
The VM Exit
When a Guest tries to do something sensitive (such as executing certain privileged instructions or accessing emulated hardware), the CPU triggers a VM Exit.
- CPU pauses the Guest (Non-Root Mode).
- Context switches to the Hypervisor (Root Mode).
- Hypervisor handles the request (emulates the hardware).
- Hypervisor executes VMRESUME to switch back to the Guest (VM Entry).
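The four steps above can be traced with a tiny state machine. This is a software model only: the `CPU` type, its `Trace` field, and the `VMExit` method are hypothetical names for this sketch, and real hardware performs the mode switch itself.

```go
package main

import "fmt"

// Mode is the VMX operating mode of the simulated CPU.
type Mode string

const (
	Root    Mode = "VMX Root"
	NonRoot Mode = "VMX Non-Root"
)

// CPU tracks the current mode and records each step it goes through.
type CPU struct {
	Mode  Mode
	Trace []string
}

func (c *CPU) log(step string) {
	c.Trace = append(c.Trace, fmt.Sprintf("[%s] %s", c.Mode, step))
}

// VMExit walks through the exit/entry cycle for one sensitive guest operation.
func (c *CPU) VMExit(reason string) {
	c.log("guest paused: " + reason) // 1. CPU pauses the Guest
	c.Mode = Root                    // 2. context switch to the Hypervisor
	c.log("hypervisor emulates the request")
	c.log("hypervisor executes VMRESUME")
	c.Mode = NonRoot // 4. VM Entry back into the Guest
	c.log("guest resumes")
}

func main() {
	cpu := &CPU{Mode: NonRoot}
	cpu.VMExit("port I/O")
	for _, line := range cpu.Trace {
		fmt.Println(line)
	}
}
```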
4. Interactive: Ring Transition Simulator
Visualize the transition between Guest Mode and Host Mode (Root).
5. Memory Virtualization (EPT / SLAT)
The Guest OS thinks it manages physical memory using its own Page Tables.
- Guest Virtual Address (GVA) → Guest Physical Address (GPA).
But the Hardware uses Host Physical Addresses (HPA). Before hardware support, the Hypervisor had to maintain Shadow Page Tables (mapping GVA → HPA directly) in software. Every time the Guest changed its page table, the Hypervisor had to trap and update the shadow table. This was incredibly expensive.
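The cost of shadow paging is easier to see in a toy model. In the sketch below, every guest page-table write is counted as a trap, because the hypervisor must intercept it to keep the GVA → HPA shadow table in sync. Page tables are flattened to maps of page numbers, and `ShadowMMU`, `GuestMap`, and `Traps` are names invented for this illustration.

```go
package main

import "fmt"

// PageTable maps a page number to a page number (offsets omitted for brevity).
type PageTable map[uint64]uint64

// ShadowMMU models software memory virtualization without EPT.
type ShadowMMU struct {
	guestPT PageTable // GVA page -> GPA page, written by the guest
	hostMap PageTable // GPA page -> HPA page, owned by the hypervisor
	shadow  PageTable // GVA page -> HPA page, what the hardware actually walks
	Traps   int       // number of guest page-table writes intercepted
}

func NewShadowMMU(hostMap PageTable) *ShadowMMU {
	return &ShadowMMU{guestPT: PageTable{}, hostMap: hostMap, shadow: PageTable{}}
}

// GuestMap models the guest writing one page-table entry. The write traps so
// the hypervisor can update the shadow table to match.
func (m *ShadowMMU) GuestMap(gvaPage, gpaPage uint64) {
	m.Traps++
	m.guestPT[gvaPage] = gpaPage
	if hpaPage, ok := m.hostMap[gpaPage]; ok {
		m.shadow[gvaPage] = hpaPage
	}
}

func main() {
	mmu := NewShadowMMU(PageTable{0x1: 0x80, 0x2: 0x81, 0x3: 0x95})
	mmu.GuestMap(0x10, 0x1)
	mmu.GuestMap(0x11, 0x2)
	mmu.GuestMap(0x12, 0x3)
	fmt.Printf("shadow entries: %d, traps taken: %d\n", len(mmu.shadow), mmu.Traps)
}
```

One trap per mapping is tolerable for three pages, but a guest that forks processes or maps files updates its page tables constantly.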
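Solution: Extended Page Tables (EPT) / SLAT
With Second Level Address Translation (SLAT), which Intel calls EPT, the CPU hardware walks two layers of page tables:
- Guest CR3: GVA → GPA.
- Host EPT Pointer: GPA → HPA.
This eliminates the need for Shadow Page Tables, and ordinary guest page faults no longer cause VM Exits. Only an EPT violation (a GPA with no valid host mapping) traps to the Hypervisor.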
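A worked example of the two-stage walk, assuming 4 KiB pages and single-level tables flattened into maps (real hardware walks multi-level tables and also translates the guest's page-table addresses through the EPT). `Translate`, `lookup`, and the two error values are names chosen for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	pageShift = 12                 // 4 KiB pages
	pageMask  = 1<<pageShift - 1   // low 12 bits: offset within the page
)

// PageTable maps a page number to a frame number.
type PageTable map[uint64]uint64

var (
	ErrGuestPageFault = errors.New("guest page fault (handled by the guest kernel)")
	ErrEPTViolation   = errors.New("EPT violation (VM Exit to the hypervisor)")
)

// lookup translates one address through one table, preserving the page offset.
func lookup(pt PageTable, addr uint64) (uint64, bool) {
	frame, ok := pt[addr>>pageShift]
	if !ok {
		return 0, false
	}
	return frame<<pageShift | addr&pageMask, true
}

// Translate performs GVA -> GPA via the guest table, then GPA -> HPA via the EPT.
func Translate(guestPT, ept PageTable, gva uint64) (uint64, error) {
	gpa, ok := lookup(guestPT, gva)
	if !ok {
		return 0, ErrGuestPageFault
	}
	hpa, ok := lookup(ept, gpa)
	if !ok {
		return 0, ErrEPTViolation
	}
	return hpa, nil
}

func main() {
	guestPT := PageTable{0x4: 0x2} // GVA page 0x4 -> GPA frame 0x2
	ept := PageTable{0x2: 0x9A}    // GPA frame 0x2 -> HPA frame 0x9A

	hpa, err := Translate(guestPT, ept, 0x4ABC)
	fmt.Printf("GVA 0x4ABC -> HPA %#x (err: %v)\n", hpa, err)
}
```

Here GVA 0x4ABC becomes GPA 0x2ABC and then HPA 0x9AABC; the 0xABC offset is carried through both stages unchanged.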
6. Code Example: The Hypervisor Loop
How does a Hypervisor actually work in code? It’s essentially an infinite loop that runs the CPU until it exits.
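```go
package main

import (
	"encoding/binary"
	"fmt"
	"log"
	"syscall"
)

// ioctl request numbers and exit reasons from <linux/kvm.h>.
const (
	KVM_CREATE_VM          = 0xAE01
	KVM_GET_VCPU_MMAP_SIZE = 0xAE04
	KVM_CREATE_VCPU        = 0xAE41
	KVM_RUN                = 0xAE80

	KVM_EXIT_IO  = 2
	KVM_EXIT_HLT = 5

	exitReasonOffset = 8 // byte offset of exit_reason in struct kvm_run
)

// ioctl wraps the raw syscall and aborts with a message on failure.
func ioctl(fd, req, arg uintptr, what string) uintptr {
	r, _, errno := syscall.Syscall(syscall.SYS_IOCTL, fd, req, arg)
	if errno != 0 {
		log.Fatalf("%s: %v", what, errno)
	}
	return r
}

// Conceptual KVM interaction in Go (Linux, x86-64 host).
// A real VMM must also load guest code and set up registers before KVM_RUN.
func main() {
	// 1. Open KVM device
	kvm, err := syscall.Open("/dev/kvm", syscall.O_RDWR, 0)
	if err != nil {
		log.Fatalf("open /dev/kvm: %v", err)
	}
	// 2. Create a Virtual Machine
	vmFd := ioctl(uintptr(kvm), KVM_CREATE_VM, 0, "KVM_CREATE_VM")
	// 3. Create a VCPU (Virtual CPU)
	vcpuFd := ioctl(vmFd, KVM_CREATE_VCPU, 0, "KVM_CREATE_VCPU")
	// 4. Map memory for the VM (Guest RAM)
	// ... mmap() logic here ...
	// 5. Map the kvm_run structure that the kernel shares with user space
	size := ioctl(uintptr(kvm), KVM_GET_VCPU_MMAP_SIZE, 0, "KVM_GET_VCPU_MMAP_SIZE")
	run, err := syscall.Mmap(int(vcpuFd), 0, int(size),
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	if err != nil {
		log.Fatalf("mmap kvm_run: %v", err)
	}

	fmt.Println("Starting VCPU Loop...")
	for {
		// 6. Run the VCPU (enters VMX Non-Root Mode); blocks until a VM Exit
		ioctl(vcpuFd, KVM_RUN, 0, "KVM_RUN")
		// 7. Handle the VM Exit: the reason is reported in shared memory
		reason := binary.LittleEndian.Uint32(run[exitReasonOffset:])
		switch reason {
		case KVM_EXIT_IO:
			fmt.Println("VM Exit: port I/O, emulate the device, then resume")
		case KVM_EXIT_HLT:
			fmt.Println("VM Exit: guest halted")
			return
		default:
			log.Fatalf("unhandled VM Exit reason %d", reason)
		}
	}
}
```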
```java
public class HypervisorSimulation {
    // Simulating the CPU state
    static class VCPU {
        int[] registers = new int[4]; // EAX, EBX, etc.
        boolean running = true;
        // Mock guest program: LGDT and HLT are privileged and force VM Exits
        String[] program = {"MOV", "LGDT", "MOV", "HLT"};
        int pc = 0;

        void run() {
            while (running) {
                // Fetch, decode, execute...
                Instruction instr = fetch();
                if (instr.isPrivileged()) {
                    // TRAP! Return control to the Hypervisor
                    throw new VmExitException(instr.op);
                }
                execute(instr);
            }
        }

        // Mock methods
        Instruction fetch() { return new Instruction(program[pc++]); }
        void execute(Instruction i) { /* unprivileged: runs directly */ }
    }

    public static void main(String[] args) {
        VCPU guestCpu = new VCPU();
        System.out.println("Hypervisor: Starting Guest...");
        while (guestCpu.running) {
            try {
                // "VM Entry" - switch to Guest Mode
                guestCpu.run();
            } catch (VmExitException e) {
                // "VM Exit" - handle the trap
                System.out.println("VM EXIT: " + e.getMessage());
                handlePrivileged(guestCpu, e.getMessage());
            }
        }
        System.out.println("Hypervisor: Guest halted.");
    }

    static void handlePrivileged(VCPU cpu, String op) {
        if (op.equals("HLT")) {
            // The guest asked to halt: stop scheduling this VCPU
            cpu.running = false;
        } else {
            // Emulate the instruction, then fall through to the next VM Entry
            System.out.println("Hypervisor: Emulating " + op + "...");
        }
    }
}

class VmExitException extends RuntimeException {
    public VmExitException(String reason) { super(reason); }
}

class Instruction {
    String op;
    public Instruction(String op) { this.op = op; }
    // Mock: only these two opcodes count as privileged
    public boolean isPrivileged() { return op.equals("LGDT") || op.equals("HLT"); }
}
```
[!TIP] Virtio: Instead of emulating a real network card (Intel e1000) which is slow (lots of VM Exits for every packet), we use Paravirtualization. The Guest OS knows it’s a VM and uses a special “Virtio” driver to talk directly to the Hypervisor using shared memory rings, bypassing expensive emulation.
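The benefit of a shared-memory ring is that the guest can queue many packets and notify the hypervisor once. The Go sketch below models that idea only; it is not the actual virtqueue layout from the Virtio specification, and `Ring`, `Enqueue`, `Kick`, and `Drain` are hypothetical names. Each `Kick` stands for one notification, which is the only step that would cost a VM Exit.

```go
package main

import "fmt"

// Ring is a fixed-size queue standing in for memory shared by guest and host.
type Ring struct {
	slots [][]byte
	head  uint32 // next slot the host will read
	tail  uint32 // next slot the guest will write
	Kicks int    // notifications sent to the hypervisor
}

// NewRing creates a ring; size should be a power of two.
func NewRing(size int) *Ring { return &Ring{slots: make([][]byte, size)} }

// Enqueue is called by the guest driver. It only touches shared memory,
// so no VM Exit is needed. It returns false if the ring is full.
func (r *Ring) Enqueue(pkt []byte) bool {
	if r.tail-r.head == uint32(len(r.slots)) {
		return false
	}
	r.slots[r.tail%uint32(len(r.slots))] = pkt
	r.tail++
	return true
}

// Kick tells the hypervisor that new entries are waiting.
func (r *Ring) Kick() { r.Kicks++ }

// Drain is called by the hypervisor and returns every pending packet.
func (r *Ring) Drain() [][]byte {
	var out [][]byte
	for r.head != r.tail {
		out = append(out, r.slots[r.head%uint32(len(r.slots))])
		r.head++
	}
	return out
}

func main() {
	ring := NewRing(8)
	for i := 0; i < 8; i++ {
		ring.Enqueue([]byte{byte(i)})
	}
	ring.Kick() // one notification for the whole batch
	pkts := ring.Drain()
	fmt.Printf("packets delivered: %d, notifications: %d\n", len(pkts), ring.Kicks)
}
```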
7. Summary
- Type 1 vs Type 2: Bare metal vs Hosted.
- Hardware Assist: VMX Root (Hypervisor) and VMX Non-Root (Guest) modes eliminate binary translation.
- VM Exit: The mechanism for the CPU to trap to the Hypervisor when the Guest tries to touch hardware.
- EPT: Hardware-accelerated memory translation (GVA → GPA → HPA).