How does malware actually get into a system? It often exploits a vulnerability. One of the most famous classes of vulnerability in computing history is the buffer overflow.

Imagine pouring a gallon of water into a shot glass. The glass fills up instantly, and the excess water spills onto the table, ruining whatever documents happen to be sitting there. In software, when a program accepts more data than it has allocated space for, the excess data “spills” into adjacent memory.

1. Anatomy of a Buffer Overflow (Stack Smashing)

To understand buffer overflows, we must first understand how an Operating System structures a program’s memory. When a program runs, its memory is divided into segments:

  • Text Segment: Contains the compiled machine code instructions.
  • Data Segment: Global and static variables.
  • Heap: Dynamically allocated memory (e.g., malloc in C).
  • Stack: Stores local variables, function arguments, and the crucial Return Address for active function calls.

When a function is called, a new “Stack Frame” is pushed onto the stack. This frame contains the local variables (buffers) and the Return Address (where the CPU should resume execution after the function finishes).

If a program copies user input into a small local buffer without checking the input’s size, it is open to a Stack Smashing attack:

  1. The Overflow: The user input fills the allocated buffer.
  2. The Spill: Because the input is too large, the excess data continues writing into the neighboring higher memory addresses on the stack.
  3. The Hijack: The attacker crafts the input so that the overflowing bytes overwrite the Return Address stored in the stack frame.
  4. The Execution: When the function completes and attempts to return, the CPU reads the modified Return Address and jumps to whatever address the attacker supplied. In the classic attack, this address points to malicious shellcode (a small payload designed to open a command shell) that was included in the overflowing input.

2. Stack Smashing Visualizer

Consider 12 bytes of data written into an 8-byte buffer. Before any input arrives, the stack frame looks like this:

  Stack Memory
  (High Addr)  Ret Addr: 0x4005
  (Low Addr)   Buffer (8 bytes): [EMPTY]

The first 8 bytes fill the buffer; the remaining 4 bytes spill upward into the Return Address.

3. Malware Types

While buffer overflows are an entry vector, the payload itself is categorized by how it spreads and behaves.

  1. Virus: A parasitic piece of code that attaches itself to a legitimate executable or document host file. It requires human interaction (e.g., clicking an email attachment or running an infected program) to execute and propagate.
  2. Worm: A self-replicating program that travels across computer networks autonomously. Unlike a virus, a worm needs no human action. It actively scans networks for vulnerabilities (like an unpatched Server Message Block service) and copies itself directly to new hosts.
    • War Story: The Morris Worm (1988) – One of the first worms distributed via the Internet, exploiting vulnerabilities in UNIX sendmail and fingerd. A flaw in its replication logic made it reinfect machines far more aggressively than intended, overloading them and causing one of the first large-scale denial-of-service incidents on the Internet.
  3. Trojan Horse: Malware disguised as legitimate, desirable software. Once installed by the deceived user, it performs malicious actions in the background, such as establishing a backdoor for remote access.
  4. Ransomware: Encrypts the victim’s files using strong cryptography and demands a ransom (usually cryptocurrency like Bitcoin) for the decryption key.
    • War Story: WannaCry (2017) – A devastating ransomware worm that spread globally using EternalBlue, an exploit for a vulnerability in Microsoft’s implementation of the SMBv1 protocol.
  5. Rootkit: Deeply embedded malware designed to conceal its presence. Rootkits often modify the OS kernel itself (hooking system calls) to hide their malicious processes, files, and network connections from standard diagnostic tools like Task Manager or ps.
  6. Spyware & Keyloggers: Software that secretly monitors user behavior, capturing keystrokes, passwords, and sensitive data to send back to an attacker.
  7. Botnet: A network of infected, “zombie” computers controlled by a central Command and Control (C2) server. Botnets are often used to launch massive Distributed Denial of Service (DDoS) attacks or send spam.

4. Code Example: Unsafe vs Safe Memory

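C’s standard string functions trust the programmer to check sizes, while memory-safe languages such as Go and Java bounds-check array accesses at runtime. The same oversized input behaves very differently in each.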

C (unsafe):

#include <stdio.h>
#include <string.h>

void vulnerable_function(const char *input) {
  char buffer[8];
  // DANGER: strcpy does not check the destination size!
  // It copies the input plus its terminating NUL, so any input of
  // 8 or more characters writes past the end of buffer.
  strcpy(buffer, input);
}

int main(void) {
  vulnerable_function("ThisStringIsTooLong");
  return 0;
}

Go (safe):

package main

import "fmt"

func main() {
  // Go slices track their length.
  // Arrays have fixed size.
  var buffer [8]byte
  input := []byte("ThisStringIsTooLong")

  // If we try to copy more than 8 bytes,
  // Go's built-in copy function only copies what fits.
  copied := copy(buffer[:], input)

  fmt.Printf("Copied %d bytes: %s\n", copied, buffer)
  // Output: Copied 8 bytes: ThisStri

  // No crash. No memory corruption.
}

5. Defenses (Modern Mitigations)

Operating systems and hardware vendors have developed robust mechanisms to prevent buffer overflows from being easily exploited.

  • NX Bit (No-Execute / W^X): A hardware feature enforced by the CPU’s Memory Management Unit (MMU). The OS marks the memory pages that hold the Stack and Heap as data only (writable, but not executable). If the CPU’s Instruction Pointer lands on such a page, the CPU raises a fault and the OS terminates the program, so shellcode injected onto the stack never runs.
  • ASLR (Address Space Layout Randomization): Historically, programs were loaded at the same memory addresses on every run, so attackers could reliably hardcode the jump target for their exploits. ASLR randomizes the locations of the stack, heap, and shared libraries each time the program runs. Even if an attacker hijacks the return address, they cannot easily predict where their shellcode or useful library code (such as libc functions) is located.
  • Stack Canaries: A compiler-level defense. The compiler inserts a random secret value (the “canary”) on the stack between the local buffers and the Return Address. An overflow that runs from a buffer up to the Return Address must overwrite the canary on the way. Before the function returns, it checks whether the canary is still intact; if the value has changed (a dead canary), the program aborts immediately, preventing the hijack.