Introduction to Strings
[!NOTE] This module explores the core principles of Introduction to Strings, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. The Hook: Strings are not just Arrays
While a string appears to be a simple sequence of characters, they are one of the most complex “basic” types in modern systems.
Imagine you are building a search engine:
- Simple Search: Works for English text (
ASCII). - Global Search: Crashes when users input Hindi, Arabic, or Emojis (🐂).
Data Structures like Strings determine how we represent human language in a digital world of 0s and 1s.
2. The 4 Pillars of String Mastery
| Pillar | Focus | Why it matters |
|---|---|---|
| 1. Intuition | Immutability | Understanding why s + "a" is expensive. |
| 2. Rigor | Rolling Hashes | Mathematical guarantees for fast pattern matching. |
| 3. Hardware | Encoding (UTF-8) | Knowing the difference between a “Byte” and a “Character”. |
| 4. Patterns | Tries & Windows | Efficient prefix and substring management. |
3. What is a String?
A String is a sequence of characters. In modern languages like Java and Go, strings are treated as first-class citizens but have specific behaviors you must understand to write efficient code.
The Immutability Rule
In both Java and Go, strings are immutable. This means once a string object is created, it cannot be changed.
- Java:
String s = "hello"creates an object. If you dos = s + " world", you aren’t modifying the original “hello”. You are creating a brand new string “hello world”. - Go:
s := "hello"creates a string header.s += " world"allocates a new string.
[!IMPORTANT] Why Immutability?
- Security: Strings are used for parameters like database URLs, passwords, and file paths. Immutability ensures these values can’t be changed maliciously or accidentally.
- Thread Safety: Immutable objects are inherently thread-safe.
- String Pool: Java maintains a “String Pool” to save memory. If strings were mutable, changing one reference would affect all others pointing to the same literal.
4. Memory Layout
Understanding how strings are stored helps explain their performance characteristics.
Java (java.lang.String)
In modern Java (Compact Strings), a String object contains:
- Object Header: Metadata for the JVM.
- Coder: A byte flag indicating if the string is Latin-1 (1 byte/char) or UTF-16 (2 bytes/char).
- Value: A byte array
byte[]holding the data. - Hash: A cached hash code (computed lazily).
Go (string)
A Go string is a lightweight struct (header) containing:
- Data: A pointer to the underlying byte array.
- Len: The length of the string in bytes.
Because Go strings are just a pointer and a length, passing them around is very cheap (constant time), regardless of the string’s size.
5. Interactive Visualizer: Immutability vs. Builder
When you concatenate immutable strings in a loop, you create “garbage” intermediate objects. Using a Builder (Java StringBuilder, Go strings.Builder) uses a mutable buffer that expands only when needed.
6. Basic Operations & Complexity
| Operation | Java (java.lang.String) |
Go (string) |
Time Complexity | Note |
|---|---|---|---|---|
| Length | s.length() |
len(s) |
O(1) |
Stored in header |
| Access | s.charAt(i) |
s[i] |
O(1) |
Direct array access |
| Concatenation | s1 + s2 |
s1 + s2 |
O(N+M) |
Creates new string |
| Substring | s.substring(i, j) |
s[i:j] |
O(N) vs O(1) |
Go slices are O(1) |
[!WARNING] Go Substring Gotcha: In Go,
s[i:j]creates a slice sharing the same underlying array. This is fastO(1), but if you slice a tiny part of a huge string and keep it in memory, the entire huge array stays in memory (memory leak). Usestrings.Clone(s[i:j])orstring([]byte(s[i:j]))to force a copy if needed.
7. ASCII vs Unicode vs UTF-8
- ASCII: 7-bit (0-127). English only.
- Unicode: The standard assigning a unique number (Code Point) to every character in every language.
- UTF-8: The encoding. Variable length (1-4 bytes). ASCII compatible (1 byte).
- Emoji 🐂 might take 4 bytes.
- Letter ‘A’ takes 1 byte.
[!WARNING] Trap:
s.length()might not mean “number of characters” if you have Emojis. In Javacharis UTF-16 (2 bytes), so some Emojis take 2 chars (Surrogate Pair).
8. Code Implementation
Here is how to perform common string operations efficiently.
public class StringOps {
public static void main(String[] args) {
// 1. Creation
String s1 = "Hello";
// 2. Length (O(1))
int len = s1.length();
// 3. Concatenation (Inefficient for loops)
String s2 = s1 + " World";
// 4. Builder (Efficient O(N))
StringBuilder sb = new StringBuilder();
sb.append("Hello");
sb.append(" ");
sb.append("World");
String result = sb.toString();
// 5. Access
char c = s1.charAt(1); // 'e'
// 6. Substring (O(N) copy)
String sub = s1.substring(1, 4); // "ell"
// 7. Comparison
// NEVER use == for content comparison!
boolean isEqual = s1.equals("Hello"); // true
}
}
package main
import (
"fmt"
"strings"
)
func main() {
// 1. Creation
s1 := "Hello"
// 2. Length (O(1) bytes)
// Note: len() gives bytes, not chars (runes)
l := len(s1)
fmt.Println("Bytes:", l)
// 3. Concatenation (Inefficient for loops)
s2 := s1 + " World"
fmt.Println(s2)
// 4. Builder (Efficient O(N))
var sb strings.Builder
sb.WriteString("Hello")
sb.WriteString(" ")
sb.WriteString("World")
result := sb.String()
fmt.Println(result)
// 5. Access
b := s1[1] // Byte at index 1 ('e' is 101)
// 6. Substring (O(1) shared memory)
sub := s1[1:4] // "ell"
// 7. Comparison
isEqual := s1 == "Hello" // true (Go allows ==)
fmt.Println(isEqual)
}
Next Steps
Now that you know how strings are stored, let’s learn how to find patterns within them efficiently.
9. Deep Dive Strategy Lab: Basic String Operations
Intuition Through Analogy
Think of this chapter like searching text in a large log stream. The goal is not to memorize a fixed trick, but to repeatedly answer:
- What is the bottleneck?
- Which constraint dominates (time, memory, latency, correctness)?
- Which representation makes the bottleneck easier to eliminate?