Building a Calculator: Expression Evaluation

Imagine you are tasked with building the core formula engine for a spreadsheet application like Excel, or writing the parser for a new programming language. A user inputs an expression: 3 + 4 * 2 / ( 1 - 5 ) ^ 2.

How do you write code that evaluates this to the correct mathematical result?

Computers do not intuitively understand “Order of Operations” (PEMDAS). They read left-to-right. If a computer naively evaluated 3 + 4 * 2 sequentially, it would get (3 + 4) * 2 = 14, instead of the correct answer, 11. To evaluate complex math correctly, we need a structured way to parse and process expressions, respecting operator precedence and parentheses.

Anatomy of an Expression Evaluator

When a computer processes a mathematical string, it moves through three distinct phases. Think of this as the “assembly line” of calculation:

1. Tokenizer String → Array 2. Shunting-Yard Infix → Postfix 3. RPN Engine Postfix → Value
  1. Lexical Analysis (Tokenizer): The raw string "34 + 2" is split into meaningful chunks (tokens): ["34", "+", "2"]. It distinguishes multi-digit numbers from operators.
  2. Parsing (Shunting-Yard): Converts the machine-hostile infix tokens into a strict, sequential postfix format: ["34", "2", "+"].
  3. Evaluation (RPN Engine): A simple stack-based machine processes the postfix tokens to compute the final integer result: 36.

We will focus on the Parsing and Evaluation phases in this chapter.

1. Notation Types: Speaking the Machine’s Language

Before we can calculate, we must translate human-readable math into a format a machine can process linearly.

  1. Infix Notation: 3 + 4
    • Human readable: The operator is in between the operands.
    • Machine hostile: Requires understanding precedence (multiply before add) and grouping (parentheses), making it difficult to process in a single left-to-right pass.
  2. Postfix Notation (Reverse Polish Notation or RPN): 3 4 +
    • Machine friendly: The operator is placed after the operands.
    • No parentheses needed: The order of the expression inherently dictates the order of operations.
    • Example: 3 + 4 * 2 becomes 3 4 2 * + in Postfix.

2. The Shunting-Yard Algorithm

Created by computing pioneer Edsger Dijkstra, the Shunting-Yard algorithm converts Infix → Postfix notation.

The Analogy: A Train Switching Yard

Think of the algorithm like a freight train switching yard.

  • The Main Track is our Output buffer (the final Postfix string).
  • The Side Track is our Operator Stack (a LIFO data structure).
  • Passenger Cars (Numbers): They have priority to get to their destination. They bypass the side track entirely and go straight to the Main Track.
  • Locomotives (Operators): They must be sequenced based on their horsepower (Precedence). When a locomotive arrives, it is routed to the Side Track. If a heavier, higher-priority locomotive is already on the side track, the weaker one must wait, forcing the higher-priority one out to the Main Track first to keep traffic flowing logically.

The Rules

We read the infix expression token by token from left to right:

  1. Number: Send it directly to the Output.
  2. Operator (+, -, *, /): Push it to the Stack. Crucial Rule: Before pushing, pop any operators from the Stack that have greater than or equal precedence, and append them to the Output.
  3. Left Parenthesis (: Push it to the Stack. It acts as a strict boundary.
  4. Right Parenthesis ): Pop operators from the Stack and append to Output until a matching Left Parenthesis ( is encountered. Pop and discard the (.

Interactive: Shunting-Yard Execution

Let’s convert the expression 3 * ( 4 + 2 ) to Postfix (3 4 2 + *). This example demonstrates the powerful parenthesis boundary rule.

Expression

Operator Stack

Output (Postfix)

Ready.

3. Evaluating Postfix (RPN)

Once we have successfully converted our expression to Postfix (3 4 2 * +), evaluation becomes trivial using a single Stack.

The Evaluation Rules

We read the postfix expression strictly left-to-right:

  1. If it’s a Number: Push it onto the stack.
  2. If it’s an Operator:
    • Pop the top two numbers from the stack.
    • Important: The first number popped is the right operand, and the second number popped is the left operand. This ordering matters immensely for subtraction and division!
    • Apply the operator to these two numbers.
    • Push the result back onto the stack.
  3. Result: When the expression ends, exactly one number will remain on the stack. That is your final answer.

Step-by-Step Example

Let’s evaluate 3 4 2 * +:

Token Scanned Action Taken Stack State (Bottom → Top)
3 Push 3 [3]
4 Push 4 [3, 4]
2 Push 2 [3, 4, 2]
* Pop 2 (right), Pop 4 (left). Calculate 4 * 2 = 8. Push 8. [3, 8]
+ Pop 8 (right), Pop 3 (left). Calculate 3 + 8 = 11. Push 11. [11]

Final Answer: 11.


⚔️ Industry War Story: The Excel Calculation Engine

When Microsoft was building the core calculation engine for Excel, they faced a massive challenge: parsing and evaluating complex formulas containing hundreds of nested parentheses and operators across thousands of interconnected cells. They couldn’t rely on simple recursive evaluation because deeply nested formulas would cause stack overflow errors and block the main thread.

Instead, they utilized heavily optimized variations of the Shunting-Yard algorithm to parse formulas into Abstract Syntax Trees (ASTs) and postfix representations once during the “compile” phase. When a user updates a single cell’s value, Excel doesn’t re-parse the massive formula text; it simply re-runs the lightning-fast, linear Postfix Evaluation (RPN) over the pre-compiled tokens using the new values. This decoupling of Parsing and Evaluation is what allows millions of cells to update in milliseconds.


4. Code Implementations

Let’s look at how to implement the Postfix Evaluation algorithm. Notice how elegantly it handles the operations.

Java

public int evalRPN(String[] tokens) {
    Deque<Integer> stack = new ArrayDeque<>();

    for (String token : tokens) {
        // If the token is an operator, pop two operands and calculate
        if (token.equals("+")) {
            stack.push(stack.pop() + stack.pop());
        } else if (token.equals("*")) {
            stack.push(stack.pop() * stack.pop());
        } else if (token.equals("-")) {
            int right = stack.pop(); // First pop is the right operand
            int left = stack.pop();  // Second pop is the left operand
            stack.push(left - right);
        } else if (token.equals("/")) {
            int right = stack.pop();
            int left = stack.pop();
            stack.push(left / right);
        } else {
            // If it's a number, push it to the stack
            stack.push(Integer.parseInt(token));
        }
    }

    // The final result is the only remaining element
    return stack.pop();
}

Go

func evalRPN(tokens []string) int {
    stack := []int{}

    for _, token := range tokens {
        if token == "+" || token == "-" || token == "*" || token == "/" {
            // Pop the top two elements
            right := stack[len(stack)-1]
            left := stack[len(stack)-2]
            stack = stack[:len(stack)-2] // Shrink stack

            // Perform calculation and push result
            var result int
            switch token {
            case "+": result = left + right
            case "-": result = left - right
            case "*": result = left * right
            case "/": result = left / right
            }
            stack = append(stack, result)
        } else {
            // If it's a number, convert to int and push
            val, _ := strconv.Atoi(token)
            stack = append(stack, val)
        }
    }

    return stack[0]
}

Complexity Analysis

Approach Time Complexity Space Complexity
Shunting-Yard (Infix to Postfix) O(N) - We process each token exactly once. O(N) - The operator stack stores at most N operators.
RPN Evaluation O(N) - We scan the postfix expression sequentially. O(N) - The stack stores at most N operands.