Tensors (Rank-3+) & Operations
[!NOTE] This module explores the core principles of Tensors (Rank-3+) & Operations, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. Introduction: Beyond Matrices
So far, we’ve dealt with:
- Scalars: Single numbers (Rank 0).
- Vectors: Lists of numbers (Rank 1).
- Matrices: Grids of numbers (Rank 2).
But real-world data is often more complex.
- Color Images: A standard image is Height × Width × 3 Color Channels (Red, Green, Blue). This is a Rank-3 Tensor.
- Video Batches: A batch of videos is (Batch Size × Time × Height × Width × Channels). This is a Rank-5 Tensor.
In deep learning frameworks like PyTorch and TensorFlow, the tensor is the core data structure: inputs, model weights, and outputs are all stored as tensors.
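As a quick illustration of the two examples above, the sketch below builds a color image and a video batch in PyTorch and reads back their rank and shape. The sizes are deliberately tiny and chosen only for illustration; the axis order follows the text.

```python
import torch

# Tiny illustrative sizes (assumptions); only the axis order matters here.
image = torch.zeros(4, 6, 3)              # (Height, Width, Channels)
video_batch = torch.zeros(2, 5, 4, 6, 3)  # (Batch, Time, Height, Width, Channels)

# .ndim is the rank (number of axes); .shape lists the size of each axis.
print(image.ndim, tuple(image.shape))              # 3 (4, 6, 3)
print(video_batch.ndim, tuple(video_batch.shape))  # 5 (2, 5, 4, 6, 3)
```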
Case Study: High-Dimensional Time-Series for Autonomous Vehicles (PEDALS)
To understand why we need tensors, let’s consider the problem of building an object detection system for a self-driving car.
- P (Process Requirements): The car has 8 cameras around its body. Each camera streams 4K video at 60 frames per second (FPS). We need to process these frames in batches to predict object bounding boxes.
- E (Estimate): A single RGB frame has shape (Height, Width, Channels). For a 4K image, this is (2160, 3840, 3). One second of video from 8 cameras is 60 * 8 = 480 frames.
- D (Data Model): We need a data structure to efficiently hold and compute over this batch. A single scalar, vector, or matrix cannot represent the temporal (time) and spatial (cameras) relationships. We use a Rank-6 Tensor.
- A (Architecture): The model processes a batch of shape (Batch_Size, Time_Steps, Cameras, Height, Width, Channels). If we batch 1 second of data, our tensor shape is (1, 60, 8, 2160, 3840, 3).
- L (Localized Details): Why group them like this? Convolutional layers in neural networks need spatial adjacency (pixels next to each other) and temporal adjacency (frames next to each other in time). A dense tensor is stored as one contiguous block of GPU memory with a fixed stride per axis, so neighboring pixels sit close together and moving to the next frame or camera is a predictable jump.
- S (Scale): As we increase batch size or camera resolution, the tensor simply grows along those specific dimensions, and the GPU keeps running the same highly optimized batched convolution and matrix-multiplication kernels.
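The estimate above can be carried one step further with plain Python arithmetic. The sketch below counts the values in one second of data and derives the row-major strides that a contiguous layout would have. The one-byte-per-value figure is an assumption (uint8 pixels); a float32 copy of the same batch would be four times larger.

```python
import math

# Shape from the case study: (Batch, Time, Cameras, Height, Width, Channels)
shape = (1, 60, 8, 2160, 3840, 3)
bytes_per_value = 1  # assumption: raw uint8 pixels

num_values = math.prod(shape)
total_bytes = num_values * bytes_per_value

# Row-major strides, measured in elements: how far to jump in memory
# to advance by one step along each axis of a contiguous tensor.
strides = []
step = 1
for size in reversed(shape):
    strides.append(step)
    step *= size
strides.reverse()

print(f"values:  {num_values:,}")                  # 11,943,936,000
print(f"size:    {total_bytes / 1e9:.2f} GB")      # 11.94 GB at one byte per value
print(f"strides: {strides}")
# [11943936000, 199065600, 24883200, 11520, 3, 1]
```

The last three strides show the adjacency argument: channels of one pixel are 1 element apart, horizontal neighbors are 3 apart, and vertical neighbors are one image row (11,520 elements) apart.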
2. Tensor Ranks & Shapes
The Rank (or Order) of a tensor is the number of indices required to access a specific element.
| Rank | Name | Example Shape | Analogy |
|---|---|---|---|
| 0 | Scalar | () | A single point. |
| 1 | Vector | (3) | A line of points. |
| 2 | Matrix | (3, 3) | A sheet of paper (grid). |
| 3 | Tensor | (3, 3, 3) | A book (stack of sheets). |
| 4 | Tensor | (10, 3, 3, 3) | A shelf of books. |
| 5 | Tensor | (5, 10, 3, 3, 3) | A library of shelves. |
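The definition of rank as "number of indices needed to reach one element" can be checked directly. In this minimal sketch, each index supplied to a rank-3 tensor removes one axis, until three indices leave a single rank-0 value.

```python
import torch

book = torch.arange(27).reshape(3, 3, 3)  # rank 3: a "book" of three 3x3 sheets

sheet = book[1]          # one index    -> rank 2 (a sheet)
row = book[1, 2]         # two indices  -> rank 1 (a line)
element = book[1, 2, 0]  # three indices -> rank 0 (a single point)

print(book.ndim, sheet.ndim, row.ndim, element.ndim)  # 3 2 1 0
print(element.item())  # 15, because 1*9 + 2*3 + 0 = 15
```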
3. Interactive Visualizer: The Tensor Operator
Explore two key concepts: Slicing (cutting a tensor) and Broadcasting (stretching a tensor).
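Slicing can also be tried in code. The sketch below cuts a small rank-3 tensor along each of its axes; the tensor contents and variable names are illustrative only.

```python
import torch

t = torch.arange(24).reshape(2, 3, 4)  # 2 layers, 3 rows, 4 columns

first_layer = t[0]          # fix axis 0            -> shape (3, 4)
middle_rows = t[:, 1, :]    # fix axis 1, keep rest -> shape (2, 4)
last_column = t[..., -1]    # fix the last axis     -> shape (2, 3)

print(tuple(first_layer.shape), tuple(middle_rows.shape), tuple(last_column.shape))
print(middle_rows.tolist())  # [[4, 5, 6, 7], [16, 17, 18, 19]]
print(last_column.tolist())  # [[3, 7, 11], [15, 19, 23]]
```

A colon keeps an axis whole, while an integer index selects one position and drops that axis from the result.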
4. Key Operations
A. Broadcasting: The Elastic Ruler
What if you add a Vector (4) to a Matrix (4, 4)?
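The vector acts like an Elastic Ruler. It “stretches” across the missing dimension to match the matrix shape. Conceptually the vector is repeated once per row; in practice the framework reuses the same values without copying them.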
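One way to see the stretch is to make it explicit with `Tensor.expand`, which produces the stretched view by hand. In this sketch the expanded tensor has a stride of 0 along the new axis, meaning every row points at the same four values in memory.

```python
import torch

matrix = torch.ones(4, 4)
vector = torch.tensor([1.0, 2.0, 3.0, 4.0])

# Explicitly stretch the vector to the matrix shape.
stretched = vector.expand(4, 4)

print(stretched.stride())                         # (0, 1): moving down a row costs 0 elements
print(stretched.data_ptr() == vector.data_ptr())  # True: same underlying memory

# Implicit broadcasting gives the same result as adding the explicit view.
print(torch.equal(matrix + vector, matrix + stretched))  # True
```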
The Rules of Broadcasting:
- Align Shapes from Right: Start with the last dimension.
- Match or Stretch:
  - If dimensions are equal, great.
  - If one dimension is 1 (or missing), it is stretched to match the other.
  - If dimensions mismatch (e.g., 3 vs 4) and neither is 1, Error.
Matrix A: (4, 4)
Vector B: ( , 4) <-- Stretches vertically to (4, 4)
Result: (4, 4)
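The rules above are mechanical enough to write down as a small function. The following is a pure-Python sketch of the shape check only; `broadcast_shape` is a hypothetical helper written for this explanation, not a PyTorch function (PyTorch offers `torch.broadcast_shapes` for the same purpose).

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Align from the right; a missing dimension is treated as size 1.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or b == 1:
            result.append(a)   # equal, or b stretches to match a
        elif a == 1:
            result.append(b)   # a stretches to match b
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}: {a} vs {b}")
    return tuple(reversed(result))

print(broadcast_shape((4, 4), (4,)))    # (4, 4)
print(broadcast_shape((3, 1), (1, 4)))  # (3, 4)

try:
    broadcast_shape((2, 3), (4,))
except ValueError as err:
    print("error:", err)
```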
B. Coding in PyTorch
Here is how we manipulate tensors in PyTorch, one of the most widely used deep learning frameworks.
```python
import torch

# 1. Creating tensors
scalar = torch.tensor(3.14)        # Rank 0
vector = torch.tensor([1, 2, 3])   # Rank 1, shape (3,)
matrix = torch.rand(2, 2)          # Random 2x2 (Rank 2)
tensor = torch.zeros(2, 3, 4)      # 2x3x4 tensor of zeros (Rank 3)
print(f"Tensor Shape: {tensor.shape}")
# Output: Tensor Shape: torch.Size([2, 3, 4])

# 2. Reshaping and permuting
# view: reinterpret the same data with a new shape. Element order is unchanged,
# and the memory layout must be compatible with the new shape (e.g. contiguous).
flat = tensor.view(24)  # Flatten to 1D
print(f"Flattened: {flat.shape}")
# Output: Flattened: torch.Size([24])

# permute: reorder dimensions (e.g., image (C, H, W) -> (H, W, C))
img = torch.rand(3, 256, 256)        # C, H, W
img_permuted = img.permute(1, 2, 0)  # H, W, C
print(f"Permuted: {img_permuted.shape}")
# Output: Permuted: torch.Size([256, 256, 3])

# 3. Broadcasting
a = torch.ones(3, 3)
b = torch.tensor([1, 2, 3])  # Shape (3,)
# b is broadcast across the rows as [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
c = a + b
print("\nBroadcast Result:\n", c)
```
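One practical consequence of combining the two reshaping tools is worth a short sketch: `permute` changes how axes map onto memory without moving any data, so the result is usually no longer contiguous and `view` refuses to flatten it. The small tensor below is illustrative; `reshape` and `contiguous().view(...)` are the two common ways around the error.

```python
import torch

t = torch.arange(24).reshape(2, 3, 4)
p = t.permute(1, 2, 0)  # shape (3, 4, 2); same memory, different strides

print(t.is_contiguous())  # True
print(p.is_contiguous())  # False

try:
    p.view(24)            # view needs a layout compatible with the new shape
    view_failed = False
except RuntimeError:
    view_failed = True
print("view failed:", view_failed)  # view failed: True

# Either of these works; both copy the data into a fresh contiguous layout here.
flat_a = p.reshape(24)
flat_b = p.contiguous().view(24)
print(torch.equal(flat_a, flat_b))  # True
print(flat_a[:2].tolist())          # [0, 12]: p[0, 0, 0] and p[0, 0, 1]
```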
5. Summary
- Tensor: A multidimensional array of numbers.
- Rank: The number of axes (dimensions).
- Broadcasting: Implicitly stretching size-1 or missing dimensions so that shapes match, without materializing copies, which saves memory and code.
- Permute: Reordering axes, e.g. converting images between (C, H, W) and (H, W, C) layouts.