How Video Compression Works: A Deep Dive



Have you ever wondered how a 4K movie, which would occupy roughly 10 terabytes in its raw, uncompressed form, can be streamed smoothly over a standard home internet connection? The answer lies in the incredible science of video compression.

In this guide, we’ll explore the fundamental techniques that make modern digital video possible.

The Problem: Data Explosion

Raw, uncompressed video is massive. A single 4K frame (3840×2160) at 8 bits per color channel requires about 24 megabytes. At 60 frames per second, that adds up to roughly 1.4 gigabytes of video data every second.

Without compression:

  • A 2-hour movie would require about 10.3 terabytes of storage.
  • Streaming it would require a sustained connection of roughly 11,000 Mbps (11 Gbps).
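The figures above follow directly from the frame dimensions. A quick back-of-the-envelope check (exact byte counts land slightly above the rounded numbers in the bullets):

```python
# Back-of-the-envelope math for raw 4K video: 3840x2160 pixels,
# 3 bytes (24 bits) per pixel, 60 fps, 2-hour runtime.
width, height = 3840, 2160
bytes_per_pixel = 3          # 8 bits each for R, G, B
fps = 60
seconds = 2 * 60 * 60        # 2-hour movie

frame_bytes = width * height * bytes_per_pixel   # ~24.9 MB per frame
rate_bytes = frame_bytes * fps                   # ~1.49 GB per second
movie_bytes = rate_bytes * seconds               # ~10.7 TB total
rate_mbps = rate_bytes * 8 / 1e6                 # ~11,900 Mbps

print(f"Frame size: {frame_bytes / 1e6:.1f} MB")
print(f"Data rate:  {rate_bytes / 1e9:.2f} GB/s ({rate_mbps:,.0f} Mbps)")
print(f"2-hour movie: {movie_bytes / 1e12:.1f} TB")
```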

Compression is not just a feature; it is the backbone of the modern internet.

The Strategy: Exploiting Redundancy

Video compression works by identifying and removing redundant information. There are two primary types of redundancy in video:

1. Spatial Redundancy (Intra-frame)

This refers to redundancy within a single frame. In many images, pixels near each other are very similar (e.g., a blue sky, a white wall). Instead of saving every pixel individually, we can describe their similarities.
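A toy sketch of this idea: a smooth gradient contains many distinct pixel values, but if we predict each pixel from its left neighbour and store only the residual, almost every stored value is the same small number, which is far easier to encode. (This is an illustration of the principle, not any codec's actual intra-prediction scheme.)

```python
import numpy as np

# A smooth ramp of pixel values: 100, 101, ..., 115.
row = np.arange(100, 116, dtype=np.int16)

# Predict each pixel from its left neighbour; keep the first pixel as-is.
residual = np.empty_like(row)
residual[0] = row[0]
residual[1:] = np.diff(row)

# The residual is one seed value followed by a run of identical 1s.
print(residual)                                 # [100, 1, 1, ..., 1]

# Decoding is just a running sum -- the round trip is exact.
reconstructed = np.cumsum(residual)
print(np.array_equal(reconstructed, row))       # True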

2. Temporal Redundancy (Inter-frame)

This is the “secret sauce” of video compression. In most videos, one frame is very similar to the next. Only a small portion of the image changes between frames (usually due to motion). By only recording the changes between frames, we can save a massive amount of data.
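Temporal redundancy in miniature: two consecutive synthetic "frames" that are identical except for a small moving object. Storing only the difference touches a tiny fraction of the pixels (a simplified illustration; real codecs predict motion rather than just subtracting frames).

```python
import numpy as np

# Two 64x64 frames, identical except for a 4x4 "object" that moves.
frame1 = np.zeros((64, 64), dtype=np.int16)
frame2 = np.zeros((64, 64), dtype=np.int16)
frame1[10:14, 10:14] = 255   # object in frame 1
frame2[10:14, 12:16] = 255   # same object, shifted 2 px right

# The frame-to-frame difference is nonzero only where the object
# uncovered or newly covered pixels.
diff = frame2 - frame1
changed = np.count_nonzero(diff)
print(f"{changed} of {diff.size} pixels changed "
      f"({100 * changed / diff.size:.1f}%)")   # 16 of 4096 (0.4%)
```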


The Compression Pipeline

Modern codecs like H.264 and H.265 follow a standard sequence of steps:

1. Prediction (Motion Compensation)

The encoder divides the video into different frame types:

  • I-Frames (Intra): Complete images, like a JPEG. They don’t depend on other frames.
  • P-Frames (Predicted): Only store the differences relative to the previous frame.
  • B-Frames (Bi-predictive): Store differences relative to both previous and future frames.

By using Motion Estimation, the encoder finds blocks of pixels that have moved from one frame to another and just stores a “motion vector” rather than the pixels themselves.
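A minimal version of that search can be sketched as exhaustive block matching: for one block in the current frame, try every displacement within a small window in the previous frame and keep the one with the lowest sum of absolute differences (SAD). The function name and window size here are illustrative choices, not part of any standard.

```python
import numpy as np

def find_motion_vector(prev, cur, top, left, size=8, search=4):
    """Exhaustive block matching: best (dy, dx) within +/-search pixels."""
    block = cur[top:top + size, left:left + size].astype(np.int32)
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > prev.shape[0] or x + size > prev.shape[1]:
                continue  # candidate block falls outside the frame
            cand = prev[y:y + size, x:x + size].astype(np.int32)
            sad = np.abs(block - cand).sum()   # matching cost
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

# Synthetic test: the whole frame shifts 2 px down and 1 px right.
prev = np.random.default_rng(0).integers(0, 256, (32, 32), dtype=np.uint8)
cur = np.roll(np.roll(prev, 2, axis=0), 1, axis=1)
mv, sad = find_motion_vector(prev, cur, top=8, left=8)
print(mv, sad)   # (-2, -1) 0 -- the block came from 2 px up, 1 px left
```

Real encoders use far cleverer search strategies (diamond search, hierarchical search) because this exhaustive scan is too slow at scale, but the cost function and the output (a motion vector) are the same idea.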

2. Transformation (DCT/DST)

After prediction, we are left with a “residual” (the error or the part that couldn’t be predicted). This residual is converted from the spatial domain (pixels) to the frequency domain using the Discrete Cosine Transform (DCT).

This step doesn’t lose data but organizes it so that the most important information (low frequencies) is separated from the less visible details (high frequencies).
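The transform itself can be written as two matrix multiplications with the orthonormal DCT matrix. The sketch below applies a 2D DCT-II to a smooth 8×8 block: nearly all the energy lands in the top-left (low-frequency) coefficient, and inverting the transform recovers the block exactly.

```python
import numpy as np

# Build the 8x8 orthonormal DCT-II matrix.
N = 8
k = np.arange(N)
C = np.sqrt(2 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0, :] = np.sqrt(1 / N)

# A smooth block: pixel value = row index + column index.
block = np.add.outer(np.arange(8), np.arange(8)).astype(float)

coeffs = C @ block @ C.T     # forward 2D DCT
restored = C.T @ coeffs @ C  # inverse: the transform alone is lossless

print(np.round(coeffs, 1))                 # energy packed top-left
print(np.allclose(restored, block))        # True
```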

3. Quantization

This is where the actual compression (and quality loss) occurs. The encoder reduces the precision of the high-frequency coefficients. Since the human eye is less sensitive to fine high-frequency details, we can discard much of this information without a noticeable drop in quality.
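A simplified sketch of that step: divide each coefficient by a step size that grows with frequency, then round. High-frequency coefficients collapse to zero (the quantization matrix below is invented for illustration, not taken from any standard).

```python
import numpy as np

# Example DCT coefficients: large at low frequency, small at high.
coeffs = np.array([[520.0, 35.0, -12.0,  4.0],
                   [ 28.0, -9.0,   5.0, -2.0],
                   [-10.0,  4.0,  -3.0,  1.0],
                   [  3.0, -2.0,   1.0, -0.5]])

# Step size grows with frequency (row + column index).
freq = np.add.outer(np.arange(4), np.arange(4))
q = 10.0 + 6.0 * freq

quantized = np.round(coeffs / q).astype(int)   # lossy: precision discarded
dequantized = quantized * q                     # decoder's approximation

print(quantized)
print(f"zeros: {np.count_nonzero(quantized == 0)} of {quantized.size}")
```

Most of the block becomes zero, which is exactly what the final entropy-coding stage is good at compressing, while the surviving low-frequency coefficients preserve the parts of the image the eye notices.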

4. Entropy Coding

Finally, the remaining data is compressed using lossless methods like CABAC (Context-Adaptive Binary Arithmetic Coding). This is similar to how a ZIP file works, squeezing out any remaining statistical patterns.
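CABAC itself is intricate, but the principle can be demonstrated with any generic lossless compressor. Here zlib stands in for the entropy coder (purely for illustration; real codecs use CABAC or CAVLC tuned to video statistics) applied to the kind of zero-heavy data quantization produces:

```python
import zlib
import numpy as np

# Simulate quantized coefficients: mostly zeros, ~5% small nonzeros.
rng = np.random.default_rng(1)
coeffs = np.zeros(4096, dtype=np.int8)
idx = rng.choice(4096, size=200, replace=False)
coeffs[idx] = rng.integers(-20, 21, size=200)

# A generic lossless compressor exploits the statistical redundancy.
raw = coeffs.tobytes()
packed = zlib.compress(raw, level=9)
print(f"{len(raw)} bytes -> {len(packed)} bytes "
      f"({len(raw) / len(packed):.1f}x smaller)")
```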


Modern Codec Comparison

Different codecs use these steps with varying degrees of complexity.

| Feature      | H.264 (AVC)             | H.265 (HEVC)           | AV1                    |
| ------------ | ----------------------- | ---------------------- | ---------------------- |
| Release year | 2003                    | 2013                   | 2018                   |
| Efficiency   | Standard (baseline)     | ~50% better than H.264 | ~30% better than H.265 |
| Complexity   | Low                     | High                   | Very high              |
| Best for     | Universal compatibility | 4K/HDR content         | Future-proof streaming |
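The efficiency figures compound. Taking an assumed H.264 baseline of 20 Mbps for 4K (an illustrative figure, not a measurement), the same perceived quality would cost roughly:

```python
# Rough equal-quality bitrate comparison, compounding the table's
# efficiency figures. The 20 Mbps H.264 baseline is assumed for
# illustration only.
h264 = 20.0           # Mbps, assumed baseline
h265 = h264 * 0.5     # ~50% better than H.264
av1 = h265 * 0.7      # ~30% better than H.265

for name, mbps in [("H.264", h264), ("H.265", h265), ("AV1", av1)]:
    gb_per_hour = mbps * 3600 / 8 / 1000
    print(f"{name}: {mbps:.1f} Mbps ~= {gb_per_hour:.2f} GB/hour")
```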

Conclusion

Video compression is a delicate balancing act between three factors: Bitrate, Quality, and Computational Power.

As we move toward 8K video and VR, new codecs like AV1 and VVC are pushing these boundaries even further, using even more sophisticated mathematical models to squeeze every bit of efficiency out of our data.

Understanding these basics helps you make better decisions when choosing formats for your own video projects or when analyzing streams using tools like H.264.online.