
Everything you need to know about video encoding

Video encoding is the process of creating a video file using a given codec. The video codec chosen is like a template that describes how the resulting data should be stored, compressed, and eventually decoded for playback.

For example, H.264 is a common codec for encoding video. To play back a video file created with it, you need an H.264 decoder. Decoding can be done in software or with dedicated hardware to make it faster and more efficient.

What’s the difference between encoding and transcoding?

You may see the term “transcoding” used interchangeably with “encoding”. Transcoding is the process of decoding (decompressing) video from one codec and then encoding (compressing) it into another. Strictly speaking, encoding on its own only happens when the video is first captured, turning light into digital data.

Even though most people call it encoding, they’re usually referring to transcoding because one video format is being turned into a different one.

It's also common to hear someone say they are "converting" the video into a different format instead of using the word "transcoding".

What's the difference between a codec and a container?

Codecs describe how to compress and store a video stream. Container formats control how video and audio tracks are grouped together into the same file, sometimes also including text tracks for captions. Audio and video tracks use different codecs for encoding, so something else is needed to package them together.

For example, H.264 is a codec that is often delivered in an MP4 (MPEG-4) container. Your video file will likely have the .mp4 extension for this. Confusingly, there are both codecs and containers that have "MPEG" in the name.

Common web codecs include H.264, H.265 (aka HEVC), AV1, and VP8/9.

The most common container formats on the web are MP4, WebM, and MOV.

There are many other specialist codecs and containers used in traditional video recording and editing, where very high quality source files are required.

Why encode a video for video streaming?

Encoding with an efficient codec is what makes viewing video over the internet possible. Uncompressed video, video that is stored "raw", produces massive file sizes. The files are so large that you can't fit much on physical media like DVDs or Blu-ray discs, and storing them usually requires very large hard drives.

For example, uncompressed 1080p video can have a bitrate of 2.98 Gbit/s (gigabits per second). The average home internet connection very rarely reaches gigabit speeds, so receiving an uncompressed video would take a very long time.
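To see where a number like that comes from, multiply the pixel count by the bits per pixel and the frame rate. A minimal sketch, assuming 60 frames per second and 24-bit color:

```python
# Back-of-the-envelope bitrate for uncompressed 1080p video.
# Assumes 60 frames per second and 24 bits (3 bytes) of color per pixel.
width, height = 1920, 1080
bits_per_pixel = 24
frames_per_second = 60

bits_per_second = width * height * bits_per_pixel * frames_per_second
print(f"{bits_per_second / 1e9:.2f} Gbit/s")  # ~2.99 Gbit/s
```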

Internet video streaming is more complicated than just encoding one input file and one output file though. To deliver a great user experience across a variety of internet connections, you actually need to encode the input file to many different output files at different resolutions and compression rates.

This approach is referred to as adaptive bitrate streaming, or ABR. It's what allows you to switch between different resolutions when watching video on the internet, because all of these resolutions have already been encoded, ready to switch between during playback.

You should carefully consider what encoding settings to use for each rendition (output file) of the original video so that playback is smooth for a variety of internet connection speeds.
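One common way to think about those renditions is as an encoding "ladder": a list pairing each resolution with a target bitrate. A minimal sketch; the numbers below are illustrative examples, not recommendations:

```python
# A simple adaptive bitrate (ABR) ladder: one entry per rendition.
# The resolutions and bitrates here are illustrative, not recommendations.
abr_ladder = [
    {"name": "1080p", "width": 1920, "height": 1080, "bitrate_kbps": 5000},
    {"name": "720p",  "width": 1280, "height": 720,  "bitrate_kbps": 3000},
    {"name": "480p",  "width": 854,  "height": 480,  "bitrate_kbps": 1500},
    {"name": "360p",  "width": 640,  "height": 360,  "bitrate_kbps": 800},
]

for rendition in abr_ladder:
    print(f"{rendition['name']}: {rendition['width']}x{rendition['height']}"
          f" @ {rendition['bitrate_kbps']} kbps")
```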

If you've made it this far, you might be interested in the Mux Video API to encode your video files.

Learn more about Mux Video

How does video encoding work? Mux explains

Uncompressed video is very simple: just store the color of every pixel of every frame, and you have a perfect image for every frame of the video ready to play. The big drawback is the huge file sizes this produces.

Encoding works by analysing the input file for patterns and then storing those patterns in the output file in a way that results in smaller file sizes. It takes up much less space to say "the next 1000 pixels are blue" than to write "blue..." 1000 times in a row, once for each pixel.
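The simplest version of this idea is run-length encoding. Real video codecs are far more sophisticated, but a toy sketch shows the principle:

```python
# Toy run-length encoding: collapse repeated values into (value, count) pairs.
# Real video codecs use far more sophisticated pattern recognition than this.
def run_length_encode(pixels):
    encoded = []
    for pixel in pixels:
        if encoded and encoded[-1][0] == pixel:
            encoded[-1][1] += 1  # extend the current run
        else:
            encoded.append([pixel, 1])  # start a new run
    return encoded

# "The next 1000 pixels are blue" stored as one pair instead of 1000 values.
print(run_length_encode(["blue"] * 1000))         # [['blue', 1000]]
print(run_length_encode(["red", "red", "blue"]))  # [['red', 2], ['blue', 1]]
```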

Different codecs introduce more and more complex pattern recognition to compress the image more efficiently. When the compressed video can be decoded back into a 100% accurate copy of the original input, we say the compression is "lossless". If we lose data during compression, because too much has been thrown away in order to compress it better, we call it "lossy": there's no way of getting the original data back from the compressed version.

Codecs don't just look at nearby pixels, though; they look at how patterns change over multiple frames of video and encode the differences that happen over time.

This results in different types of frames being stored in the final video: "I-frames" store all the information needed to produce a whole image, while "P-frames" and "B-frames" only store enough information to change the previous frame into the next one. The trade-off is that much more information has to be discarded to produce these in-between frames, so quality can sometimes suffer if there are too many of them.

How long a video plays between I-frames is often referred to as keyframe distance or GOP (group of pictures) size. A keyframe distance of 2 seconds at a frame rate of 30 frames per second means there is a full I-frame every 60 frames. If this distance is too large, compression artefacts often appear because too much time has passed without reaching a full frame.
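The arithmetic is simple: GOP size in frames is the keyframe distance multiplied by the frame rate. A quick sketch, with a note on how this maps to FFmpeg's -g flag:

```python
# GOP (group of pictures) size in frames = keyframe distance x frame rate.
keyframe_distance_seconds = 2
frames_per_second = 30

gop_size_frames = keyframe_distance_seconds * frames_per_second
print(gop_size_frames)  # 60 -- an I-frame every 60 frames

# In FFmpeg this interval is set in frames with the -g flag, e.g. "-g 60".
```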

How long does video encoding take?

For the vast majority of use cases, encoding is the most time-consuming part of processing video. How long it takes depends on a lot of variables, but generally, a short video could take a few seconds while a very long video with a large file size using an advanced codec could take many hours.

Most video encoding services encode the whole video before it can be used for playback. This is a big bottleneck: after you upload a video file, you have to wait until the whole thing has finished encoding before you can play any of it back. Mux can play back a video of any length almost immediately because of a unique feature called just-in-time video encoding.

Instant video encoding for web streaming

Just-in-time encoding is an encoding process that lets viewers start watching a video even though the whole file hasn't been encoded yet. When a viewer requests a video for playback, the transcoding process starts and bytes of the video are delivered to the viewer as soon as the first frame has been encoded. Every Mux Video customer gets just-in-time encoding. Create an account to test it out.
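Mux's implementation is proprietary, but the core idea can be sketched conceptually: encode segments lazily and hand each one to the viewer the moment it's ready, instead of waiting for the whole file. A toy illustration, not Mux's actual code:

```python
# Conceptual sketch of just-in-time encoding (not Mux's actual implementation):
# segments are encoded lazily and delivered as soon as each one is ready.
import time

def encode_segment(segment_number):
    time.sleep(0.1)  # stand-in for real encoding work
    return f"encoded-segment-{segment_number}"

def just_in_time_stream(total_segments):
    for n in range(total_segments):
        # Each segment is yielded the moment it finishes encoding, so
        # playback can begin long before the last segment is processed.
        yield encode_segment(n)

for segment in just_in_time_stream(total_segments=5):
    print("delivering", segment)
```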

The best encoding settings for file size and quality

Encoding requires you to decide what trade-offs you’re willing to make between quality and file size.

For example, you may choose to encode a higher quality video that you can deliver to users with a fast internet connection. Or, you might limit the quality of the video so you can save money on storing it or to make it faster to deliver to users on slower connection speeds.

Quality in video encoding is normally controlled by:

  • Bitrate: how many "bits" of information are stored per second of video. It's normally expressed in Kbps (kilobits per second), Mbps (megabits per second), or Gbps (gigabits per second). Not to be confused with "megabytes" and "gigabytes", which normally describe the resulting total file size.
  • "Constant" or "fixed" bitrate: whether the bitrate is held steady or allowed to fluctuate across the duration of the video
  • Bit depth: how many bits are used to describe each pixel. Usually 8-bit; 10-bit is used for higher quality encodes like UHD (Ultra HD) 4K Blu-rays.
  • Dynamic range (HDR): specialist codecs like HEVC are often needed to encode high dynamic range content, and they usually require a higher bit depth to accommodate storing the extra range.

If you're using a video platform like Mux, then these settings have all been carefully chosen for you and encoding will be optimised for streaming to give you the best balance between speed, quality and cost.

Configuring all of these settings can often take a lot of trial and error to achieve the results that you're looking for. There are a few tools available to help with encoding your videos:

  • FFmpeg: a command line utility that can encode/transcode just about any codec imaginable into any other one. It has a steep learning curve, but you get complete control over every aspect of the encoding process (see the example after this list)
  • Handbrake: an open source tool for encoding video into many formats
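As a starting point, a typical FFmpeg invocation for web-friendly H.264 in an MP4 container looks something like the sketch below. It's wrapped in Python here only to keep the examples in one language; the flags are standard FFmpeg options, and the file names are placeholders:

```python
# A typical FFmpeg transcode to H.264 in an MP4 container.
# File names are placeholders; the flags are standard FFmpeg options.
import subprocess

subprocess.run([
    "ffmpeg",
    "-i", "input.mov",    # source file (any format FFmpeg can decode)
    "-c:v", "libx264",    # encode video with the H.264 codec
    "-crf", "23",         # constant rate factor: lower = higher quality
    "-preset", "medium",  # speed vs. compression trade-off
    "-g", "60",           # keyframe every 60 frames (2s at 30fps)
    "-c:a", "aac",        # encode audio with AAC
    "-b:a", "128k",       # audio bitrate
    "output.mp4",
], check=True)
```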

Advancements in encoding quality

Normally, video platforms apply the same encoding settings to every single video uploaded. The drawback is that one size doesn't fit all when it comes to encoding videos. Depending on how complex the content is, generic settings might use too many bits to store the video, or throw away too much information and reduce the quality.

Per-title video encoding is a process that automatically adjusts the encoding settings for each video based on analysing the unique patterns and visual complexity inherent in it.

Advanced techniques like machine learning are often used to pre-process the video, figure out what the optimum settings would be to maximise quality, and select the most appropriate encoding settings for each individual video. Per-title encoding is included with every Mux account, so you can be confident that we're squeezing as much quality as possible out of every byte. Sign up today to see the difference.
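As a heavily simplified illustration of the idea, imagine estimating content complexity from how much consecutive frames differ and scaling a base bitrate accordingly. The heuristic below is purely hypothetical; real per-title encoding is far more involved:

```python
# Toy illustration of per-title encoding's core idea: estimate complexity,
# then pick a bitrate. Real systems are far more sophisticated than this.

def average_frame_difference(frames):
    """Mean fraction of pixels that change between consecutive frames."""
    diffs = []
    for previous, current in zip(frames, frames[1:]):
        changed = sum(1 for a, b in zip(previous, current) if a != b)
        diffs.append(changed / len(current))
    return sum(diffs) / len(diffs)

def pick_bitrate_kbps(frames, base_kbps=3000):
    # More inter-frame change means more bits needed for the same quality.
    complexity = average_frame_difference(frames)  # 0.0 (static) to 1.0 (busy)
    return int(base_kbps * (0.5 + complexity))

# A static "talking head" clip vs. fast-moving content (tiny 4-pixel frames).
static_clip = [[0, 0, 0, 0]] * 10
busy_clip = [[i, i + 1, i + 2, i + 3] for i in range(10)]
print(pick_bitrate_kbps(static_clip))  # 1500 -- low complexity, fewer bits
print(pick_bitrate_kbps(busy_clip))    # 4500 -- busy content, more bits
```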

Why use a video encoding service instead of doing it all yourself?

If you have just a few videos, you can use encoding software to encode them on your desktop by hand. If you need to encode lots of videos, though, this can become very tedious. You will also need to encode lots of different versions of each video so that users can pick different resolutions for playback.

On top of this, you will ideally want:

  • Your videos delivered by a CDN for quick delivery
  • To produce thumbnails, poster images and gifs for previews
  • Automatic encoding to multiple resolutions with optimal settings
  • The option to allow users to upload videos directly instead of sending them to you first
  • Restricted playback, so users can only watch videos that you grant access to
  • And lots of other considerations...

If you’re building an app with user-generated content (UGC), like the next YouTube, you’re going to need a process that can handle video upload and playback at scale, very quickly. It wouldn’t be feasible to manually encode each video, one at a time, on your desktop. A cloud video encoding service like Mux can encode thousands of videos quickly and reliably.

Video encoding FAQs

What's the difference between lossless and lossy encoding?

Lossless encoding compresses video without discarding any data—when decoded, you get a perfect reproduction of the original. Lossy encoding discards some data to achieve better compression, resulting in some quality loss. Almost all web video uses lossy encoding (H.264, H.265, VP9) because lossless files are too large for streaming. The goal is finding the right balance where quality loss is imperceptible to viewers while file sizes remain manageable.

Why do I-frames matter for video streaming?

I-frames (intra-coded frames) contain complete image information, while P-frames and B-frames only store changes from previous frames. Players can only start playback or seek to I-frame positions, making I-frame frequency important. Too few I-frames (large GOP size) means slower seeking and more compression artifacts. Too many I-frames increases file size unnecessarily. Typical GOP sizes are 2-4 seconds for streaming video.

Should I encode my own videos or use a video platform?

Encode yourself only if you have simple needs (few videos, single resolution). For production use—especially with user-generated content, multiple resolutions, or scale requirements—use a video platform. They handle adaptive bitrate encoding, CDN delivery, thumbnail generation, access control, and optimization automatically. The engineering cost of building this infrastructure yourself far exceeds platform costs for most use cases.

What bitrate should I use for different resolutions?

Rough guidelines for H.264: 4K (20-35 Mbps), 1080p (4-6 Mbps), 720p (2-4 Mbps), 480p (1-2 Mbps). However, optimal bitrate depends heavily on content complexity—slow-moving talking heads need less than fast sports action. Modern platforms use per-title encoding to analyze each video and select optimal bitrates automatically, rather than applying generic settings to all content.

What's adaptive bitrate streaming and why does it matter?

Adaptive bitrate streaming (ABR) provides multiple encoded versions of the same video at different resolutions and bitrates. Players automatically switch between versions based on viewer bandwidth and device capabilities, ensuring smooth playback without buffering. This is why platforms encode videos at multiple quality levels (360p, 480p, 720p, 1080p, etc.) rather than just one—it accommodates viewers with varying connection speeds.

How does just-in-time encoding work?

Traditional encoding requires processing the entire video before any playback can begin. Just-in-time encoding starts delivering encoded segments to viewers as soon as they're ready, even while encoding continues for the rest of the video. This dramatically reduces time-to-first-frame, especially for long videos. The viewer can start watching immediately while encoding completes in the background for later parts of the video.

What's the difference between H.264 and H.265 encoding?

H.265 (HEVC) achieves roughly 50% better compression than H.264 at equivalent quality, but requires significantly more encoding time and computational resources. H.265 also isn't universally supported on older devices. Use H.264 for maximum compatibility and faster encoding. Use H.265 when file size or bandwidth is critical and you know viewer devices support it. Many platforms now support both, selecting appropriately per viewer.

Can I improve quality by re-encoding already encoded video?

No. Re-encoding introduces generation loss—each encoding pass discards more information and adds compression artifacts. If source video is already heavily compressed, further encoding will degrade quality noticeably. Always encode from the highest quality source available (ideally the camera's original file). If you only have compressed video, avoid re-encoding when possible or accept that quality will suffer.

