
Programmer's Guide to Video Systems

By Chris Pirazzi. Many thanks to those who have provided valuable info, including Charles Poynton and Bob Williams. Thanks to Andy Walls for some insights on standard def square luma sampling frequencies.

Introduction

When we software types hear that there are different "video systems" such as 480i, 576i, 1080i, or 720p, we immediately think of two things: a resolution and a frame rate.

For example, if you look at the various wiki pages about video systems, you will see statements like "480i is the shorthand name for a video mode," and you'll see pictures like this:

as if video can be considered just a special setting of a VGA graphics adapter.

Well, it turns out that video is much more complex than just that, and it is absolutely crucial to understand certain additional videosyncrasies in order to write video capture, editing, processing, or playback software that does not cause bugs and compatibility problems for end-users.

This document will give you the basics that you need in order to avoid getting in trouble with video!

Disclaimer: there's one important videosyncrasy that is not yet covered here, and that's the different color systems (both Y'CbCr and R'G'B') that you will need to deal with in video software. It's important to get that right as well for your video software to work. Eventually, I will add material on this, but for now, check out Charles Poynton's excellent book on the subject. The situation is also summarized in the QuickTime uncompressed standard that I wrote for Apple.

The Reality of Video Systems

We programmers like to think of video as a series of frames. Each frame, we imagine in our pleasant dreams, consists of one whole picture, say 640x480 pixels, all snapped at a single instant of time, and we like to believe there are, say, 30 of these pictures per second. To play back the video, we just display one picture after the other, at 30 images per second. We blissfully assume that the image size of the picture (640x480) is totally standardized and represents the "complete" picture. And, we assume that the pixels are square: that 100 vertical pixels is the same distance on the display monitor as 100 horizontal pixels.

Unfortunately, we're in for a rude awakening, because in many important cases that we must handle in video software today, some or all of these assumptions are wrong.

We programmers also like to think of video exclusively as data in our computer memory or hard disks. We often try to ignore the fact that video is also transmitted as an electrical (analog or digital) signal over wires, and stored on (gasp) videotape.

It turns out that if we take a moment to understand the bigger picture of video (no pun intended)—how it is transmitted electrically, how it is displayed by TVs and monitors, and how video geeks think of it—then suddenly it becomes tremendously easier to understand where these videosyncrasies come from, and to predict and handle them correctly in our video software.

So here we go...

What is a Video System?

A video system determines much more than just a resolution and frame rate. In the following sections, we'll introduce the many new concepts that a video system defines.

What Are the Video Systems?

There are many video systems, but we will cover the most common ones:

SD/HD                          System      Rate                Used In
Standard Definition TV (SDTV)  480i/60M    60M fields/second   US/Japan (example standard: NTSC)
                               576i/50     50 fields/second    rest of world (example standard: PAL)
High Definition TV (HDTV)      1080i/60M   60M fields/second   US/Japan
                               1080i/50    50 fields/second    rest of world
                               720p/60M    60M frames/second   US/Japan
                               720p/50     50 frames/second    rest of world

As you can see in the table, to unambiguously name one system, you should include the rate.

In this document, we will generally omit the rate from 480i and 576i because there is only one possibility.

In this document, we will drop the rate from 1080i and 720p in situations where our text applies to both the 50 and 60M systems. So 1080i and 720p are really groups of systems.
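If your software needs to keep track of these names, the naming convention above is easy to capture in code. Here is a toy sketch of my own (the function and its return shape are invented for illustration; only the naming convention itself comes from this document), using the 1/1.001 "magic M" factor explained in the next sections:

```python
# Toy parser for video system names like "480i/60M" or "720p/50".
# My own sketch: the naming convention is from this document, but the
# function and its return shape are invented for illustration.
from fractions import Fraction

def parse_system(name: str):
    size_scan, rate = name.split("/")
    lines = int(size_scan[:-1])          # e.g. 480, 576, 1080, 720
    scan = size_scan[-1]                 # 'i' = interlaced, 'p' = progressive
    if rate.endswith("M"):               # magic M = 1/1.001, explained below
        hz = Fraction(int(rate[:-1]), 1) * Fraction(1000, 1001)
    else:
        hz = Fraction(int(rate), 1)
    return lines, scan, hz               # hz is fields/sec (i) or frames/sec (p)
```

For example, parse_system("480i/60M") yields (480, 'i', Fraction(60000, 1001)): the exact rate, not the rounded "59.94."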

Other Video Systems

A system that is beginning to see use now is 1080p. SMPTE 274M-1995 specifies 1080p/24 and 1080p/24M, with 24 and 24M frames per second respectively, which are sometimes used in HD broadcast; it also specifies 1080p/50 and 1080p/60M, which double the data rate of the currently broadcast 1080i/50 and 1080i/60M by providing a full 50 or 60M frames per second. At some point I will add that to this document.

The next most common system might be 480p, which used to be fairly common for high-end "progressive scan" DVD players in the US and Japan, but is increasingly being replaced by the HD formats. We will not cover 480p in this document. You can get some of the vital statistics for one 480p electrical standard by reading SMPTE 267M-1995 or searching for 267M in the QuickTime uncompressed standard that I wrote for Apple.

There are a set of other "segmented field progressive" systems, such as 1080pSF/24, which are a kind of hack to carry a frame-based signal within what appears to be a field-based signal. These hacks arose in the early days of HD so that people could re-use their hugely expensive field-based equipment for frame-based imagery. I talk about these systems a little more in this other document, but I will not cover these systems in this document.

There is a truly bizarre video system called M-PAL that they use in Brazil. It is interlaced and has the infamous NTSC rate of 60M, but it is based on PAL color encoding and has 525 lines. It even has its own form of drop-frame timecode. We do not cover M-PAL in this document.

There was an old 1035i Japanese HDTV system (MUSE/Hi-Vision) with some very odd properties (optional 5:3 or 16:9 picture aspect ratio and even weirder pixel aspect ratio), but the last broadcast using that system ended on September 30, 2007 and Japan has totally migrated to 1080i and 720p. We don't cover 1035i in this document, but if one day you are required to handle legacy data from this system, you'll need to consult the relevant Japanese standards (in the QuickTime uncompressed standard that I wrote for Apple, I give the vitals on one 1035i format, the 16:9 SMPTE 240M-1995/SMPTE 260M-1992, but as this page shows, that wasn't the only format the Japanese used, so you'll have to see if your data follows the SMPTE standard).

Frame-Based and Field-Based Video

As software types, we like to think of video as a series of complete frames, say 640x480 pixels, all snapped at a single instant of time. This is the frame-based (aka progressive-scan) model.

Unfortunately, the majority of systems (480i, 576i, and 1080i) are field-based (aka interlaced or interleaved).

Roughly speaking, rather than representing video as a series of, say, 30 640x480 images per second, field-based systems represent video as a series of 60 640x240 images per second, and each image contains information for only half of the lines of the overall picture. In particular, within each pair of fields, one field has lines 0, 2, 4, ... and the other field has lines 1, 3, 5, .... But the catch is that all the fields are temporally distinct, so you simply do not have all the data you need to reconstruct a complete picture at any given moment of time!
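Concretely, the even/odd split described above looks like this in code (a minimal sketch of my own; the standard F1/F2 naming and line numbering, described later, are more involved than simple parity):

```python
# Split a frame's scanlines into its two fields by line parity, and
# weave them back together. A minimal sketch of my own, not from any
# standard; real systems use the F1/F2 line numbering described later.
def split_fields(frame_rows):
    """frame_rows: list of scanlines, top to bottom."""
    return frame_rows[0::2], frame_rows[1::2]  # lines 0,2,4,... and 1,3,5,...

def weave_fields(even_field, odd_field):
    """Interleave two fields back into a full frame."""
    frame = []
    for e, o in zip(even_field, odd_field):
        frame.extend([e, o])
    return frame
```

Remember, though, that the two fields come from different instants in time, so weaving them back together only reconstructs a true still image when nothing in the scene was moving.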

This is such an important issue, with such deep implications for software, that we have a whole separate page dedicated to it:

All About Video Fields
You should pop over and read that page, either now or after you're done with this one.

Frame Rate or Field Rate and the Magic M (30/1.001, not 29.97)

Frame-based systems transmit a series of frames, and so they have a frame rate.

Field-based systems transmit a series of fields, and so they have a field rate, and a frame rate which is half of the field rate.

When you see the rate for the US/Japan standards listed in casual discussions and even some standards, you will often see it written as 60 or 59.94 for brevity. But it's not really 60 and it's not really 59.94—it's actually 60/1.001, for astounding historical reasons that go back to the 1950s standardization of color NTSC TV and still haunt us today. Similarly, if you see 30 or 29.97, it's really 30/1.001. And if you see 23.98, it's really 24/1.001.

While this distinction may not matter for casual reference to video standards, it is very important for software, because you often need to synchronize video with audio and 60 is quite different from 60/1.001. If you use the wrong number, your audio and video will quickly slide out of sync and your customer will report bugs.

For this reason, this document is precise. We consistently add the magic factor M, which is equal to 1/1.001, whenever it is called for. If you see 60 in this document, it's really 60. If you see 60M, then it's 60/1.001.

Nowadays, there's yet another reason not to be sloppy. The SMPTE specs which define 1080i and 720p actually define both 60 and 60M versions of the standard (and both 30 and 30M, and both 24 and 24M)! It's true that in most cases, if you see 60, it's really 60M, but it's quite possible that you may run into true 60, particularly as a compatibility bridge in 50 field/frame per second environments.

Fortunately, you don't have to worry about magic M for the European standards based on 25 or 50 frames/fields per second. They never had the sordid NTSC history that brought this all about.

You may also sometimes see the magic numbers 59.94 or 29.97. These numbers do not represent the rate of video (59.94 is not equal to 60/1.001, nor is it close enough for your software). Those numbers come from a separate hack relating to drop-frame SMPTE timecode and unless you are dealing with SMPTE timecode in your software, you should never use these numbers in your software. I cannot overstate how many buggy pieces of software, and how many undeniably mislabeled movie files, have been created over this one little misunderstanding.
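To see why the exact value matters, here is a quick back-of-the-envelope check (my own illustration, using exact fractions; only the rates themselves come from the specs):

```python
# Why 60/1.001 (not 60, and not 59.94) matters for A/V sync.
# My own illustration; the rates themselves are from the SMPTE specs.
from fractions import Fraction

true_rate = Fraction(60000, 1001)        # 60/1.001 fields per second, exactly

# "59.94" is only a rounded shorthand, not the real rate:
assert true_rate != Fraction(5994, 100)

# Play one hour of 60M video back assuming exactly 60 fields/second:
fields_in_hour = true_rate * 3600
playback_seconds = fields_in_hour / 60
drift = 3600 - playback_seconds          # the video finishes early by...
```

float(drift) comes to about 3.6 seconds per hour: easily enough slippage for your customer to notice lip-sync problems.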

The SMPTE and ITU industry standards themselves set a horrible example that will no doubt confuse the heck out of the industry. For example, SMPTE 274M-2005, which defines 1080i, uses the nomenclature "1920 x 1080/59.94/I" (Table 1) for 1080i/60M. Similar sloppiness exists in many of the SMPTE and ITU specs. The real 1/1.001 rate is in the spec of course, but I'm certain that the shorthand will cause problems. Ouch!

Picture Aspect Ratio

Each video system defines a picture aspect ratio, which is simply the horizontal-to-vertical aspect ratio of the TV or other display device on which the user watches the video.

Picture aspect ratio is not the same as pixel aspect ratio, which we will discuss later.

The standard-definition systems we cover in this document (480i and 576i) are 4:3.

The high-definition systems we cover in this document (1080i and 720p) are 16:9.

As you probably know, before HDTV arrived, people came up with several ways to do 16:9 standard-def:

If you need to be specific in your own writings, I would recommend using this terminology for the three different systems:

and similarly for 576i.

As we mentioned earlier, we will not be covering the strange old 1035i Japanese HDTV systems with a 5:3 picture aspect ratio.

Total Lines Per Frame: There's More Than You Thought

A video system doesn't just define dimensions for the picture (e.g. 640x480).

Instead, it introduces a broader concept of video lines, where some lines are picture lines and some are non-picture lines. For example, a video frame of an interlaced video signal looks like this:

These non-picture lines are used for many purposes:

You may have heard video geeks refer to 480i video as "525-line" and 576i video as "625-line." These names reflect the actual number of video lines in the video frame, including all those extra non-picture lines:

System                 Total Number of Lines
480i/60M               525
576i/50                625
1080i (all variants)   1125
720p (all variants)    750

Video Line Numbering

The video system also defines a standard numbering scheme for those lines. For example, every 480i frame has lines numbered from 1 to 525, regardless of whether it's expressed as an analog (NTSC) video signal or a digital (SDI) video signal. The specification for each electrical standard will then provide the mapping between those numbers and the waveform or bits of that particular electrical standard.

Field Names: F1 and F2

If the video system is field-based, it also defines standard names for each field, F1 and F2, in terms of which video lines they occupy.

F1 and F2 are purely properties of the signal. They have nothing to do with software, or field dominance, or any other external factors. Specifically,

How Big is the Picture?

An important question for us software folks is obviously: which lines are picture lines and which are not? That will determine how many lines our video images in memory will have.

A good question, but we'll have to hold off on the answer until we've explained a few more videosyncrasies below.

How Fields Weave into a Frame

The video system also determines how the lines of each field weave into a frame, and it does that using the standard video line numbering scheme, and the standard F1 and F2 names, also defined by the video system.

For example, in the 480i system, here is how F1 and F2 line up:

Notice we've strategically avoided saying where the top and bottom of the picture are :) We'll be able to give you the answer to this later, after explaining some more videosyncrasies.

Analog Electrical Standards

Each video system encompasses one or more analog electrical standards that specify a way to transmit video using that system over an electrical wire. For example, some common standards are:

System                 Analog Electrical Standards
480i/60M               NTSC (component, composite, s-video)
576i/50                PAL (component, composite, s-video)
                       SECAM (component, composite, s-video)
1080i (all variants)   component analog HD
720p (all variants)    component analog HD

Analog video has a couple of unique properties which, it turns out, have caused some pretty serious ramifications that bubble all the way up to the software level.

To understand the situation, let's take a quick look at an analog waveform. Analog video, as the name suggests, encodes the picture data along each line as a smooth, continuous electrical signal, like this:

In this figure (stolen from Hamlet Video International and heavily doctored), you can see a sample colorbar video image on the top. On the bottom, you see the electrical waveform for one line of this video signal (say, video line L, which we have marked). The vertical axis is voltage (actually IRE units which are proportional to voltage), and the horizontal axis is time (roughly 63.5 microseconds across).

Notice how the video signal consists of some funky pulses at the left (known as horizontal sync and colorburst), followed by the actual video signal for that line. You can see how the differently colored bars produce different analog signals.

It's not important exactly how the luma and chroma get encoded as voltage; the main point to take home is that the voltage varies smoothly and continuously. In other words:

So here's a situation where you could do it different ways. And, as a seasoned engineer, you know what that means: people did do it different ways, and this will cause problems for you.

We'll give you more details on this further on in the document.

Digital Electrical Standards

In addition to the analog electrical standards you're familiar with, the video industry also defines digital electrical standards for each video system.

These standards are not like DV/1394/Firewire or MPEG-2. Don't let the word "digital" fool you into thinking it's a computer thing.

These standards are high-bandwidth, clocked (not packet-based) dedicated interconnects that are used in TV production studio environments to transmit 8- or 10-bit uncompressed video between different devices in the studio. The connection is typically a coax cable with BNC jacks, and the standard is called serial digital (SDI). There is an HD flavor, creatively named HD serial digital (HD-SDI). Sometimes they use SDI or HD-SDI connections in pairs (e.g. to transmit R'G'B' data) and this is called "dual link."

A typical modern studio will have an ungodly expensive video switch that routes SDI or HD-SDI signals all around the facility, and every device (such as studio cameras, D1 or Digital Betacam VTRs, old-school dedicated effects boxes, and studio monitors) will have SDI or HD-SDI inputs and outputs.

The various industry specs that define the standard definition (480i, 576i) electrical standards all point back to ITU-R BT.601-4 (commonly known as "Rec. 601") for some basic parameters. For that reason, you will often hear these electrical standards referred to as "601." One of the first VTR tape formats to have VTRs supporting this standard was the uncompressed D1 tape format, so you will also hear this electrical standard referred to as "D1."

These digital electrical standards also share the property with the analog electrical standards that there is more stuff encoded on each line than just the picture. In fact, there's quite a lot of so-called horizontal ancillary space available to store data; it occurs in the same area where the analog signal is having its funky sync and colorburst pulses, and like the vertical ancillary space, it can contain timecode, audio, and other data.

While the digital electrical standards are rarely seen in the consumer world, they are the lifeblood of video production, and much of the design and influence on video devices, libraries, and software on computers comes from these standards (not to mention the design of the DVD format and satellite broadcast standards). For example, you've almost certainly seen "601" or "D1" pop up in the UIs of video software.

In particular, the geniuses who designed the standard definition (480i, 576i) digital electrical standards decided to make every line have 720 non-square pixels, which we'll talk about next, and these are the exact same 720 non-square pixels that you are often forced to deal with by various video devices on PCs.

Fortunately, sanity prevailed for HD (1080i and 720p), whose digital electrical signals all have square pixels.

Non-Square Pixels and Pixel Aspect Ratio

In our blissful graphics software world, we are used to assuming that a 100 pixel vertical line is the same length, as seen on the screen by the user, as a 100 pixel horizontal line. This is called square pixels.

Unfortunately, when video engineers standardized the 480i and 576i digital electrical standards, they decided to use non-square pixels, and as a result, you as a software person will often be called upon to read and write non-square pixel data.

If your application wants to draw a circular circle or process a video image with a symmetric blur, you need to consider the pixel aspect ratio of your video data.

We have a quick how-to guide page dedicated to dealing with this important videosyncrasy:

Square and Non-square Pixels
But in the next section we will take you into the bowels of the past so that you may really understand the non-square issues in all their evil glory.

What Pixels are Square? How Non-Square is Non-Square?

Think about old-school analog equipment—an analog camera, VTR, monitor, or video switch. As we explained above, there are no pixels!

All analog equipment cares about is that the video doesn't get squished or stretched in one direction, and stays centered. And the way video engineers deal with this is to put up a standard video test pattern from a trusted test pattern generator and tweak the knobs on their monitor, by hand, until the monitor image is properly positioned and scaled on the screen. Obviously, this is not a very precise process. Modern monitors are consistent enough that this adjustment is rarely, if ever, needed, because the criteria for the image proportions being "good enough" are very loose—they are the limits of human perception of a video monitor.

Now enter computers.

Being mostly sane, the people who designed analog video input hardware for computers wanted to be able to capture square pixels, since square pixels were much easier to understand, manipulate, and display on a computer screen.

So those people were faced with this seemingly simple question: how should their hardware sample the analog signal in order to get square pixels? They want each pixel along the line to represent the same distance, on a video monitor, as the distance from one video picture line to the next. So how far apart, in the incoming video waveform, should each pixel sample be? In other words, how many million pixels per second (MHz) should their hardware capture (the "luma sampling frequency")?

In order to answer that, the video engineers turned to the analog video specs. You would think that the specs would very clearly state how the vertical data should stay in proper proportion to the horizontal data. For example, you'd expect to find a ratio saying that H microseconds of horizontal "distance" scanning across the monitor must be the same length as exactly V video lines of vertical "distance" on the monitor.

Unfortunately, it turns out that there is no such number. You cannot even derive such a number from the other numbers in the spec. Even if you employ death-defying amounts of handwaving to interpret the spec, like some Bible scholar looking for the hidden lottery numbers revealed inside, you still cannot come up with an exact figure.

Because, historically, nobody needed a super-precise definition.

So, what did the fearless video engineers of the 90s do? They made something up.

For reasons that I have never been able to completely figure out, the industry adopted a convention of sampling analog 480i video at exactly 12 3/11 MHz, and 576i video at exactly 14.75 MHz (and they even used decimals instead of fractions :), in order to get square pixels. The only clue we have is that these frequencies give us an integral number of square sampling instants per line, something which makes hardware people very happy.

Lurker's Guide Trivia Contest! If you know a way to derive the 12 3/11 MHz and 14.75 MHz industry-standard luma sampling rates, or you know some historical clues as to where they came from, then please send me some mail! Be sure to read this page first to see what we've figured out so far. The prize is your name in lights on the top of this page. Well, ok, your name on the top of this page. And maybe some other pages too! So far thanks to Charles Poynton and Andy Walls for helpful pointers that get us part of the way.
That is the sordid truth, and hurt though it may, knowing that will make it a lot easier for you to understand what comes next:

Now think about old-school digital equipment: it all uses ITU-R BT.601-4 ("Rec. 601"), which, as we mentioned above, specifies that each line contains exactly 720 pixels that are non-square.

However, being pesky video engineers, the authors of ITU-R BT.601-4 clean forgot to mention exactly what the pixel aspect ratio of those pixels was. They did, however, specify a very precise luma sampling frequency of 13.5 MHz (not 13.5M MHz).

So, this gives us an incontrovertible way to answer the question of "how non-square are non-square pixels?" If 480i square pixels are sampled at 12 3/11 MHz, and 480i non-square pixels are sampled at 13.5 MHz, that must mean that each non-square pixel has aspect ratio:

12 3/11 MHz / 13.5 MHz = 10 / 11
A similar derivation can be used for the 576i system, to give us these values:

System      hSpacing    vSpacing
480i/60M    10          11
576i/50     59          54

You can think of these values as the ratio of the width of a pixel to the height of a pixel. For example, say you want to draw a circle that appears round on the display device and whose diameter is n horizontal pixels (luma sampling instants). Draw an ellipse which is n pixels wide and n*hSpacing/vSpacing pixels (picture lines) high. Notice that vSpacing is in the denominator: the greater the vertical spacing of pixels (picture lines), the fewer vertical pixels you need in order to match a given number of horizontal pixels (luma sampling instants) on the display device.
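Both derivations, and the circle-drawing rule above, are easy to check with exact fractions. This is a sketch of my own (the helper name is invented; only the sampling rates and ratios come from the text):

```python
# Check the pixel-aspect-ratio derivations and the circle rule above.
# My own sketch; only the sampling rates and ratios come from the text.
from fractions import Fraction

REC601 = Fraction(27, 2)                     # 13.5 MHz non-square sampling
SQUARE = {"480i/60M": Fraction(135, 11),     # 12 3/11 MHz square sampling
          "576i/50":  Fraction(59, 4)}       # 14.75 MHz square sampling

# hSpacing/vSpacing = (square-pixel rate) / (Rec. 601 rate):
assert SQUARE["480i/60M"] / REC601 == Fraction(10, 11)
assert SQUARE["576i/50"]  / REC601 == Fraction(59, 54)

def circle_height(system: str, width_px: int) -> Fraction:
    """Picture lines needed for a circle width_px samples wide to look round."""
    return width_px * (SQUARE[system] / REC601)  # n * hSpacing / vSpacing
```

For example, a 110-sample-wide circle in non-square 480i video should be drawn 110 * 10/11 = 100 picture lines tall.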

Computer Industry Mass Confusion on Pixel Aspect Ratio

The software industry did not know or realize that the pixel aspect ratio of non-square pixels had already been determined by the hardware guys' choice of pixel sampling frequency.

Instead, what ensued was an unbelievable religious war of epic proportions, whereby uninformed computer people like us invented perhaps dozens of incorrect aspect ratios and argued about them endlessly like two Victorians bickering over whether the moon is made from cheese or crumpets. That unfortunately led to the creation of lots of mis-scaled and often un-labeled or mis-labeled video data, which will come back to haunt us in the future.

What mis-ratios did we use? There are the obvious mis-candidates, such as 640/720 and 768/720, but there are also amazingly nasty fractions created with Rube Goldberg-like proofs based on circularly dependent source data!

Even the MPEG committees got in on the action:

Hopefully from this data you have learned not to trust the pixel aspect ratio encoded in an MPEG-1 video bitstream.

And what about MPEG-2?

So I would watch out for the pixel aspect ratios encoded in all kinds of MPEG. If you know that data came from non-square-sampled video, use your own judgement.

SMPTE RP 187-1995 to the Rescue—Or Not

After the industry made something up and got back to work, a SMPTE standards committee was hard at work writing SMPTE RP 187-1995, a document that was finally supposed to address this pesky square and non-square issue.

Unfortunately, the pixel aspect ratio numbers they came up with for standard def (177/160 (y/x) for 480i and 1035/1132 (y/x) for 576i) did not match any industry practice, so they were quietly ignored.

Strangely, RP 187's informative annex A.4 talks about square to non-square conversion of computer images, and suggests resampling the image by a factor of 11:10—thus using the industry standard values and contradicting itself!

Fortunately, SMPTE RP 187-1995 made a major, positive contribution for HD (1080i and 720p), as we'll discuss later.

HDMI/CEA-861: Do We Ever Learn?

This disheartening but excellent submission comes in from reader Kevin Bracey:

I don't know if you're aware but CEA-861, and hence the HDMI spec which is derived from it, happily specify 720x576 and 720x480 digital signals (with 13.5MHz timing, and a statement that they're derived from ITU-R BT.656 and CEA-770.2), but then declares that those signals are exactly 4:3 or 16:9, and gives pixel aspect ratios corresponding to that, eg 16:15 for 4:3 576i (=1.067:1). Oops.

This is causing geometry errors in consumer kit. DVD players and the like are outputting their MPEG data with 1:1 pixel mapping to HDMI, and SD->HD upscalers and some HDMI-equipped displays treat the 720x576 HDMI frame as being 16:9. Pictures are 2.5% too thin.

You can switch between analogue and HDMI connections from the same source, and watch the picture get wider and narrower. And, often, you get little black bars at the left and right, from signals which don't have a full 720 pixels of image data - quite common.

It's a real struggle trying to talk to the producers of offending kit when such a vital spec is flouting reality. The HDMI spec itself doesn't go into aspect ratio detail, so it's the CEA-861 spec that is the problem.

The HDMI spec does sort of contradict the CEA spec in one place:

"For example, if a Source is processing material with fewer active pixels per line than required (i.e. [sic] 704 pixels vs. 720 pixels for standard definition MPEG2 material), it may add pixels to the left and right of the supplied material before transmitting across HDMI"
...a strong suggestion that it thinks that normal 704x576 material is narrower than HDMI, and it doesn't really think that a 702/704-wide 16:9 source should be scaled up to 720-wide because of the different aspect ratio.

In summary, we've got:

  1. DVD players, DVB receivers, etc, ignoring the pixel aspect ratio claimed for HDMI and outputting 1:1 pixel mapping. Good lads.

  2. Some TVs that default to treating 720x576 HDMI as 16:9 - can usually be corrected in the service menu, as they have separate geometry settings for SD and HD.

  3. Almost all SD->HD upscalers with HDMI output (either built in to an SD device, or separate) scale 720x576 to fill a 1920x1080 frame. If you're going through such an upscaler, you can't correct it, as the scalers don't usually have any manual scaling controls, and it can't be compensated for in the TV without screwing up proper 1080-line HD sources.

At least some standards people are getting it right, such as NorDig (section 5.2.2.3), but this underlying standard isn't.

Sigh.

The Standard-Def Pixel Debacle

Now you're equipped with the knowledge to understand another horrifying, sordid tale, which went down in the 1990s, that still plagues software writers today when working with standard-def video.

The story begins like this:

Old-School Equipment Doesn't Care Where the Picture Ends

Consider this:

Do you see the pattern in all this?

The deep, dark secret of old-school video equipment is that it actually doesn't care exactly where the edges of the picture are located. It just leaves enough margin so that nobody's information will get cut off, and is happy with that.

Instead, old-school video engineers are much more concerned with making sure that the picture center is maintained by all pieces of processing gear.

That is the reason why if you look at the analog and digital electrical specifications, you will see the picture center being specified in insane detail, but you will see little, if any, mention of where the picture starts and ends (either vertically or horizontally).

In the standard def 480i and 576i analog specs, each field actually begins and ends with a "half-line," where only half the line is allowed to contain picture data.

In the standard def 480i digital spec, not only are the half-lines gone, but we actually get an extra bonus picture line at the top of field 1, changing the "top" and "bottom" sense of F1 and F2.

So, as a software person, you must be getting frustrated and asking yourself "Ok, so, which lines do I use? Where do I start? Do I include the half-lines? Do I skip them? For analog video, where do I start and stop sampling along each line?" You need to write a for() loop somewhere and it needs to have a definite start and end.

The video engineer's answer, of course, is "who the hell cares!" He doesn't share your perspective or your needs. Which explains why the original specs, to this day, have never been amended to clearly answer this critical question.

The Result: Chaos

The result was that, for many many years, different brands and models of video input hardware would give you images with:

Because many software engineers were not familiar with video, they would assume that if they have an image of a certain size, say, 640x480, that it must line up exactly with all other images of that size.

By the time all the dust settled (say, around 2000), we had some pretty clear de facto conventions, but in the meantime lots of legacy data had been captured with differing horizontal and vertical offsets.

And that is the reason why I have to spend so much energy explaining this mess to you, the software engineer. You shouldn't have to know about it, but you do, because your software will probably have to interpret that legacy data.

For example, if you write image compositing software, and the user imports and composites legacy data captured with two different hardware devices, you may have to offset those images (or force your user to do so) if they don't represent the same part of the underlying video signal, even if they have the same image size.
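For example, a minimal sketch in Python of how such compositing code might align two same-size captures (the per-device offsets below are made-up illustrative values, not measurements from any real hardware):

```python
# Sketch: aligning two same-size legacy captures before compositing.
# Each offset describes where a capture's top-left sample sits in the
# underlying video signal (x in luma samples, y in frame lines).

def relative_shift(offset_a, offset_b):
    """Return (dx, dy): where image B's top-left pixel lands in A's
    pixel coordinates, so B can be positioned correctly over A."""
    ax, ay = offset_a
    bx, by = offset_b
    return (bx - ax, by - ay)

# Hypothetical example: device B started sampling 8 luma samples later
# and one line lower in the signal than device A did:
dx, dy = relative_shift((0, 283), (8, 284))
print(dx, dy)  # 8 1: place B's image 8 px right and 1 line down in A
```

The point is simply that the shift depends on signal-level offsets that the image sizes alone do not reveal; you (or your user) have to know or discover them per device.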

The De Facto Standard

In the section on 480i and 576i below, we will show the de facto square-pixel and non-square-pixel sampling pattern that emerged in the industry, including which lines get sampled and which part of each line gets sampled.

SMPTE RP 187-1995 to the Rescue Again—Sort Of

The SMPTE RP 187-1995 committee was also trying to address the pesky image size and offset issue.

The Good News: Apertures

Fortunately, SMPTE RP 187-1995 introduced two very useful terms into the industry: the production aperture (the full rectangle of picture samples that equipment should capture, store, and pass through) and the clean aperture (a smaller, co-centric rectangle that is guaranteed to remain free of edge artifacts as the signal passes through processing gear).

The Bad News: Bad SD Choices

Unfortunately, the values that the committee chose for pixel aspect ratio, clean aperture and production aperture for standard definition systems (480i and 576i) did not match any industry practice, and so the result was ignored for those systems.

Instead, the industry adopted four de facto standards for production aperture: 640x486 (480i, square pixels), 720x486 (480i, non-square pixels), 768x576 (576i, square pixels), and 720x576 (576i, non-square pixels).

Here we have just given you the production aperture size. We'll detail the offsets as well in our sections on 480i and 576i below.

Already there is something amiss here: the industry had also adopted de facto pixel aspect ratios for non-square pixels in 480i video (10/11) and 576i (54/59), and if you apply those ratios to the above production apertures, you will see that the apertures differ between the square and non-square cases:

(The original page shows a diagram here comparing, for 480i, the 720-non-square-pixel aperture against the 640-square-pixel aperture, and for 576i, the 720-non-square-pixel aperture against the 768-square-pixel aperture.)

This breaks the intended model of SMPTE RP 187-1995: the production aperture should be constant for a given system; it should not depend on how the video is sampled.

Oh, well. That's why, when you convert between square and non-square images, you must not only scale them but also pad or crop data, as we explain in this how-to guide:

Square and Non-square Pixels
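To see why a pure scale can't work, here is a rough sketch of the 480i arithmetic in Python (using `fractions` to keep the ratios exact): the two apertures cover different widths of the underlying signal, so converting in either direction leaves pixels to crop or pad.

```python
from fractions import Fraction

PAR_480I = Fraction(10, 11)   # de facto 480i non-square pixel aspect ratio

# The 720-pixel non-square aperture, expressed in square pixels:
nonsquare_as_square = 720 * PAR_480I             # wider than 640
# The 640-pixel square aperture, expressed in non-square pixels:
square_as_nonsquare = Fraction(640) / PAR_480I   # narrower than 720

print(float(nonsquare_as_square))  # ~654.5: crop ~14.5 px to fit 640
print(square_as_nonsquare)         # 704: pad 16 px to reach 720
```

So a 720-wide non-square image scaled to square pixels spans about 654.5 pixels, of which the de facto 640-wide aperture keeps only the center; going the other way, 640 square pixels become 704 non-square pixels, which must be padded out to 720.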

HD: They've Seen the Light!

You might be pretty depressed over all this standard-def bad news.

Fortunately, with HDTV, the video standard designers had the benefit of hindsight.

Most joyously, 1080i and 720p pixels are square!

Secondly, the standards documents (including SMPTE RP 187-1995) clearly answer the question of exactly which HD video lines, and which part of each video line, constitutes the picture data which must be stored and transmitted—the production aperture.

Even better, the production aperture has the familiar-sounding dimensions of 1920x1080 (1080i) and 1280x720 (720p), and the SMPTE spec clearly states which video lines of the underlying video signal, and which pixels within the line, that includes.

The SMPTE committee was even clever enough to choose clean apertures and production apertures that both had a 16:9 aspect ratio, even though that was only really required of the clean aperture.

We can now hope that video hardware will be designed so that it inputs and outputs this production aperture, and software will use this production aperture for its standard image size. In this ideal world, software people such as yourself won't even have to know that the "picture" is embedded into a larger raster. You'll just be able to think of HD as field or frame images.

Ok, ok, stop laughing.

Below I will give you all the line number details for 1080i and 720p, just as I did for standard def, so that just in case some hardware designer messes it up, you'll still be able to talk to him to figure out what video you're really getting in your memory buffers.
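As a quick sanity check of that 16:9 claim, a few lines of Python confirm that both the production and clean aperture sizes given in the tables below reduce to exactly 16:9:

```python
from fractions import Fraction

# HD production and clean aperture sizes from the tables below:
apertures = {
    "1080i production": (1920, 1080),
    "1080i clean":      (1888, 1062),
    "720p production":  (1280, 720),
    "720p clean":       (1248, 702),
}
for name, (w, h) in apertures.items():
    # Fraction reduces each width:height ratio to lowest terms.
    print(name, Fraction(w, h) == Fraction(16, 9))  # all True
```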

Computer Representations of Video

Phew! Enough about the video geek's world-view!

On computers (which includes your code and the libraries you use, but also encoding and transmission standards like M-JPEG, DVC, MPEG, DVB, ATSC), we generally just store and transmit the picture. We don't include all those extra lines or extra stuff on each line.

Typically, we'll have buffers in memory that either contain one field or one frame. Those fields or frames may be uncompressed or they may be compressed with M-JPEG, MPEG, DVC, or other algorithms.

Even though they're "just pictures," by now it should be clear that we still have to think about how our pixels in memory map onto the video system for which they are intended, because in order to process video data correctly:

We have to figure this out on a case-by-case basis.
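For instance, if your buffers hold separate fields but a codec or display expects woven frames, the mapping includes the field weave. A minimal sketch, using plain Python lists of rows as stand-ins for field buffers, and assuming F1 supplies the top frame line (an assumption you must verify for your system, as discussed above):

```python
def weave_fields(f1, f2):
    """Interleave two fields (lists of pixel rows) into one frame.

    Assumes f1 supplies the top frame line; swap the arguments if
    your video system's field order is the opposite.
    """
    assert len(f1) == len(f2), "fields must have the same line count"
    frame = []
    for top_row, bottom_row in zip(f1, f2):
        frame.append(top_row)     # F1 line -> even frame line
        frame.append(bottom_row)  # F2 line -> odd frame line
    return frame

f1 = [[10, 10], [30, 30]]   # field 1: frame lines 0 and 2
f2 = [[20, 20], [40, 40]]   # field 2: frame lines 1 and 3
print(weave_fields(f1, f2))  # [[10, 10], [20, 20], [30, 30], [40, 40]]
```

Getting the argument order wrong here is exactly the kind of bug that produces juddering motion on playback, which is why the F1/F2 definitions in the tables below matter to software.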

Here are some hints for different situations:

The 480i/60M Video System

Vital Statistics

480i/60M Vital Statistics
Scanning System: Field-based (2:1 interleaved)
Rate: 60M fields per second
Picture Aspect Ratio: 4:3
Total Lines Per Frame: 525
Analog Electrical Standards: SMPTE 170M-1994 (NTSC component and composite); S-video based on NTSC.
Digital Electrical Standards: ITU-R BT.601-4 (basic parameters of digital image); SMPTE 125M-1995 (parallel digital); SMPTE 259M-1997 (serial digital, aka SDI)
Picture Center:
  Vertical: halfway between line 404 (field 2) and line 142 (field 1)
  Horizontal (analog): 481.5 luma sample periods from 0H
  Horizontal (digital): halfway between luma samples 359 and 360 of the digital active line, which is halfway between luma samples 481 and 482 from 0H
Definition of F1 and F2:
  ITU-R BT.601-4 (also known as Rec. 601 and formerly CCIR 601) defines an encoding scheme for digital video.
  ANSI/SMPTE 170M-1994 defines Field 1, Field 2, Field 3, and Field 4 for NTSC (figure 7).
  ANSI/SMPTE 125M-1992 defines the 525-line version of the bit-parallel digital Rec. 601 signal, using an NTSC waveform for reference. 125M defines Field 1 and Field 2 for the digital signal (figure 4).
  ANSI/SMPTE 259M-1993 defines the 525-line version of the bit-serial digital Rec. 601 signal in terms of the bit-parallel signal.
  F1 is defined as an instance of Field 1 or Field 3.
  F2 is defined as an instance of Field 2 or Field 4.
Field Weave: See sampling methods below

Sampling

As we mentioned above, in the 1990s, there was massive chaos because different vendors in the industry could not agree on which rectangle (which production aperture) to extract from the video signal.

By around year 2000, the following de facto industry conventions had emerged for square-pixel and non-square-pixel sampling, although you will certainly find lots of legacy data and devices which use a different rectangle.

Square-Pixel Sampling

480i/60M Square-Pixel Sampling
Pixel Aspect Ratio: 1H : 1V (square)
Production Aperture (software image):
  Size: 640x486 pixels
  Heuristic: if you encounter 640x480 instead of 640x486, it likely, though not necessarily, begins on line 283 as the 640x486 aperture does in the diagram below (and is therefore not centered about the picture center). See above for more.
  Horizontal: 640 points, spaced at exactly 12 3/11 MHz (no M), centered around the system's horizontal picture center (see above).
  Vertical: (same as non-square sampling below)
Clean Aperture:
  Size: 640x480 pixels
  Position: co-centric with production aperture

Non-Square-Pixel Sampling

480i/60M Non-Square-Pixel Sampling
Pixel Aspect Ratio: 10H : 11V (non-square: click here for more)
Production Aperture (software image):
  Size: 720x486 pixels
  Heuristic: if you encounter 720x480 instead of 720x486, it likely, though not necessarily, begins on line 283 as the 720x486 aperture does in the diagram below (and is therefore not centered about the picture center). See above for more.
  Horizontal: 720 points, spaced at exactly 13.5 MHz (no M), centered around the system's horizontal picture center (see above). These are the same 720 active video points defined in ITU-R BT.601-4 ("Rec. 601").
  Vertical: (same as square-pixel sampling above)
Clean Aperture:
  Size: (640*(11/10))x480 pixels (explanation)
  Position: co-centric with production aperture
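These square- and non-square-pixel numbers are mutually consistent, which is worth checking when you implement them. In Python (using `fractions` for exact arithmetic), the 12 3/11 MHz square-pixel rate is just the 13.5 MHz Rec. 601 rate scaled by the 10/11 pixel aspect ratio, and the (640*(11/10)) clean-aperture width works out to a whole 704 samples:

```python
from fractions import Fraction

rec601_rate = Fraction(27, 2)      # 13.5 MHz, the Rec. 601 luma rate
par_480i = Fraction(10, 11)        # de facto 480i pixel aspect ratio (10H : 11V)

square_rate = rec601_rate * par_480i
print(square_rate)                 # 135/11 MHz, i.e. exactly 12 3/11 MHz

clean_width = 640 * Fraction(11, 10)
print(clean_width)                 # 704: the non-square clean-aperture width
```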

The 576i/50 Video System

Vital Statistics

576i/50 Vital Statistics
Scanning System: Field-based (2:1 interleaved)
Rate: 50 fields per second
Picture Aspect Ratio: 4:3
Total Lines Per Frame: 625
Analog Electrical Standards: ITU-R BT.470-3 (PAL component and composite); S-video based on PAL.
Digital Electrical Standards: ITU-R BT.601-4 (basic parameters of digital image); ITU-R BT.656-2 (parallel and serial digital, aka SDI)
Picture Center:
  Vertical: halfway between line 479 (field 2) and line 167 (field 1)
  Horizontal (analog): 491.5 luma sample periods from 0H
  Horizontal (digital): halfway between luma samples 359 and 360 of the digital active line, which is halfway between luma samples 491 and 492 from 0H
Definition of F1 and F2:
  ITU-R BT.601-4 (also known as Rec. 601 and formerly CCIR 601) defines an encoding scheme for digital video.
  ITU-R BT.470-3 (formerly known as CCIR Report 624-1) defines "first field" (F1) and "second field" (F2) (figure 2) for 625-line PAL.
  ITU-R BT.656-2 describes a 625-line version of the bit-serial and bit-parallel Rec. 601 digital video signal. It defines Field 1 (F1) and Field 2 (F2) for that signal (table I).
Field Weave: See sampling methods below

Sampling

As we mentioned above, in the 1990s, there was massive chaos because different vendors in the industry could not agree on which rectangle (which production aperture) to extract from the video signal.

By around year 2000, the following de facto industry conventions had emerged for square-pixel and non-square-pixel sampling, although you will certainly find lots of legacy data and devices which use a different rectangle.

Square-Pixel Sampling

576i/50 Square-Pixel Sampling
Pixel Aspect Ratio: 1H : 1V (square)
Production Aperture (software image):
  Size: 768x576 pixels
  Horizontal: 768 points, spaced at 14.75 MHz, centered around the system's horizontal picture center (see above).
  Vertical: (same as non-square sampling below)
Clean Aperture:
  Size: 768x576 pixels
  Position: co-centric with production aperture (identical in this case)

Non-Square-Pixel Sampling

576i/50 Non-Square-Pixel Sampling
Pixel Aspect Ratio: 59H : 54V (non-square: click here for more)
Production Aperture (software image):
  Size: 720x576 pixels
  Horizontal: 720 points, spaced at 13.5 MHz, centered around the system's horizontal picture center (see above). These are the same 720 active video points defined in ITU-R BT.601-4 ("Rec. 601").
  Vertical: (same as square-pixel sampling above)
Clean Aperture:
  Size: (768*(54/59))x576 pixels (explanation)
  Position: co-centric with production aperture
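The same cross-check works for 576i, with one wrinkle: the 14.75 MHz square-pixel rate is the 13.5 MHz Rec. 601 rate scaled by the 59/54 pixel aspect ratio, but the (768*(54/59)) clean-aperture width is not a whole number, so software has to pick a rounding:

```python
from fractions import Fraction

rec601_rate = Fraction(27, 2)      # 13.5 MHz, the Rec. 601 luma rate
par_576i = Fraction(59, 54)        # de facto 576i pixel aspect ratio (59H : 54V)

square_rate = rec601_rate * par_576i
print(square_rate)                 # 59/4 MHz, i.e. exactly 14.75 MHz

clean_width = 768 * Fraction(54, 59)
print(float(clean_width))          # ~702.9: not whole, so software must round
```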

The 1080i/60M and 1080i/50 Video Systems

Vital Statistics

1080i/60M and 1080i/50 Vital Statistics
Scanning System: Field-based (2:1 interleaved)
Rate:
  1080i/60M: 60M fields per second
  1080i/50: 50 fields per second
Picture Aspect Ratio: 16:9
Total Lines Per Frame: 1125
Analog Electrical Standards: SMPTE 274M-1995 (component analog HD)
Digital Electrical Standards: SMPTE 274M-1995 (HD signal structure); SMPTE 292M-2006 (HD serial digital, aka HD-SDI)
Picture Center:
  Vertical: halfway between line 291 (field 1) and line 853 (field 2)
  Horizontal (analog): 1151.5 luma sample periods from 0H
  Horizontal (digital): halfway between luma samples 959 and 960 of the digital active line, which is halfway between luma samples 1151 and 1152 from 0H
Definition of F1 and F2: SMPTE 274M-1995 defines "first field" (F1) and "second field" (F2) for the analog and serial digital signals (clause 6.3).
Field Weave: See sampling methods below

Sampling

Blissfully, 1080i has square pixels and no pixel debacles, so there's just one clear way to sample it:

1080i/60M and 1080i/50 Sampling
Pixel Aspect Ratio: 1H : 1V (square)
Production Aperture (software image):
  Size: 1920x1080 pixels
  Horizontal: 1920 points, spaced at:
    • 1080i/60M: 74.25M MHz
    • 1080i/50: 74.25 MHz
  centered around the system's horizontal picture center (see above).
  Vertical:
Clean Aperture:
  Size: 1888x1062 pixels
  Position: co-centric with production aperture

The 720p/60M and 720p/50 Video Systems

Vital Statistics

720p/60M and 720p/50 Vital Statistics
Scanning System: Frame-based (progressive scan)
Rate:
  720p/60M: 60M frames per second
  720p/50: 50 frames per second
Picture Aspect Ratio: 16:9
Total Lines Per Frame: 750
Analog Electrical Standards: SMPTE 296M-1995 (component analog HD)
Digital Electrical Standards: SMPTE 296M-1995 (HD signal structure); SMPTE 292M-2006 (HD serial digital, aka HD-SDI)
Picture Center:
  Vertical: halfway between line 385 and line 386
  Horizontal (analog): 899.5 luma sample periods from 0H
  Horizontal (digital): halfway between luma samples 639 and 640 of the digital active line, which is halfway between luma samples 899 and 900 from 0H
Definition of F1 and F2: frame-based system: there are no fields
Field Weave: frame-based system: there are no fields

Sampling

Blissfully, 720p has square pixels and no pixel debacles, so there's just one clear way to sample it:

720p/60M and 720p/50 Sampling
Pixel Aspect Ratio: 1H : 1V (square)
Production Aperture (software image):
  Size: 1280x720 pixels
  Horizontal: 1280 points, spaced at:
    • 720p/60M: 74.25M MHz
    • 720p/50: 74.25 MHz
  centered around the system's horizontal picture center (see above).
  Vertical:
Clean Aperture:
  Size: 1248x702 pixels
  Position: co-centric with production aperture

Copyright: All text and images copyright 1999-2011 Chris Pirazzi unless otherwise indicated.