Monthly Archives: November 2010

Comparing Digital Video Downloads of Interlaced TV Shows

In the days of CRT monitors, TV shows used to be broadcast in interlaced mode, which is unsupported by modern flat-panel displays. All online streaming services and video stores provide progressive video, so they must deinterlace the data first. This article compares the deinterlacing strategies of Apple iTunes, Netflix, Microsoft Zune, Amazon VoD and Hulu by comparing their respective encodings of a Futurama episode.

If you have dealt with video formats before, you probably know about interlacing, a 1930s trick to achieve both high spatial and temporal resolution at half the (analog) data rate: In NTSC countries, there are 60 fields per second (PAL: 50), and every field is half the vertical resolution of a full frame. When film footage at 24 frames per second has to be played at 30 fps (NTSC), every frame has to be shown 1.25 times – in other words, every fourth frame has to be shown twice. This introduces jerky motion (judder), but it can be improved by using the 60 Hz temporal resolution: Frame A gets shown for 2 fields, frame B for 3 fields, frame C for 2 fields, and so on. This way, every source frame gets shown for 2.5 fields, i.e. 1.25 frames – this method is called a telecine 2:3 pulldown.

A lot of TV material is produced at 24 fps and telecined, for several reasons: Standard movie cameras can be used instead of TV cameras, 24 fps can be converted to 25 fps PAL more easily than 30 fps NTSC, and for cartoons, this means that only 24 (or 12) frames have to be drawn for every second.

Unfortunately, interlacing only works with ancient CRT TVs – modern LCD screens can only show progressive video. And while DVDs are specified to encode interlaced video, more modern formats like MPEG-4/H.264 and VC-1 usually carry progressive data. So when playing DVDs, the DVD player or the TV have to deal with the interlacing problem, and in case of modern file formats, it’s the job of the converter/encoder.

The naive way of converting an interlaced source to progressive is to combine every two fields into a frame. This works great if the original source material was 30 fps progressive (which is rare for NTSC but common for PAL), but for telecined video, since two out of every six frames are combined from two different fields, this leads to ugly combing effects.

If the source material was 24 fps, an inverse telecine can be done, recovering the original 24 frames per second. Unfortunately, it is not always this easy, since interlaced video may switch between methods, and sometimes use different methods at the same time, e.g. overlaying 30 fps interlaced captions on top of a 24 fps telecined picture, or compositing two telecined streams with a different phase. “Star Trek: The Next Generation” is a famous offender in this category – just single-step through the title…

In the following paragraphs, let us look at an episode of Futurama and how the deinterlacing was done by the different providers of the show. Futurama was produced in 24 fps and telecined. Some of the editing seems to have been done on the resulting interlaced video, so the telecine pattern is not 100% consistent.

NTSC DVD

The NTSC DVD is basically just an MPEG-2-compressed version of the US CCIR 601 broadcast master. It encodes 720×480 anamorphic pixels (which can be displayed as 640×480 or 720×540) and has all the original interlacing intact. This is a frame at 640×480 and properly inverse telecined:

DVD

Hulu

Hulu

Hulu (480p version) took the original image without doing any cropping on the sides. You can clearly see this picture is only half the vertical resolution, meaning one of the fields got discarded. It seems this was Hulu’s deinterlacing strategy, since throughout the complete video, everything is half the vertical resolution, whether there is motion or not. This also keeps the video at 30 fps, and effectively shows every fourth frame twice, introducing stronger judder.

iTunes

iTunes

iTunes crops the picture to get rid of the black pixels in the overscan area and scales it to 640×480. They run a full-blown 60 Hz deinterlace filter on the video. Such a filter is meant to take a live television signal as an input, with a temporal resolution of 60 Hz. While this looks fine on frames with no or little motion, vertical resolution is halved as soon as there is motion. Basically, it is the wrong filter. Like Hulu, iTunes preserves the 30 fps, introducing a stronger judder. (The video encoding is H.264 at 1500 kbit/sec.)

Netflix

Netflix

Netflix seems to do the same as iTunes – maybe they even got the data from iTunes? The image is cropped and scaled to 640×480, they run a deinterlace-filter and retain the 30 fps, leading to halved resolution when there is motion, and stronger judder.

Amazon Video on Demand

Amazon Video on Demand

Amazon Video on Demand with its horribly inconvenient Unbox Player (Windows only, requires 1 GB of extra downloads and two reboots) did a better job. Like Netflix and iTunes, they cropped the picture and scaled it to 640×480, but they actually did a real inverse telecine. In some segments (like the end credits), the algorithm failed because of inconsistencies of the original telecine, so it reverted to half the vertical resolution. And like the others, Amazon also encodes at 30 fps, i.e. judder. (The video encoding is VC-1 at 2600 kbit/sec.)

Zune

Zune

Microsoft’s Zune Store provides a cropped video at 640×480 at the original 24 fps and with a bitrate of 1500 kbit/sec (VC-1). Looking through it frame by frame reveals that they used a brilliant detelecine/deinterlace algorithm. On the DVD, the panning at the beginning of the “Robot Hell” song is very tricky: It breaks the standard telecine pattern (PPPIIPPPII becomes PPPIPPPI), it seems every fifth frame was removed.




The pan consists of a pattern of three progressive frames, and then one interlaced frame, which is composed of the previous frame and the current frame. Consequently, every fourth frame has half its resolution wasted by the repeated lines of the previous frame, i.e. every fourth frame only exists at half resolution in the DVD master material.

Hulu discards half the vertical resolution for every frame anyway, and the deinterlacing algorithms of iTunes and Netflix discard half the resolution whenever there is motion. The Amazon algorithm does a good job when the telecine pattern is correct, but in this case, it gets confused and encodes all frames of the pan in half resolution. The Zune algorithm does a brilliant job here: The progressive frames stay at full resolution, and it extracts the half-resolution picture out of every fourth frame:




This is the fourth picture at full size – you can see half the vertical resolution is missing (it was never there in the first place!), but the algorithm did a very good interpolation job:

Robot Hell (Zune)

The Zune video is almost perfect. It recombines all fields correctly and recovers all single fields, scaling them up so that it’s hardly visible there is information missing. If you ignore the 720 vs. 640 horizontal pixels, the resulting 24 fps video contains all information of the DVD version, but with all interlacing removed, and with zero judder. Too bad it’s not H.264, but DRMed and only plays on Windows (XP+), Zune and Windows Phone 7.

Summary

Provider Cropping Resolution Deinterlacing fps Encoder Bitrate (kbits/sec)
NTSC DVD no 720×480 none 30 MPEG-2 6500
Hulu no 640×480 discard 30 H.264? ?
iTunes yes 640×480 30 Hz deinterlace 30 H.264 1500
Netflix yes 640×480 30 Hz deinterlace 30 H.264/VC-1 ?
Amazon VoD yes 640×480 detelecine+decomb 30 VC-1 2600
Zune yes 640×480 fuzzy detelecine 24 VC-1 1500

Note: H.264 and VC-1 compress significantly better than MPEG-2; a rule of thumb is to divide the MPEG-2 bitrate by 2.3 to get a comparable H.264/VC-1 bitrate. So the Amazon bitrate is fine and the video is about the same quality (sharp picture, no compression artefacts) as the DVD, but the iTunes and Zune versions are not (artifacts can be seen on single frames).

It is scary how little effort seems to be going into video conversion/encoding at major players like iTunes, Netflix and Hulu. Amazon did a kind of okay job converting the source material properly, and only Microsoft did an excellent job. The NTSC DVDs still give you the maximum quality – but of course, if you watch them on an LCD, the burden of deinterlacing is on your side. Handbrake with “detelecine” (for the bulk of it) and “decomb” (for exceptions) turned on, and with a target framerate of “same as source” will generate a rather good MP4 video similar to Amazon’s, but without the judder.

Are there any stores I missed? Can someone check the PAL DVD as well as digital PAL and NTSC broadcasts? What is the magical detelecine/deinterlace program Microsoft uses?

See also: Comparing Bittorrent Files of Interlaced TV Shows

Xbox Serial Number Statistics

Slashdot had a story recently on how in 1942, the allies were able to estimate the number of German taks produced based on the serial numbers of the tanks. In 2010, a German hacker is doing the exact same thing with Xboxes. This article describes the generic approach, shows some results, and provides previously unreleased raw data of 14,000 Xbox serials so you can do your own statistics!

Between October 2003 and January 2005, the Xbox Linux Project asked all visitors to their website to enter their Xbox serial numbers, date and country of manufacture, ROM version, hard disk and DVD drive brand and other properties, and gathered more than 14,000 entries. The original idea was to find a rule to deduce the hard disk and DVD drive types in an Xbox by only looking at the serial number, which was visible through the unopened packaging.

The serial sticker on an Xbox looks like this:

MFG. DATE  2002-03-03
SERIAL NO. 1166356 20903

After looking at several serial numbers, it was already clear that the last two digits (“03″ in my example) are the location of manufacture: 02 is Mexico, 03 is Hungary, 05 is China and 06 is Taiwan. The three digits before (“209″ in my example) are the one-digit year (“2″ for “2002″) and the two-digit calender week (“09″ for around the first week of March).

Now we want to find out how many devices were manufactured. A first approximation is to look at the manufacturing dates of all Xboxes in our database.

This gives us an idea when production was ramped up (in 2001 and 2002 in November, and in 2003 in August, September and October), but the statistics don’t give us absolute numbers, and they are biased towards older devices (newer devices are not entered yet, and visitors of our site tend to be early adopters).

But what about these first seven digits of the serial number? Shouldn’t these be actual “serial” numbers? Let’s look at all devices from August 2003 and sort the first seven digits by manufacturing date:

This does not look like a serial number. But all numbers are > 1,000,000, which implies that the first digit has a special meaning and is not part of the number. Let’s look at distribution of the first digit:

The first digit seems to be the number of the assembly line in the factory! So let’s look at the remaining 6 digits again:

This looks a lot better! But there are several things interleaved in this chart – because the serial numbers are of course counted independently in every factory. If we filter just all numbers form the Chinese factory, we get this:

We can see serial numbers are counted up every week, but we still see all assembly lines interleaved here, and the different lines don’t reset at the same time. Here is line 6 all by itself:

Looks almost perfect, if we assume the wild shots are caused by typos. Here is a manually fixed version of it:

Voilà! Serial numbers that count up monotonically and get reset on every Sunday.

By inspection of the graph, we can estimate that assembly line 6 of the factory in China produced about 275,000 devices per week in week 33 (mid August) of 2003. This works well, because we have so many samples; but for other weeks, we have as few as five. This is the formula for the German Tank Problem:

k is the sample size and m s the highest serial number observed.

The estimate of Xboxes produced by assembly line 6 in China in week 33 of 2003 is therefore 285,269. Applying this to every assembly line of every factory and every week, it should be easy to get great statistics on the productivity of the different lines and factories, as well as a very good estimate of the total number of devices produced. …and this is where you come in!

The Data

You want to do your own statistics? Here is the raw data:

xbox_serials.csv (2.5 MB)

It is a comma-separated-value file with the following columns:

Column Example Description Comment
1 2002-03-03 Manufacture Date YYYY-MM-DD
2 2002-04-30 Date of Purchace YYYY-MM-DD
3 de Country of Purchase two-digit code
4 1166356 20903 Serial Number nnnnnnn nnnnn
5 v1.0 Xbox Version motherboard revision
6 3944 Kernel Version ROM version as shown in “About” dialog
7 4034 Dashboard Version HD software version as shown in “About” dialog
8 Unknown/Other Flash what’s printed on flash ROM chip
9 PAL Video Standard PAL or NTSC
10 Black Case Color Xboxes are black, but there are some special editions
11 Thomson DVD Drive Philips, Samsung, Thomson
12 Seagate 10 GB Hard Disk Seagate or Western Digital
13 Conexant Video Encoder Brand Conexant, Focus, Xcalibur
14 Golden Xbox comments free-form field

Please note that people were able to fill some fields with arbitraty data, so they might not necessarily be in exactly the specified form. There are also lots of typos in the serial numbers and the month and day fields in the data fields have been mixed up sometimes. You probably want to run a script over the data first that sanitizes some of the input, e.g. removes dashes and spaces from serial numbers etc.

Here are some ideas on what you might want to find out:

  1. Is there a better formula to estimate the number of Xboxes produced per week on a certain assembly line?
  2. What day does a week start with? Does the factory produce Xboxes on Sundays? Do they produce just as many? Is it different in the respective countries?
  3. How many Xboxes were produced per assembly line, per week and per factory?
  4. Are all assembly lines in a certain factory just as productive?
  5. Are all factories just as productive (per assembly line)?
  6. Did productivity go up over time? Did it hit a maximum?
  7. How many Xboxes were produced total?
  8. Does an assembly line in a certain factory use all the same flash chips, hard drives and DVD drives in a certain week?
  9. When did an assembly line in a certan factory switch between board revisions?
  10. How long does it take an assembly line to be reconfigured for a different board revision?
  11. When did factories open/close? When did assembly lines get created and torn down in certain factories? Is there a correlation? Did assembly lines get migrated between factories? How long does this take?
  12. How long does it take on average for an Xbox from manufacuring to when it’s bought, per country? Does it change over the years?
  13. Which factories serve which countries? Did it change?
  14. How do ROM version, HD software version, motherboard version and video encoder brand correlate to each other?
  15. Which countries have PAL, which have NTSC?
  16. Where were the non-black Xboxes made?
  17. What percentage of Xboxes has a Philips, a Samsung or a Thomson DVD drive?
  18. What is the distribution of hard drive types?
  19. Some people claim they have a 20 GB hard drive. How credible is this?
  20. When and at which factories were certain DVD and HD types introduced?
  21. Over time, how did the distribution of DVD and HD types change?
  22. What is the distribution of flash chips, how did it change, and how does it correlate to factories?
  23. Is there enough data to make statements about the refurbishment process (search for “refurb” in comments)?
  24. What percentage of people misses a digit when trying to type in 12 digits?
  25. What percentage of people replaced digits of the serial number with an ‘X’ or a ‘*’? What percentage of these chose the right digits to properly anonymize their serial numbers?
  26. Any more interesting observations you can come up with?

Please share your ideas as well as your results (plus source code of your scripts, please)! If you know any statistics teachers looking for a large real-world data set and an interesting set of problems, feel free to refer them to this site! :-)

The Intel 80376 – a Legacy-Free i386 (with a Twist!)

25 years after the introduction of the 32 bit Intel i386 CPU, all Intel compatibles still start up (and wake up!) in 16 bit stone-age mode, and they have to be switched into 32/64 bit mode to be usable.

Wouldn’t it be nice if a modern i386/x86_64 CPU started at least in 32 bit protected mode? Can’t they make a legacy-free CPU that does not support 16 bit mode at all? Such a CPU exists, well, existed. It’s the 1989-2001 Intel 80376, an embedded version of the Intel i386.

The datasheet describes all the interesting differences. The 80376 does not support any 16 bit mode, so the “D” bit in segment descriptors must be set to 1 (page 25), forcing 32 bit code and data segments. 286-style descriptors are not supported either (page 27). (The 0×66 and 0×67 opcode prefixes still exists, so code can work on 16 bit registers and generate 16 bit addresses (page 14), just like an i386 in 32 bit mode.)

Since the CPU does not support 16 bit modes, it cannot do real mode, so CR0.PE is always 1. Consequently, a 80376 starts up in 32 bit protected mode, but otherwise, startup is just like on the i386 (page 19): EIP is 0x0000FFF0, CS is 0xF000, CS.BASE is 0xFFFF0000, CS.LIMIT is 0xFFFF, and the other segment registers are 0×0000, with a base of 0×00000000 and a limit of 0xFFFF. No GDT is set up, and in order to get the system into a sane state, loading a GDT and reloading the segment registers is still necessary. Too bad they didn’t set all bases to 0, all limits to 0xFFFFFFFF and EIP to 0xFFFFFFF0.

The 80376 is designed to be forward-compatible with the i386, so unsupported features are documented as “reserved” or “must be 0/1″, and legacy properties like the garbled segment descriptors are unchanged. All (properly written) 80376 software should also run on an i386 (page 1) – except for the first few startup instructions of course. Intel provides the following code sequence (page 20) that is to be executed directly after RESET to distinguish between the 80376 and the i386:

smsw bx
test bl, 1
jnz is_80376

This tests for CR0.PE, which is hardcoded to 1 on the 80376 and is 0 on RESET on an i386. The three instructions are bitness agnostic, i.e. the encoding is identical in 16 and 32 bit mode.

Sounds like the perfect CPU? Well, here comes the catch: The 80376 doesn’t do paging. CR2 and CR3 don’t exist (it is undocumented whether accessing them causes an exception), CR0.PG is hardcoded to 0 (page 8) and the #PF exception does not exist (page 17). A man can dream though… a man can dream.

Emu8080: an HTML5 App to Emulate a Complete CP/M Machine

by Stefan Tramm

There are a lot of web-based computer emulators out there for ancient machines – the Commodore 64 and MSX home computers or the more exotic Space Invaders arcade machines. Such emulators are nice to play with but they are incomplete – they lack an Input/Output subsystem.

Emu8080 is the first known Javascript-based emulator which adds a floppy disk system, a fast paper tape reader/writer, a line printer and a VT100 terminal emulator to the mix. Modern web browsers offer a local WebSQL database, which is used to implement block storage devices. Drag-and-Drop is used to mount a desktop file onto the virtual tape device. Together with CPU and RAM emulation, you have a complete machine inside your web browser…

Want to try out? Use Chrome6+ (or Safari5) and visit http://www.tramm.li/i8080

Now you are working inside a classical machine monitor for an 8080 microcomputer from the 70′s. Available commands are:

  • m to modify memory
  • d to dump memory on the screen
  • l to disassemble a program (in Z80 Syntax)
  • g to start execution
  • <Enter> to perform a single step
  • x to show and modify CPU registers
  • h for help

So far this is pretty normal machine emulation.

The first important command is “r”. Without any parameters, it lists the available files on the tramm.li server. “r file” reads an Intel HEX file into memory, for example “r basic” will load the well-known Microsoft 8K BASIC interpreter for the Intel 8080.

But as already mentioned, a disk subsystem is also available. The first step in using it is to either insert a pre-formatted disk or to format an empty disk. To insert a pre-formatted CP/M disk, issue a “r 0 cpma” command.

This command expects the image of an 8-inch IBM formatted SS/SD diskette and writes it sector-wise into the browser-local WebSQL database. This needs to be done only once, causing the disk to now be loaded into your browser as Emu8080 drive 0. With the “b” command, the boot sector is loaded into “main memory”, starting at address 0. Now issue a “g” command to start execution. Ta da!, CP/M 2.2 has been loaded and started!

To go back to the monitor, you have two choices: Ctrl-. or via the “bye” executable.

Emu8080 supports four drives, so you can either format the other drives (“F” command) or search for other images on the internet and load them yourself. But how do we mount these drives? The “dsk” commands comes to rescue. It opens a new window, showing the current status of each drive. Every disk is also a drop-area, so you can mount disk images per drag-and-drop from your desktop into the emulator. Technically the FileReader API of Javascript is used, and this API is currently (November 2010) implemented only by Chrome (or Opera). Yes, you do not need the server anymore to transport data into a Javascript application.

Now you can mount disks and boot CP/M. You can edit files with ED.COM or Wordmaster, or write programs in 8080 assembler. But how to get the modified disk images out of the emulator again? The “dsk [n]” commands opens a new window, which allows to post the binary disk image to an arbitrary web server, e.g. localhost. (btw: there is no FileWriter API in HTML5). Just change the target URL and the form will post it where ever you want.

Sometimes mounting and posting whole images is overkill, transmitting a single file into the CP/M machine or getting it out may be all you want to do. And here comes the emulated paper tape device to rescue. Just drag a file onto the terminal’s main window and it will be mounted. Inside CP/M, just use the PIP command: “pip file.txt=rdr:” and you will have the tape’s content in FILE.TXT. Please keep in mind that CP/M is not 8-bit clean.

To get a single file out of the emulator, the paper tape puncher is used. Inside CP/M just use PIP again: “pip ptp:=rew.asm” to punch the file out. Then enter the monitor again (Ctrl-. or “bye”) and use the “ptp” command, which opens a window, where you either post the tape content to an arbitrary web server or simply copy-paste the content.

The final device is a line printer. It just opens a separate window where every character is shown. No escape sequences are implemented – therefore no bold or underlined characters.

In the unlikely case that the user has no clue about CP/M and its commands, the emulator includes a PDF of the original documentation by Digital Research.

Now, lets take a short look into the technical details:

  1. WebSQL: The emulator uses the browser local SQL DB as block storage for the four disk drives. The storage is persistent and writable.
  2. Drag-and-Drop / FileReader API: The emulator uses drag-and-drop and the FileReader API to read paper tape or disc images from your local machine.
  3. HTML5 manifest: The emulator supports the browser local database and browser local file reading. To make it also runnable without an active internet connection, it is built as a HTML 5 App by providing an emu8080.manifest file. This file tells the browser which resources (HTML, JS, CSS, PNG, HEX, CPM, PDF files) must be cached. This means that if you bookmark http://www.tramm.li/i8080/emu8080.html, you can start the emulator and access the disk drives even without an internet connection — just like a local application.
  4. ShellInABox: Without the fabulous VT100 emulation of ShellInABox, this emulator would not have been possible. The provided example of a BASIC interpreter was the starting point for the 8080 emulator.
  5. z80pack: The C based implementation of a Z80 machine was the basis for the IO-subsystem implementation, the custom BIOS, and the boot disk image. The trick for reducing host CPU cycles in tight IO-Loops (eg. when CP/M is waiting for a key press) was taken from this package.
  6. js8080: The Space Invaders emulator js8080 gifted the core CPU emulation.

One last word on performance. On a usual x86 processor the emulator performs like a 2 MHz 8080. With the monitor command “instr” you can tune the performance, by increasing the number of executed 8080 instructions per Javascript timeslice. With today’s hardware, you can expect the performance of a 8 MHz CPU (which was never available as real silicon).

And for Microsoft BASIC fans, try “r basic” and “g” afterwards instead of booting CP/M – welcome to 8K BASIC. Use “SAVE” to save your programs on the paper tape :-) and “LOAD” to read saved programs.

Neozeed provided the Zork adventure game disk image.

And now — happy hacking.

Other links:

HFS+ File System Analysis and Forensics with fileXray

Modern filesystems are highly optimized database systems that are a core function of modern operating systems. They allow concurrent access by many CPUs, they keep locality up and fragementation down, and they can recover from crashes guaranteeing consistent data structures.

If you want to learn about the internals of modern filesystems, you can either read up on theory (e.g. Dominic Giampaolo’s excellent free book Practical File System Design), or you can dissect one using a tool like fileXray, which

  • allows you to examine data structures on your Mac OS X HFS+ disk, like the volume header, the journal, regular and special files, etc., down to the B-tree level
  • gives you insight into statistics like fragmentation and hot file clustering
  • lets you find out where all these deleted files really go, and what forensic analysis can tell someone about your disks!