| DVD subtitles | |
| --------------- | |
| 0. Introduction | |
| 1. Basics | |
| 2. The data structure | |
| 3. Reading the control header | |
| 4. Decoding the graphics | |
| 5. What I do not know yet / What I need | |
| 6. Thanks | |
| 7. Changes | |
| The latest version of this document can be found here: | |
| http://www.via.ecp.fr/~sam/doc/dvd/ | |
| 0. Introduction | |
| One of the last things we missed in DVD decoding under my system was the | |
| decoding of subtitles. I found no information on the web or Usenet about them, | |
| apart from a few words on them being run-length encoded in the DVD FAQ. | |
| So we decided to reverse-engineer their format (it's completely legal in | |
| France, since we did it for interoperability purposes), and managed to work | |
| out almost all of it. | |
| 1. Basics | |
| DVD subtitles are hidden in private PS packets (0x000001ba), just like AC3 | |
| streams are. | |
| Within the PS packet there are PES packets and, as with AC3, the ones | |
| containing subtitles have a 0x000001bd header. | |
| Just as an AC3 stream has an ID of the form (0x80 + x), a subtitle stream | |
| has an ID equal to (0x20 + x), where x is the subtitle number. Thus there | |
| seem to be only 16 possible different subtitles on a DVD (my Taxi Driver | |
| copy has 16). | |
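A minimal sketch of the ID test described above. The function and constant names are made up for illustration, and the limit of 16 is only what the text observes, not a confirmed bound.

```python
SUBTITLE_ID_BASE = 0x20    # from the text: subtitle ID = 0x20 + x
SUBTITLE_ID_COUNT = 16     # "seem to be only 16"; an observation, not a spec


def subtitle_index(stream_id):
    """Return x if stream_id looks like a subtitle ID (0x20 + x), else None."""
    x = stream_id - SUBTITLE_ID_BASE
    if 0 <= x < SUBTITLE_ID_COUNT:
        return x
    return None
```

With these assumptions, an ID of 0x23 would be subtitle number 3, while an AC3 ID such as 0x80 is rejected.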
| I'll suppose you know how to extract AC3 from a DVD, and jump to the | |
| interesting part of this documentation. Anyway you're unlikely to have | |
| understood what I said without already being familiar with MPEG2. | |
| 2. The data structure | |
| A subtitle packet, after its parts have been collected and appended, looks | |
| like this : | |
| +----------------------------------------------------------+ | |
| | | | |
| | 0 2 size | | |
| | +----+------------------------+-----------------+ | | |
| | |size| data packet | control | | | |
| | +----+------------------------+-----------------+ | | |
| | | | |
| | a subtitle packet | | |
| | | | |
| +----------------------------------------------------------+ | |
| size is a 2-byte word, and data packet and control may have any size. | |
| Here is the structure of the data packet : | |
| +----------------------------------------------------------+ | |
| | | | |
| | 2 4 S0+2 | | |
| | +----+------------------------------------------+ | | |
| | | S0 | data | | | |
| | +----+------------------------------------------+ | | |
| | | | |
| | the data packet | | |
| | | | |
| +----------------------------------------------------------+ | |
| S0, the data packet size, is a 2-byte word. | |
| Finally, here's the structure of the control packet : | |
| +----------------------------------------------------------+ | |
| | | | |
| | S0+2 S0+4 S1 size | | |
| | +----+---------+---------+--+---------+--+---------+ | | |
| | | S1 |ctrl seq |ctrl seq |..|ctrl seq |ff| end seq | | | |
| | +----+---------+---------+--+---------+--+---------+ | | |
| | | | |
| | the control packet | | |
| | | | |
| +----------------------------------------------------------+ | |
| To summarize : | |
| - S1, at offset S0+2, the position of the end sequence | |
| - several control sequences | |
| - the 'ff' byte | |
| - the end sequence | |
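The layout above can be sketched as a small splitter. This is only an illustration under stated assumptions: the 2-byte words are read most significant byte first (as the "0A 0C" = 0x0a0c example in section 3 suggests), S1 is taken to point at the first byte of the end sequence, and the names split_subtitle_packet and be16 are hypothetical.

```python
import struct


def be16(buf, offset):
    """Read a 2-byte word, most significant byte first."""
    return struct.unpack_from(">H", buf, offset)[0]


def split_subtitle_packet(packet):
    """Cut a reassembled subtitle packet into the parts shown in the diagrams."""
    size = be16(packet, 0)      # total packet size
    s0 = be16(packet, 2)        # data packet size
    s1 = be16(packet, s0 + 2)   # position of the end sequence
    if not (4 <= s0 + 4 <= s1 <= size <= len(packet)):
        raise ValueError("inconsistent size/S0/S1 values")
    return {
        "size": size,
        "s0": s0,
        "s1": s1,
        "data": packet[4:s0 + 2],
        # Everything between the S1 word and the end sequence,
        # including the trailing 'ff' byte.
        "control_sequences": packet[s0 + 4:s1],
        "end_sequence": packet[s1:size],
    }
```

The range check is there because the three offsets come straight from the stream; a damaged packet would otherwise produce silently wrong slices.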
| 3. Reading the control header | |
| The first thing to read is the control sequences. There are several | |
| types of them, and each type is determined by its first byte. As far | |
| as I know, each type has a fixed length. | |
| * type 0x01 : '01' - 1 byte | |
| it seems to be an empty control sequence. | |
| * type 0x03 : '03wxyz' - 3 bytes | |
| this one has the palette information ; it basically says 'encoded color 0 | |
| is the wth color of the palette, encoded color 1 is the xth color', and so on. | |
| * type 0x04 : '04wxyz' - 3 bytes | |
| I *think* this is the alpha channel information ; I only saw values of 0 or f | |
| for those nibbles, so I can't really be sure, but it seems plausible. | |
| * type 0x05 : '05xxxXXXyyyYYY' - 7 bytes | |
| the coordinates of the subtitle on the screen : | |
| xxx is the first column of the subtitle | |
| XXX is the last column of the subtitle | |
| yyy is the first line of the subtitle | |
| YYY is the last line of the subtitle | |
| thus the subtitle's size is (XXX-xxx+1) x (YYY-yyy+1) | |
| * type 0x06 : '06xxxxyyyy' - 5 bytes | |
| xxxx is the position of the first graphic line, and yyyy is the position of | |
| the second one (the graphics are interlaced, so it helps a lot :p) | |
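Since each type listed above seems to have a fixed length, the control sequences can be separated with a simple lookup table. The following is a rough sketch, not a complete parser: the table only holds the five types described here, the 'ff' byte is treated as a terminator, and the names are invented for the example.

```python
# Fixed lengths (in bytes, type byte included) of the types listed above.
SEQUENCE_LENGTHS = {0x01: 1, 0x03: 3, 0x04: 3, 0x05: 7, 0x06: 5}


def split_control_sequences(buf):
    """Split the bytes following S1 into individual control sequences.

    Stops at the 'ff' byte and raises ValueError on an unknown type,
    since an unknown type has an unknown length.
    """
    sequences = []
    pos = 0
    while pos < len(buf):
        kind = buf[pos]
        if kind == 0xFF:
            break
        length = SEQUENCE_LENGTHS.get(kind)
        if length is None:
            raise ValueError("unknown control sequence type 0x%02x" % kind)
        if pos + length > len(buf):
            raise ValueError("truncated control sequence 0x%02x" % kind)
        sequences.append(bytes(buf[pos:pos + length]))
        pos += length
    return sequences
```

Raising on an unknown type is a deliberate choice here: skipping a sequence of unknown length would desynchronize everything after it.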
| The end sequence has this structure: | |
| xxxx yyyy 02 ff (ff) | |
| it ends with 'ff' or 'ffff', to make the whole packet have an even length. | |
| FIXME: I absolutely don't know what xxxx is. I suppose it may be some date | |
| information since I found it nowhere else, but I can't be sure. | |
| yyyy is equal to S1 (see picture). | |
| Example of a control header : | |
| ---- | |
| 0A 0C 01 03 02 31 04 0F F0 05 00 02 CF 00 22 3E 06 00 06 04 E9 FF 00 93 0A 0C 02 FF | |
| ---- | |
| Let's decode it. First of all, S1 = 0x0a0c. | |
| The control sequences are : | |
| 01 | |
| Nothing to say about this one | |
| 03 02 31 | |
| Color 0 is 0, color 1 is 2, color 2 is 3, and color 3 is 1. | |
| 04 0F F0 | |
| Colors 0 and 3 are transparent, and colors 1 and 2 are opaque (not sure of this one) | |
| 05 00 02 CF 00 22 3E | |
| The first column is 0x000, the last one is 0x2cf, the first line is 0x002, and | |
| the last line is 0x23e. Thus the subtitle's size is 0x2d0 x 0x23d. | |
| 06 00 06 04 E9 | |
| The first encoded image starts at offset 0x0006, and the second one starts at 0x04e9. | |
| And the end sequence is : | |
| 00 93 0A 0C 02 FF | |
| Which means... well, not much for now. We can at least verify that S1 (0x0a0c) is | |
| there. | |
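The field values in this example can be checked with a few lines of code. This sketch assumes each control sequence has already been isolated; the helper names are hypothetical, and treating the 0x04 nibbles as alpha values follows the tentative reading above.

```python
def nibbles(data):
    """Split each byte into its high and low nibble, in order."""
    out = []
    for byte in data:
        out.append(byte >> 4)
        out.append(byte & 0x0F)
    return out


def decode_wxyz(seq):
    """Types 0x03 and 0x04: return the four nibbles [w, x, y, z]."""
    return nibbles(seq[1:3])


def decode_area(seq):
    """Type 0x05: return (xxx, XXX, yyy, YYY, width, height)."""
    n = nibbles(seq[1:7])

    def u12(i):
        # Three consecutive nibbles form one 12-bit value.
        return (n[i] << 8) | (n[i + 1] << 4) | n[i + 2]

    x0, x1, y0, y1 = u12(0), u12(3), u12(6), u12(9)
    return x0, x1, y0, y1, x1 - x0 + 1, y1 - y0 + 1


def decode_offsets(seq):
    """Type 0x06: return (xxxx, yyyy), the two graphic line positions."""
    return (int.from_bytes(seq[1:3], "big"),
            int.from_bytes(seq[3:5], "big"))
```

For instance, decode_area(bytes.fromhex("050002cf00223e")) gives a width of 0x2d0 and a height of 0x23d, matching the size computed by hand above.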
| 4. Decoding the graphics | |
| The graphics are rather easy to decode (at least, when you know how to do it - it | |
| took us one whole week to figure out what the encoding was :p). | |
| The picture is interlaced ; for instance, for a 40-line picture : | |
| line 0 ---------------#---------- | |
| line 2 ------#------------------- | |
| ... | |
| line 38 ------------#------------- | |
| line 1 ------------------#------- | |
| line 3 --------#----------------- | |
| ... | |
| line 39 -------------#------------ | |
| When decoding you should get: | |
| line 0 ---------------#---------- | |
| line 1 ------------------#------- | |
| line 2 ------#------------------- | |
| line 3 --------#----------------- | |
| ... | |
| line 38 ------------#------------- | |
| line 39 -------------#------------ | |
| Computers with weak processors could choose only to decode even lines | |
| in order to gain some time, for instance. | |
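The reordering shown above can be written as a short helper. This is only a sketch: it assumes the lines have already been decoded in storage order (all even lines, then all odd lines), and interleave_fields is a name chosen for the example.

```python
def interleave_fields(stored_lines):
    """Reorder [line 0, line 2, ..., line 1, line 3, ...] into display order."""
    half = (len(stored_lines) + 1) // 2   # the even field gets the extra line
    even, odd = stored_lines[:half], stored_lines[half:]
    display = []
    for i in range(half):
        display.append(even[i])
        if i < len(odd):
            display.append(odd[i])
    return display
```

A decoder short on CPU time could keep only the first half of stored_lines, which is the "even lines only" shortcut mentioned above.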
| The graphics are run-length encoded, with the following alphabet: | |
| 0xf | |
| 0xe | |
| 0xd | |
| 0xc | |
| 0xb | |
| 0xa | |
| 0x9 | |
| 0x8 | |
| 0x7 | |
| 0x6 | |
| 0x5 | |
| 0x4 | |
| 0x3- | |
| 0x2- | |
| 0x1- | |
| 0x0f- | |
| 0x0e- | |
| 0x0d- | |
| 0x0c- | |
| 0x0b- | |
| 0x0a- | |
| 0x09- | |
| 0x08- | |
| 0x07- | |
| 0x06- | |
| 0x05- | |
| 0x04- | |
| 0x03-- | |
| 0x02-- | |
| 0x01-- | |
| 0x0000 | |
| '-' stands for any other nibble. Once a sequence X of this alphabet has | |
| been read, the pixels can be displayed : (X >> 2) is the number of pixels | |
| to display, and (X & 0x3) is the color of the pixel. | |
| For instance, 0x23 means "8 pixels of color 3". | |
| "0000" has a special meaning : it's a carriage return. The decoder should | |
| do a carriage return when reaching the end of the line, or when encountering | |
| this "0000" sequence. When doing a carriage return, the parser should skip | |
| to the next byte boundary if it stopped in the middle of a byte (a line | |
| always starts on a whole byte, never on the second nibble of one). | |
| After a carriage return, the parser should read a line on the other | |
| interlaced picture, and swap like this after each carriage return. | |
| Perhaps I don't explain this very well, so you'd better have a look at | |
| the enclosed source. | |
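Here is a compact sketch of the decoding loop described in this section; it is not the enclosed source. Several points are assumptions: the two offsets are byte positions inside the buffer passed in, pixels left after a "0000" are filled with color 0, and any other zero-length code is rejected because the text does not cover it. All function names are illustrative.

```python
def read_nibble(data, pos):
    """Return the nibble at nibble position pos (high nibble first)."""
    byte = data[pos >> 1]
    return byte >> 4 if pos % 2 == 0 else byte & 0x0F


def read_code(data, pos):
    """Read one sequence of the alphabet; return (code, new nibble position)."""
    code = 0
    # A sequence ends after 1, 2 or 3 nibbles once it reaches these values,
    # and after 4 nibbles in any case.
    for threshold in (0x4, 0x10, 0x40, 0x0):
        code = (code << 4) | read_nibble(data, pos)
        pos += 1
        if code >= threshold:
            return code, pos


def decode_line(data, pos, width):
    """Decode one line of width pixels starting at nibble position pos."""
    line = []
    while len(line) < width:
        code, pos = read_code(data, pos)
        run, color = code >> 2, code & 0x3
        if run == 0:
            if code != 0:
                raise ValueError("zero-length code not covered by the text")
            # "0000": carriage return. Filling with color 0 is an assumption.
            line.extend([0] * (width - len(line)))
            break
        line.extend([color] * min(run, width - len(line)))
    pos += pos % 2   # skip to the next byte boundary
    return line, pos


def decode_picture(data, width, height, offsets):
    """Decode height lines, switching field after each carriage return."""
    pos = [offsets[0] * 2, offsets[1] * 2]   # nibble position in each field
    lines = []
    for y in range(height):
        field = y % 2
        line, pos[field] = decode_line(data, pos[field], width)
        lines.append(line)
    return lines
```

Because the two fields are read alternately, the lines come out directly in display order and no separate de-interlacing pass is needed.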
| 5. What I do not know yet / What I need | |
| I don't know what's in the end sequence yet. | |
| Also, I don't know exactly when to display subtitles, and when to remove them. | |
| I don't know if there are other types of control sequences (in my programs I consider | |
| 0xff as a control sequence type, as well as 0x02. I don't know if it's correct or not, | |
| so please comment on this). | |
| I don't know what the "official" color palette is. | |
| I don't know how to handle transparency information. | |
| I don't know if this document is generic enough. | |
| So what I need is you : | |
| - if you can, patch this document or my programs to fix strange behaviour with your subtitles. | |
| - send me your subtitles (there's a program to extract them enclosed) ; the first 10 KB | |
| of subtitles in a VOB should be enough, but it would be cool if you sent me one subtitle | |
| file per language. | |
| 6. Thanks | |
| Thanks to Michel Lespinasse <walken@via.ecp.fr> for his great help on understanding | |
| the RLE stuff, and for all the ideas he had. | |
| Thanks to mass (David Waite) and taaz (David I. Lehn) from irc at | |
| openprojects.net for sending me their subtitles. | |
| 7. Changes | |
| 20000116: added the 'changes' section. | |
| 20000116: added David Waite's and David I. Lehn's name. | |
| 20000116: changed "x0" and "x1" to "S0" and "S1" to make it less confusing. | |
| -- | |
| Paris, January 16th 2000 | |
| Samuel Hocevar <sam@via.ecp.fr> |