Errata list for Data Compression: The Complete Reference, 3rd Edition

Last Updated 23 Mar 2008 Error found by Cosmin Truta Page 347, line -14: "The third step uses the absolute values of the three region numbers Q_i (there are 365 of them, since one of the 729 values is zero) ..." should be changed to: "The third step maps the region numbers $Q_i$ to an integer $Q$ in the range $[0 \ldots 364]$. The tuple $(0,0,0)$ is mapped to 0, and the 728 remaining tuples are mapped to $[1 \ldots \frac{728}{2}=364]$, such that $(a,b,c)$ and $(-a,-b,-c)$ are mapped to the same value." [The original text mentions the use of absolute values in the third step which implies that a tuple like (-1,+2,-3) is mapped to the same value of $Q$ as (+1,+2,+3). This is incorrect. In fact, (-1,+2,-3) is mapped to the same value of $Q$ as (+1,-2,+3).] Page 606, figure 5.62. The box under 18 and 3 should be 2(1,1) 2(1,1) 1(1,1) -2(1,1) ------- Errors found by Stephan Wolf page 648, line 7: "With high-resolution and 60 frames per second the transmitter must be able to send 124,416,000 bits/sec (about 14.83 Mbyte/sec)" should be (considering that each pixel is 24 bits) "With high-resolution and 60 frames per second the transmitter must be able to send 2,985,984,000 bits/sec (about 356 Mbyte/sec)" Also, the six numbers in Table 6.7 on the same page, should be corrected to reflect the 24 bits per pixel. refresh rate total # lines x pixels of pixels 24 30 60 1080 x 1920 2,073,600 1,194,393,600 1,492,992,000 2,985,984,000 720 x 1280 921,600 530,841,600 663,552,000 1,327,104,000 Table 6.7: Resolutions and Capacities of Six DTV Formats. ------- Errors found by Yves Charton page 695, caption to figure 7.1. "PSL" should be "SPL". page 738, line 1: "II and II" should be "II and III". ------ Error found by Liyang HU Page 856, glossary item for Bitplane: "A bi-level image, for example, consists of two bitplanes." Should be "A bi-level image, for example, consists of one bitplane." ------ Error found by Frederic Kayser Page 92, paragraph 3: The number ``6'' in the example should be ``3''. (See explanation of the repeat count on page 21.) Thus, this paragraph should read: Example: Suppose that the character with ASCII code 52 repeats six times. Stage 1 will generate the four bytes ``52, 52, 52, 3,'' and stage 2 will replace each with a token, will increment the entry for ``52'' (entry 53 in the table) by 3, but will not increment the entry for ``3'' (which is entry 4 in the table). (The three tokens for the three bytes of ``52'' may all be different, since tokens may be swapped after each ``52'' is read and processed.) ------- 25 May 2005. Error found by Dan Pugh Page 314: I coded the matrices of the LLM method and found an error in one of the matrices. Matrix C3 should have its fourth row, seventh element changed from a one to a zero in order to simulate properly. If you are interested, I have Matlab code that verifies the correction, in the case that you decide for a fourth edition of this book. I have also noted that the result of the matrix product is off by 1/(sqrt(8)), which is probably discussed in the original LLM paper. ------- 25 June 2005. Error found by Jason Corless Page 230: Paragraph 4. "4-bit codes" should be "3-bit codes". Also "2, 4, 4, 4, 2, 4" should be "2, 3, 3, 3, 2, 3" ------- 3 Sept 2005. Errors found by Shashank Khanvilkar Page 228: line -11. "A length of 12 becomes edoc 266 followed by a single bit 1..." should be "A length of 12 becomes edoc 265 followed by a single bit 1..." Page 230: line 3. "The first example employs a set of six symbols A-E with probabilities..." should be "The first example employs a set of six symbols A-F with probabilities.. " Page 729: line 4. Reference [Pan and Hwang 96] should be [Pan 95]. Page 729: line -1. ".. Another resource is the association of audio engineers (AES) .." should be ".. Another resource is the audio engineering society (AES).." Page 733: Second last paragraph. "...(The matrix is a modified version of the well-known DCT, so it is called the MDCT matrix)".. This should be "...(The matrix is a variant of MDCT, which is a modified version of the well-known DCT algorithm)".. Page 739: Second paragraph. "..The 32 values q-1 are then written, as 4 bit numbers,..." should be "..The 32 (q-1) values are then written, as 4 bit numbers,..." Page 743: Equation for B_f It is given that B_f = bitrate/sampling rate. However this cannot be correct. Assume a bitrate of 128 Kbps and a sampling rate of 48Khz. In this case B_f will be ~2.67. The correct equation should be B_f = bitrate*(samples per frame)/sampling rate. here samples per frame = 384 for Layer -I and 1152 for layers II and III. Page 749: Figure 7.41 I think the figure is incorrect. The MDCT windows need to be overlapping. For example _______36_______ _______36_______ | | | _______36_______ | | ======================================================== | | | | | ---18--- ---18--- ---18--- ---18--- For simplicity i have shown all windows to be long. If both long and short windows are used, then special windowing function needs to be used that can transition from long windows to short windows and vice versa. Page 752: Second paragraph. "...This is followed by 59 bits of .." should be "...This is followed by 136 bits (for single channel) and 256 bits (for dual channel) of.." Size of side info for MPEG/Audio Layer III, for single and dual channel are listed below: --------------------------------------------------------------- Field Name | Single channel | Dual Channel | --------------------------------------------------------------- main_Data 9 9 private_bits= 5 3 scfsi: 4 8 part2_3length = 12x2 12x4 big_values = 9x2 9x4 global_gain = 8x2 8x4 scalefac_compress = 4x2 4x4 preflag: 1x2 1x4 scalefac_scale: 1x2 1x4 count1table_Select = 1x2 1x4 window_Switching_flag = 1x2 1x4 Additional info = 22x2 22x4 (dependent on window switching flag) Total 136 bits 256 ------- 3 Sept 2005. Error found by nitinshetty? Page 58: Par 1. "The values of P(n) for n = 1, 2, ..., 10 are 0.973, 0.0263, 0.007, .........." The above values are wrong. The values of p and q have been exchanged. Given : P(n) = q^(n-1) * p p = 0.027027 q = 0.972973 Hence, for n = 1, P(n) = p = 0.027027 ------- 24 Oct 2005. Error found by Boris Tripolski Page 168: Section 3.2 (Simple Dictionary Compression) states that in the worst case the compression ratio achieved is a least 0.9375 (besides the 257 Bytes for the dictionary). This is wrong though, because this would first of all mean that any file greater than 4kb, even a totally random one, could be compressed. As you will easily see, this is impossible. The error in the calculations resides in the probability for the codes you used. You have said that the probability of the codes with the length of 4 bit is the same as the one for 11 bit. Unfortunately there are only 2 codes with 4 bits, while there exist 128 codes with the length of 11. So the right compression ratio for the worst case scenario would be: (2*4+2*5+4*6+8*7+16*8+32*9+64*10+128*11) /(256*8)=2562 / 2048=1.2509765625 this results in an expansion. I have tried this for several files and have gained an expansion as expected. to explain the probabilities here are the codes: 4-bit codes: 000|0 000|1 5-bit codes: 001|10 001|11 6-bit codes : 010|100 010|101 010|110 010|111 7-bit codes 011|1000 011|1001 011|1010 011|1011 011|1100 011|1101 011|1110 011|1111 and so on. Then there is an improvement possible, in order to speed up the encoding: it is said that the search of the bytes is too slow. A solution would be to create two arrays. One would hold the probabilities/ frequency of the bytes, the other would be an array which maps the bytes to their indices/positions in the dictionary. To encode, no searching of the entire table is necessary any more, you can just lookup the position and encode directly. ------- 21 Mar 2006. Error found by Artur Ferreira Page 705: Line 13. "The G.711 standard recommends the use of A=255" should be "The G.711 standard recommends the use of A=87.6". ------- 15 Aug 2006. Errors found by Raymond Martin Page 522: par 2. The sentence "(Strictly speaking, the Nyquist rate is the difference between the maximum and minimum frequencies, the bandwidth of the signal.)" should be omitted. Page 692: par 7. "periods" should be changed to "wavelengths". Page 697: par 2. The solution to the sampling problem is to sample sound at a little over the Nyquist frequency} (page 522), which is twice the maximum frequency contained in the sound. Thus, if a sound contains frequencies of up to 2 kHz, it should be sampled at a little more than 4 kHz. (A detail that's often ignored in the literature is that signal reconstruction is perfect only if something called the Nyquist-Shannon sampling theorem [Wikipedia 03] is used.) Such a sampling rate guarantees true reproduction of the sound. This is illustrated in Figure7.3c, which shows 10 equally-spaced samples taken over four periods. Notice that the samples do not have to be taken from the maxima or minima of the wave; they can come from any point. ------------- 15 March 2007. Errors found by Pedro Moreira page 86, 7th,8th paragraph (incomplete, incorrect discussion) page 87 fig. 2.32 "Figure 2.32d shows the tree after the frequency of A has been incremented to 4. Once we decide that A is the current node, its frequency (which is still 3) is compared to that of its successor (4), and the decision is not to swap. A�s frequency is incremented, followed by incrementing the frequencies of its parents." But, when updating the frequencies, the nodes at the second level of tree (just below the root) (see fig. 2.32d) become in descending order (they should have been swapped, accordingly to the algorithm) Figure 2.32 should also be modified in respect (from 2.32d to 2.32g). (note that they are then swapped between figures 2.32f <-> 2.32g) Consequently, the next paragraph incorrectly discusses the swapping of nodes (it would not be needed). ------------- 23 March 2008. Error found by Benjamin Laugraud Page 219, Table 3.30(b). The value "7" is missing. It should be in the same position as in part (a). ------------- 3 Apr 2011. Error found by Frédéric Kayser. Page 233, line -20. "This table has codes 0--256 for..." should be "This table has codes 0--255 for..." -------- "How many letters are actually read into a word by a careless person who knows what to expect, who sets out with the idea that the message is from a certain person, how many words into the sentence? We guess as we read, we create; everything starts from an initial mistake; the mistakes that follow (and not only in the reading of letters and telegrams, not only in reading as a whole), extraordinary as they may appear to a person who has not begun at the same starting-point, are all quite natural." Marcel Proust, (The Sweet Cheat Gone, Chapter 3, Venice)