Errata list for Handbook of Data Compression, 5th Edition

Last Updated 19 May 2020 Errors found by the authors Page 1285, line -9. "Glenn" should be "Glen". Page 1359, Lines 1-2. These lines should be merged. ------- Error found by Mahmoud El-Sakka Pages 639-640. The CALIC encoder algorithm listed on these pages is wrong. This error has been in the book since the 2nd edition. Anyone interested in CALIC is invited to submit a listing of the correct algorithm to the authors. -------- (May 2010): Error found by Bernd Herd. Page 335, end of paragraph 3. The book claims that I proposed the LZH idea. While I would be proud if I had, I have only worked a lot with the lha packer developed by Haruyasu Yoshizaki around 1990, especially I used it for quite a while to develop setup programs for windows applications and a phonebook database CD. The file format used by LHA is called LZH. It uses the combination of LZ and Huffmann coding. I have never cared a lot about the details. Take a look at "http://packages.debian.org/lenny/lha" for more information: -------- (May 2010): Error found by Ugo Vaccaro. Page 1275, bibliography item "Capocelli and De Santis (1992)" the list of authors is incomplete. The correct item is: R. M. Capocelli, A. De Santis, L. Gargano and U. Vaccaro (1992) "On the Construction of Statistically Synchronizable Codes", IEEE Transactions on Information Theory, IT-38(2):407-414, March. -------- (Jan 2011): Error found by Ömer ÇAKIR. Page 368, Table 6.24. "280 s" should be "280 s " (the "s" is followed by a blank space). ------------- (Apr 2011): Error found by Frédéric Kayser. Page 408, line 10. "This table has codes 0--256 for..." should be "This table has codes 0--255 for..." (Aug 2011): Page 100, above table 3.9: "Thus, the parameter array (0,3,1,2)" should be "Thus, the parameter array (2,3,1,2)" Page 107, last row of table 3.14 should be $10^6$. Page 152, table 3.42: Row "7 18 7+13 00101" should be "7 20 7+13 00101" Page 74, Blaise Pascal quote: "Pensèes" should be "Pensées" and "ètonnè" should be "étonné". (Jan 2012): Page 553, Fig. 7.84(h) 21 should read 18 since in table (i) the 18 (in row 2 column 4) is smaller than 21. Also, page 554 (3), 58, 33, 18 should read (2), 58, 33, 21 for the same reason. (Feb 2012): Page 108, Table 3.15. Either I did not grasp RBUC as it is introduced on page 107 or the example is plain wrong. I could not figure out how you could get ci=(3,0,2,2) in Table 3.15 - based on what you had on the line bi it should have been (3,0,3,3) and why did you represent the two last 3's using three bits (011) instead of just two (11) - based on its binary representation it should have been (3,0,1,3) This led me to think there was something wrong with this example, I did it on my own from scratch, here are my findings: The set of selectors bi=(4,3,0,0,3,3,4) should be (4,2,0,0,3,3,4) since both 2 and 3 in the set of integers 15,6,2,3... only need two bits (and not three) to be encoded, this is already wrong in the text that presents the example. Table 3.15 becomes (I've added dots to show the zeros for clarity, underscores are differences between me and the book): 010   ei=(2) 10|10   di=(2,2) 11|00|10|11   ci=(3,0,2,3) 100|010|.|.|11|11|100   bi=(4,2,0,0,3,3,4) 1111|0110|10|11|.|.|.|.|100|101|001|111|1000   15,6,2,3,0,0,0,0,4,5,1,7,8 (March 2013): Page 107, table 3.14. The lengths of the Elias Omega Codes in table 3.14 are probably wrong, based on table 3.13 (page 106) and this code It should read 1 3 3 6 6 7 11 12 13 13 17 21 28 (April 2013): page 403 this time: "A length of 12 becomes edoc 265 followed by a single bit 1. This is written as the 7-bit prefix code 0001001 followed by 1." (edoc 264 was 0001000, edoc 265 is 0001001 not 0001010 which is edoc 266) "A length of 256 becomes edoc 284 followed by five bits 11101. This is written as the 7-bit prefix code 11000100|11101." or "A length of 257 becomes edoc 284 followed by five bits 11110. This is written as the 7-bit prefix code 11000100|11110." (notice that edoc 284 actually covers the 227-258 range, in practice a length of 258 is written using -the usually shorter- edoc 285, but still edoc 284 + 11111 = 258, thus 11110 = 257, 11101 = 256,... For the first part edoc 280 = 11000000, 281 = 11000001, 282 = 11000010, 283 = 11000011, 284 = 11000100,... 11000101 is in fact edoc 285) "The table shows that a distance of 6 is represented by 00100|1, a distance of 21 becomes the code 01000|100, and..." or "The table shows that a distance of 6 is represented by 00100|1, a distance of 22 becomes the code 01000|101, and..." (17 = 000, 18 = 001, 19 = 010, 20 = 011, 21 = 100,... 101 is 22) (Nov 2013): Pages 407--408: The description of the Deflate algorithm is wrong. The small PDF located here (82 KB) has the correct description. Currently, URL http://frdx.free.fr/new407rep.html also has the same text, but URLs tend to be short lived. ------------- (Jun 2011): Error found by Randy Silva. Page 964, line 7. "800 kHz" should be "8 kHz" -------- (July 2012) Errors found by Nigel Horspool. Page 901, Table 9.35: "complement_horizontal_forward_r" is misspelled four times. Page 1282, The two references to Horspool should be "Horspool, R. N." (the initials are reversed). Page 1343: Horspool, Robert Nigel Stanhope (1948--) ---------- (Jan 2013) Errors in the Bibliography found by Nelson H. F. Beebe. Buttigieg, Victor and P. G. Farrell (1995b) needs this change: Essex, England -> Southend-on-sea, Essex, England Libe Abaci -> Liber Abaci John Cleary: inconsistent initials [John G. vs John J.] [Schwarz, Heiko, Detlev Marpe, and Thomas Wiegand (2007)] IEEE Transaction on Circuits and Systems for Video Technology -> IEEE Transactions on Circuits and Systems for Video Technology [Acharya, Tinku and Joseph F. J\'aJ\'a (1996)] Information and Computer Science -> Information Sciences [Fang I. (1966)] There is recent new data on the ETAOIN SHRDLU letter frequency measurement. Reznik, Yuriy (2004) (ICASSP apos;04) -> (ICASSP '04) [Peano, G. (1890)] Sur Une Courbe Qui Remplit Toute Une Aire Plaine -> Sur Une Courbe Qui Remplit Toute Une Aire Plane Matlab (1999) -> Matlab (1992) [the document that it refers to is dated 1992] [Gage, Philip (1994)] {\it C/C++ Users Journal}, {\bf12}(2)23--28, Feb 01 [Yokoo:1997:DCU] Data Compression Using Sort-Based Context Similarity Measure -> Data Compression Using a Sort-Based Context Similarity Measure [Ahmed, N., T. Natarajan, and R. K. Rao (1974)] "R. K. Rao" -> "K. R. Rao" [Bentley, J. L. et al. (1986)] A Locally Adaptive Data Compression Algorithm -> A Locally Adaptive Data Compression Scheme [Manber, U., and E. W. Myers (1993)] Manber, U., and E. W. Myers -> Manber, Udi and Gene Myers [McCreight, E. M (1976)] {\bf32} -> {\bf23} [Lelewer, D. A., and D. S. Hirschberg (1987)] 261--297 -> 261--296 [Chaitin, Gregory J. (1966)] On the Lengths of Programs for Computing Finite Binary Sequences -> On the Length of Programs for Computing Finite Binary Sequences Ekstrand, Nicklas (1996) ``Lossless Compression of Gray Images via Context Tree Weighting,: should be "Lossless Compression of Grayscale Images via Context Tree Weighting". Alistair:2001. "URL no longer accessible on 27 January 2013.", Peitgen:1982. 1982 should be 1986. Horspool, N. R., and G. V. Cormack (1992) the first author should be R. N. Horspool -------- (Feb 2015) Errors found by Douglas Hundley. LZW Example on top of page 372. In the step-by-step description of the encoder, you store the hash-index in the array location pointed to by the hash index. I believe you should be storing the node number, which is simply an integer that is incremented for each new node. The steps would look something like this: 1. Hash(l,97) -> 278. Array location 278 is set to (97,256,l) // next available node number is 256 2. Hash(f,108) -> 266. Array location 266 is set to (108,257,f) 3. Hash(⎵,102) -> 269. Array location 269 is set to (102,258,⎵) and so on. The encoder then emits node numbers--just as it does in the simple examples earlier in the chapter. (Mar 2015) Page 387, example 6.19.1: in step (a) it should read "Hash xy", not "Hash ya". -------- (June 2015) Error found by Masanori Ogino. Page 3, line 2. "a memoryless source is also termed 'independent and identically distributed', or IIID." IIID should be changed to IID. This should also be corrected in the index, page 1343. -------- (Sept 2015) Error found by Philipp Maximilian Knoll. Page 342, line -3. "s = a_{j+1}a_{j+2}\ldots a_{j+p-1}" should be changed to "s = a_{j+1}a_{j+2}\ldots a_{j+p-1}a_{j+p}". ------------------------ (Sept 2016) Error found by Alan Clark. Page 398, Section 6.23, The V.42bis Protocol I would appreciate the correction of the material related to V.42bis. In particular:- (i) V.42bis was not a successor to V.32bis, which was a modem modulation scheme. It was developed by CCITT SG17 as an extension to the V.42 modem error control protocol (ii) V.42bis was the BTLZ algorithm, proposed by British Telecom. (iii) BTLZ/V.42bis was not LZW based, nor did it use an LRU method for recovering dictionary space. The particular advantage of the BTLZ algorithm is that it did not require the additional pointers or sorting that an LRU or LFU method would require, and in the memory and CPU constrained modems of the late 1980's this was a key factor. There were some later assertions that the BTLZ method of dictionary space recovery would lead to worse performance than LRU/LFU as these were regarded as optimal however comparative testing showed that BTLZ achieved the same performance as LRU/LFU and in some cases outperformed them. If you would like any reference material on V.42bis I would be happy to provide this. I was the developer of the BTLZ algorithm and the editor for the V.42bis Recommendation in CCITT (now ITU), and still have copies of all the original CCITT contributions. -------- (July 2017) Errors found by Glenn Davis page 214 line -4 and page 232 line -12: [Gallager 74] should be [Gallager 78] ------------------------- (Apr 2019): Error found by Ömer ÇAKIR. Page 1228, Table Ans.23. "32 (w)" should be "32 ( )" (a blank space in parentheses). ------------------------- (May 2020): Error found by George Liontos <[email protected]>. Page 84, Section 2.9, Phased-In codes.


“Otherwise, the decoder inputs the next bit b and appends it to 2(i-P) + P. Using n = 9 as an example, if the decoder inputs the three bits 111 (equals P) into i, it inputs the next bit into

b and appends it to 2(7-7) + 7 = 111_2, resulting in either 111_0 or 111_1, depending on b.”

Should be:

“Otherwise, the decoder inputs the next bit b and adds it to 2(i-P) + P. Using n = 9 as an example, if the decoder inputs the three bits 111 (equals P) into i, it inputs the next bit into

b and adds it to 2(7-7) + 7 = 111_2, resulting in either 111_2 = 7 or 1000_2 = 8, depending on b.”

------------------ ------------------- ---------------- The most valuable thing you can make is a mistake - you can't learn anything from being perfect. -Adam Osborne. American entrepreneur, 1939-2003