Non-Recursive String Table for LZW Decompressors - John Jeppson The standard string table format, as suggested in "A Technique for High- Performance Data Compression" by Terry A. Welch, is storage elements of the form [prefix]K where [prefix] is a reference to an earlier string in the table. Character output then consists of looking up a string table entry and generating the output string by recursive decomposition. The characters come off one at a time in reverse order, so are generally pushed on the stack and copied off as a string, thereby correcting the order. The [prefix]K format is particularly well suited to compressors since it facilitates hashing methods for searching the table. Decompressors, on the other hand, never search the table so during decompression this format may not yield optimum performance. This file discusses an alternative strategy for decompressors. The trade-off is memory for speed. Computationally intensive recursive decomposition is not required, but the entire output charstream must be retained in memory as an array of bytes. The string table is implemented as a table of pointers (and counts) to "earlier" strings already present in the output stream. Output for each received code is then a simple matter of copying the string from an earlier site to the present site. If 'N' is the number of bits/pixel then the first 2**N strings in the string table consist of a single character. These single-character strings are the "root" codes and when first encountered do NOT previously exist in the output stream. All subsequent strings are at least two characters in length and do exist in the output charstream. The string table should consist, therefore, of a set of 4096 records each consisting of a pointer and an integer count. (The longest possible string is less than 4096 characters, so a 16-bit count will suffice.) The first 2**N string table entries should be pointers to a separate static string array, and each count should be 1. This static string array can just be a pre-initialized array [00,01,02...FE,FF] of byte. Whenever a clear code is encountered the string table should be cleared and re-initialized to these root-string pointers. During decompression every code value received requires (1) a write of one or more characters to the output stream, and (2) a new entry in the string table (except for the first code). The destination address and count of the current write and of the previous write are retained and updated as decompression proceeds. When a code is received there are two cases: 1. value(code) already in string table. (a) copy value(code) to the output charstream at curAddress. (b) install previousAddress/previousCount+1 as the next entry in the string table. Note that by incrementing the count, the new string entry will "overlap" onto the first character of the curAddress, which is just what we want (i.e. [prefix]K ). 2. value(code) not yet in string table. In this case we copy [prefix]K to the output charstream at curAddress. Here [prefix] is the string at previousAddress/previousCount (from the previous write operation) and K is the first character of that same string (i.e., previousAddress[1]). We then add curAddress/curCount as the next entry in the string table. In every case we are simply copying a string of characters which should be much faster than recursive decomposition. It won't work, however, if the output charstream is not retained in memory. It also won't work, or at least not gracefully, if the output charstream is maintained as small bitfields or is broken up by intervening fill. This might be true, for example, if the output stream is sent directly to screen memory. In those circumstances recursive decomposition is the way to go. With regard to the "root" string pointers, beware of relocation problems. These absolute addresses should be loaded dynamically whenever the string table is reset, they should never be hard coded into the application.