Non-Recursive String Table for LZW Decompressors - John Jeppson

The standard string table format, as suggested in "A Technique for High- 
Performance Data Compression" by Terry A. Welch, is storage elements of the 
form [prefix]K where [prefix] is a reference to an earlier string in the 
table.  Character output then consists of looking up a string table entry and 
generating the output string by recursive decomposition.  The characters come 
off one at a time in reverse order, so are generally pushed on the stack and 
copied off as a string, thereby correcting the order.  

The [prefix]K format is particularly well suited to compressors since it
facilitates hashing methods for searching the table. Decompressors, on the
other hand, never search the table so during decompression this format
may not yield optimum performance.  This file discusses an alternative
strategy for decompressors.

The trade-off is memory for speed.  Computationally intensive recursive 
decomposition is not required, but the entire output charstream must be 
retained in memory as an array of bytes.  The string table is implemented as 
a table of pointers (and counts) to "earlier" strings already present in the 
output stream.  Output for each received code is then a simple matter of 
copying the string from an earlier site to the present site.  

If 'N' is the number of bits/pixel then the first 2**N strings in the string 
table consist of a single character.  These single-character strings are
the "root" codes and when first encountered do NOT previously exist in the
output stream. All subsequent strings are at least two characters in length
and do exist in the output charstream.

The string table should consist, therefore, of a set of 4096 records each 
consisting of a pointer and an integer count.  (The longest possible string 
is less than 4096 characters, so a 16-bit count will suffice.) The first 2**N 
string table entries should be pointers to a separate static string array, 
and each count should be 1.  This static string array can just be a 
pre-initialized array [00,01,02...FE,FF] of byte.  Whenever a clear code is 
encountered the string table should be cleared and re-initialized to these 
root-string pointers.  

During decompression every code value received requires (1) a write of one or 
more characters to the output stream, and (2) a new entry in the string table 
(except for the first code).  The destination address and count of the 
current write and of the previous write are retained and updated as 
decompression proceeds.  

When a code is received there are two cases:

1. value(code) already in string table.
    (a) copy value(code) to the output charstream at curAddress.
    (b) install previousAddress/previousCount+1 as the next entry in the 
string table.  Note that by incrementing the count, the new string entry will 
"overlap" onto the first character of the curAddress, which is just what we 
want (i.e. [prefix]K ).

2.  value(code) not yet in string table.
    In this case we copy [prefix]K to the output charstream at curAddress.  
Here [prefix] is the string at previousAddress/previousCount (from the 
previous write operation) and K is the first character of that same string 
(i.e., previousAddress[1]).  We then add curAddress/curCount as the next 
entry in the string table.  

In every case we are simply copying a string of characters which should be 
much faster than recursive decomposition.  It won't work, however, if the 
output charstream is not retained in memory.  It also won't work, or at least 
not gracefully, if the output charstream is maintained as small bitfields or 
is broken up by intervening fill.  This might be true, for example, if the 
output stream is sent directly to screen memory.  In those circumstances 
recursive decomposition is the way to go.

With regard to the "root" string pointers, beware of relocation problems.
These absolute addresses should be loaded dynamically whenever the string
table is reset, they should never be hard coded into the application.