X Image Extension Overview

Robert NC Shelley
AGE Logic, Inc.
9985 Pacific Heights Blvd.
San Diego, CA 92121

Abstract

The X Image Extension provides a powerful mechanism for the transfer and display of virtually any image on any X-capable hardware. While not intended for use as a general-purpose image processing engine, XIE does provide a robust set of image rendition and enhancement primitives that can be combined into arbitrarily complex expressions. XIE also provides import and export facilities for moving images between client and server, or Core X and XIE, and for accessing images as resources.

XIE Design Goals

The X Image Extension (XIE) was designed to facilitate efficient and robust image display on X Window System servers. XIE provides tools for rapidly transferring an image from client to server, and for converting the image format to match the server's hardware characteristics. XIE does not attempt to provide tools for general-purpose image processing. However, simple image enhancement and filtering operations such as contrast enhancement and convolution are available, as well as dithering, geometric transformations, histogram generation, etc.

The X Window client-server architecture was based on the assumption that data on the wire would be of a relatively high level. Therefore, a high-bandwidth connection is not an absolute requirement for X. However, image transfer requires sending large amounts of low-level data, which may take an unacceptable amount of time on slow or heavily loaded networks. This image transport bottleneck makes the X Window System less than optimal for supporting image-intensive applications.

XIE solves the image transport problem by supporting the transmission of compressed images across the wire, and providing image decompression facilities in the server. Compression ratios on the order of 20:1 or more are common for bitonal images.
The client can send a relatively modest amount of data to the server, where it is decompressed into a much larger data structure. The image at full resolution may not fit completely on the server's display monitor, so XIE supports geometric operations, such as scale. The large image may be scaled down to a convenient window size and displayed, as depicted in Figure 1.

Rendition in XIE is defined to be the process of changing the format of an image in order to make it compatible with the server's frame buffer, its associated lookup table(s), and other limitations. Once an image has been rendered, it may be transferred directly to the hardware and viewed on the screen. Rendering a complicated image to be compatible with highly limited hardware can be quite challenging, and may require a series of manipulations to be performed on the image. It may be necessary to convert the image from trichromatic to monochromatic (gray scale or bitonal). A scaling operation can be used to reduce the image's size to fit the screen dimensions. Convolution provides a means for sharpening the image or performing other general filtering operations. Dithering may be used to reduce the number of levels to match the number of colormap entries available, and so on.

Figure 1. Compressed transport from client to server, followed by decompression, scaling and display.

The primitives of XIE that are needed to support image transfer and rendition define a new class of virtual display hardware. In some cases, it may be possible for server implementors to map these functions directly onto physical hardware accelerators. For example, the Convolution operator in XIE might be implemented directly using a convolution board or chip. Providing a vehicle for hardware vendors to upgrade the performance of X servers for image-specific operations was an explicit design goal of XIE.
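
As a rough illustration of the kind of rendition chain described above (trichromatic to monochromatic, scale, dither), the following Python sketch processes a small in-memory image. It is not XIE protocol or XIElib code: the function names, the Rec. 601 luma weights, nearest-neighbour sampling, and the 2x2 ordered-dither matrix are all assumptions chosen to keep the example short.

```python
# Illustrative rendition chain; not XIE protocol or XIElib code.
# All names and parameter choices below are assumptions for this sketch.

BAYER_2X2 = [[0, 2],
             [3, 1]]  # 2x2 ordered-dither threshold matrix


def to_gray(rgb_image):
    """Trichromatic -> gray scale using Rec. 601 luma weights (assumed choice)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def scale_nearest(image, out_w, out_h):
    """Scale by mapping each output pixel back to the nearest input pixel."""
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def dither_bitonal(gray, levels=256):
    """Reduce a gray scale image to two levels with ordered dithering."""
    out = []
    for y, row in enumerate(gray):
        out_row = []
        for x, v in enumerate(row):
            # The threshold varies with pixel position, which spreads the
            # quantization error over the neighbouring pixels.
            threshold = (BAYER_2X2[y % 2][x % 2] + 0.5) * levels / 4
            out_row.append(1 if v >= threshold else 0)
        out.append(out_row)
    return out


def render(rgb_image, out_w, out_h):
    """Gray conversion, then scaling, then dithering to a bitonal result."""
    return dither_bitonal(scale_nearest(to_gray(rgb_image), out_w, out_h))
```

In this sketch a uniform mid-gray region comes out as a checkerboard with half of its pixels set, which is the intended effect of trading levels for spatial pattern. Note that scale_nearest is specified from output coordinates back to input coordinates, the same direction the Geometry element uses.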
To summarize the main goals of XIE:

* Display any image on any server,
* Reduce the time it takes to move images from client to server,
* Support the development and utilization of image-specific hardware,
* Do not cover all of image processing.

XIE Historical Summary

The X Image Extension project was initiated in 1988 by Joe Mauro of Digital Equipment Corporation. The first formal proposal was submitted to the Consortium in 1990. At that time, it was voted not to proceed to technical review, because of a lack of sufficient resources to review the proposal. Digital continued to work with interested Consortium members in revising the protocol, and produced a portable sample implementation of XIE (version 3) to demonstrate proof-of-concept. In March of 1991, XIE was officially moved into technical review. After more than a year and a half of discussion by many interested companies, a substantially modified XIE (version 4) was promoted from technical review to public review at the end of 1992. In January of 1993, AGE Logic was contracted to produce a new sample implementation, which was completed in January of 1994. Several more improvements were made to the protocol during the public review and sample implementation period. The major changes to the XIE protocol between versions 3 and 5 included:

* more direct control over server behavior given to the client,
* generalization and refinement of the computational model,
* elements no longer do implicit data type conversions,
* tools for creating a processing engine moved to the client side,
* unification of all geometric operators into a single transformation,
* enhanced support for color spaces and color operations,
* cleaner interface to Core X,
* support for more compression techniques,
* identification of a Document Imaging Subset.

XIE Architecture

The image extension is designed to add new functionality to X while at the same time avoiding any fundamental changes to the X architecture.
XIE receives requests and sends errors, replies, and events through the standard mechanisms defined for extensions. XIE resources must be registered with the Core X resource manager so that at client shutdown all XIE resources can be cleanly deallocated. All data in and out of XIE must pass through X, either by the standard extension mechanism or by explicit import and export. Note in Figure 2 below that all utilization of resources in XIE must go through a Photoflo Manager. Data is imported into the Photoflo Manager and exported out from it. The Photoflo Manager, which also runs the computational engine of XIE, is discussed in the next section.

Figure 2. High-level view of an X server containing XIE support.

XIE's Computational Model

Typical XIE processing employs a sequence of operations. For example:

* transport an image from client to server,
* decompress the image,
* scale the image,
* render and enhance the image,
* display (export image data to a Core X drawable).

A client could ask the server to perform each of these operations one step at a time: send the complete image to the server, then decompress it, then scale it, etc., and finally display the result. One disadvantage of this approach is that the user may have to wait a long time before seeing the image on the screen. Furthermore, the memory requirements of holding the full decompressed image in the server may be severe. The necessity of producing full intermediate images between steps further aggravates this situation. Consider the simple sequence of operations {transfer image, decompress, scale, export to X} that is depicted in Figure 3:

Figure 3. Processing each step completely before moving data to the next element in the sequence requires allocating a large amount of buffer space in the server.

When the image is processed one step at a time, large buffers must be allocated to hold the decompressed image in the server.
This may require more memory than is available in the server, in which case XIE would have to return an allocation error, thereby preventing the user from viewing the image.

Suppose instead that the client could specify all operations in the sequence to the server at once. The server could then choose to perform only part of each operation before passing the output data to the next processing step in the sequence. For example, it might decompress half of the incoming image, scale that half, and export the top half of the image to Core X, at which point the user would see the top of the image displayed on the screen (Figure 4):

Figure 4. Processing only half of each step before moving data to the next element.

After the top half of the image has been displayed, the server could decompress the remainder of the image, scale it, and export the result to Core X, thus completing the display of the whole image (Figure 5). Two benefits have been realized. First, the user was able to see something appear on the screen in roughly half the time. Second, the server didn't have to allocate as much buffer space. In fact, the amount of buffer space required will be inversely proportional to the number of steps into which the process is subdivided (until some minimum level is reached).

Figure 5. Processing the second half of each step.

In XIE, the ability to define a sequence of operations to be performed is referred to as constructing a Photoflo. Photoflos are so named because the image data (sometimes photographic) is viewed as flowing through the sequence of elements. The Photoflo Manager depicted in Figure 2 obtains data via import elements, processes the data using process elements, and outputs the final result via export elements. A Photoflo for the sequence just discussed would look like:

    Import (from client) -> Geometry (scale) -> Export (to Core X)

No explicit decompression step is required.
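
The buffering argument above (process part of the image through every step before touching the rest) can be modelled loosely with chained Python generators. This is a toy model rather than XIE code: the run-length "compression" format, the fixed 2:1 decimation, the peak_rows measure of buffer space, and all function names are assumptions made for the sketch.

```python
# Toy model of strip-at-a-time processing; not XIE code.
# Each stage is a generator, so one strip flows through every stage
# before the next strip is decoded.

def import_client_photo(encoded_strips, stats):
    """Decode run-length encoded strips one at a time (hypothetical format:
    each row is a list of (value, run_length) pairs)."""
    for strip in encoded_strips:
        rows = [[v for (v, n) in row for _ in range(n)] for row in strip]
        # Track the largest decoded buffer held at any one time.
        stats["peak_rows"] = max(stats["peak_rows"], len(rows))
        yield rows


def geometry_half(strips):
    """Scale by 1/2 in each direction by keeping every other row and column.
    Assumes an even number of rows per strip so strips decimate consistently."""
    for rows in strips:
        yield [row[::2] for row in rows[::2]]


def export_drawable(strips, drawable):
    """Append each finished strip to the 'drawable' as soon as it is ready."""
    for rows in strips:
        drawable.extend(rows)


def run_flo(encoded_rows, rows_per_strip):
    """Run import -> geometry -> export, returning (output, peak buffer rows)."""
    stats = {"peak_rows": 0}
    strips = [encoded_rows[i:i + rows_per_strip]
              for i in range(0, len(encoded_rows), rows_per_strip)]
    drawable = []
    export_drawable(geometry_half(import_client_photo(strips, stats)), drawable)
    return drawable, stats["peak_rows"]
```

Running run_flo on the same 8-row image with rows_per_strip set to 8, 4, and 2 produces identical output in this model, while the largest decoded buffer shrinks from 8 rows to 4 to 2. That mirrors the inverse relationship between buffer space and the number of subdivisions described in the text.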
The Import element describes the format of the data to be transferred, and the server knows it must implicitly decompress the image data prior to passing it to the Geometry element. Similarly, the client may specify that an image is to be compressed before it is stored in a Photomap or returned to the client.

Photoflos do not have to be linear. The client is allowed to construct any Directed Acyclic Graph (DAG) as long as the way the elements are connected makes sense. For example, the Photoflo in Figure 6 computes an unsharp masked enhancement of the input image by first blurring a sub-region of the image, using a convolution operation, and then adding the difference to the original image:

Figure 6. Photoflo to compute unsharp masked enhancement with rectangles of interest.

The client can specify this DAG to the server all at once, with either CreatePhotoflo (to create a permanent Photoflo that can be re-used) or ExecuteImmediate (to create a temporary Photoflo that will be destroyed immediately upon completion). The Photoflo Manager creates the Photoflo and then waits for further instruction. Once the Photoflo is activated (performed by ExecutePhotoflo for permanent flos, or performed implicitly for temporary flos), the Photoflo Manager will asynchronously execute the Photoflo, processing input as it becomes available. The client pushes data into the Photoflo by using a PutClientData request, specifying the Photoflo and the appropriate Import element. The client reads data out of the Photoflo by sending a GetClientData request, specifying the Photoflo and the appropriate Export element. It is not unusual for the Photoflo to block further processing pending either additional input from the client or retrieval of available data by the client.

Several operations in XIE permit control over the image processing domain by specifying a control-plane or a set of rectangles-of-interest.
This allows portions of the image to be included or excluded from the rendition or enhancement operation. In Figure 6, the Convolve operation is restricted to a subset of all the pixels in the image by importing a set of rectangles-of-interest (ROI) to modify the processing domain of the Convolve element. Pixels that fall within the area described by the set of rectangles are processed; those which do not are passed through unchanged. This allows the user to reduce the computation time in areas of the image that are not interesting.

XIE Resources

XIE defines six permanent resources that can be created and destroyed: ColorLists, LUTs, Photoflos, Photomaps, Photospaces, and ROIs. Some of these have already been introduced above.

A Photoflo is a computational engine that the client specifies using either a CreatePhotoflo or an ExecuteImmediate request. A Photoflo created by CreatePhotoflo is considered permanent, and exists until an explicit DestroyPhotoflo request or client shutdown. The resource ID specified at Photoflo creation time is a normal Core X resource ID. A Photoflo created by ExecuteImmediate is a temporary resource that is created in a separate resource ID space, called a Photospace. Before creating the first temporary Photoflo, a Photospace must be created to hold it using the CreatePhotospace request. If this Photospace is destroyed, all Photoflos in the space are immediately destroyed along with it.

LUTs, ROIs and Photomaps are used as Photoflo inputs by using ImportLUT, ImportROI and ImportPhotomap elements, respectively. A Photomap is a handle for image data stored in the server. Photomaps are created with no attributes: they inherit data and descriptive information from ExportPhotomap elements in Photoflos. The attributes of a Photomap may be queried by using the QueryPhotomap request. A LUT is a handle for lookup table data for the Point operation. A LUT resource receives data and attributes from an ExportLUT element in a Photoflo.
A ROI resource is a handle for a set of rectangles-of-interest. It is populated using the ExportROI element.

ColorLists are used to store colors allocated from a colormap. When a colormap and a color or gray scale image are passed to the ConvertToIndex element, the colormap identifier and the index of each colormap cell allocated by ConvertToIndex are stored in the ColorList. The contents of a ColorList can be queried using the QueryColorList request.

Element Definitions

This section briefly discusses the elements with which one can create computational engines (Photoflos). The number of individual Photoflo elements listed below is deceptively modest, in that there are many different ways in which several of the elements could have been specified. For example, in defining a geometric transformation for an image, it is quite simple to specify the transform in idealized coordinates, but impossible to choose one optimal re-sampling method. This is because evaluation of the algorithm's performance depends on factors that are largely subjective, and processor or data dependent. It is unreasonable to pick a single algorithm for a given element when clearly dozens of other algorithms are available of equal or better quality. It was decided to build flexibility into XIE's basic element definitions, giving both server implementors and client writers leeway in selecting the algorithms that best meet their needs.

The XIE protocol provides this flexibility by incorporating a technique parameter within several of the elements that acts as a modifier of the element's behavior. The techniques that are supplied to import and export elements generally identify image compression schemes, whereas the techniques specified for processing elements identify various image processing algorithms that are slightly different methods of doing the same basic operation.
In some cases techniques offer a tradeoff between execution time and the fidelity of the results, while in other cases different techniques are desirable because of the image class or content.

For some techniques the server implementor is required to provide a default algorithm. This allows an application to select an operation in a generic way without concerning itself with the details of the particular algorithm being used. The protocol document specifies that some techniques must be implemented in all servers, whereas other techniques are considered optional. The server implementor is also permitted to extend XIE with additional proprietary techniques as it sees fit. XIE includes a query request to determine what techniques are supported by a particular implementation and to determine what algorithms are being used as defaults.

Import Elements

Import elements may be classified according to the source of their data, which must be one of: the client, Core X, or XIE resources. Thus:

Client Import Elements

ImportClientLUT - reads in lookup table data from the client. Since the client will send raw data using PutClientData, this element is used to specify the format of the expected data stream.
ImportClientPhoto - reads in image data from the client. Since the client will send raw data using PutClientData, this element is used to specify the format of the data stream. A decode technique, and its associated arguments, are used to specify the compression algorithm applied to the data.
ImportClientROI - reads in a list of rectangles for process domain control. Since a fixed format is defined, only the number of rectangles needs to be provided.

Core X Resource Import Elements

ImportDrawable - reads in image data from a window or pixmap.
ImportDrawablePlane - reads in a selected image bitplane from a window or pixmap.

XIE Resource Import Elements

ImportLUT - imports existing lookup table data, typically to be used by the Point processing element.
ImportPhotomap - imports previously stored image data. No decoding parameters are required (as was the case with ImportClientPhoto) because the server remembers the format of the stored data.
ImportROI - imports a list of rectangles that was previously stored in the server, typically used for process domain control.

Process Elements

Processing elements can be roughly grouped by their complexity, functionality, or the aspect of the image that they work with. One such grouping follows:

Simple Algebraic Processing Elements

Arithmetic - allows arithmetic operations, such as addition or subtraction, to be performed between a pair of images or an image and a constant value. Other operations, such as multiplication or division, can only be performed between an image and a constant value.
Blend - allows an alpha-blend operation between two images or an image and a constant value. The weights for the blend operation can be determined by a constant value or on a per-pixel basis through the use of an alpha-plane.
Compare - compares two images or an image and a constant value to produce a bit plane with ones where the comparison is satisfied.
Logical - performs per-pixel bitwise operations on images, or between an image and a constant.
Math - performs mathematical operations, such as square root or natural logarithm, on a single source image.

Format Conversion Elements

BandCombine - accepts three SingleBand images and merges them to form a single TripleBand image.
BandExtract - produces a SingleBand image from a TripleBand image by summing a percentage of the pixel values from each input band with a bias value.
BandSelect - selects a single image band from a TripleBand image.
Constrain - converts unconstrained pixel values into integer pixel values (certain operations in XIE can be performed using unconstrained pixel values, which are represented within the server as either floating-point or fixed-point values).
ConvertFromIndex - takes a colormap and an image as input and produces a new image whose pixel values are taken from the colormap.
ConvertFromRGB - converts RGB image data into another color space.
ConvertToIndex - allocates image colors from a colormap or matches colors in the image against those already existing in the colormap. Each pixel in the output image is the colormap index of the colormap cell that most closely represents the corresponding input image pixel. The action taken (allocate, match, etc.) is dependent on the color allocation technique given to ConvertToIndex. Each colormap cell that is allocated by ConvertToIndex is stored in a ColorList resource.
ConvertToRGB - converts to RGB from another color space.
Dither - reduces the number of quantization levels in an image by spreading the information contained in a pixel over the surrounding area.
Unconstrain - converts integer pixel values into unconstrained pixel values (certain operations in XIE can be performed using unconstrained pixel values, which are represented within the server as either floating-point or fixed-point values).

Simple Enhancement and Filtering Elements

Convolve - performs a convolution on an image.
MatchHistogram - computes the histogram of the input image and remaps pixel data so as to match, as closely as possible, the specified histogram shape.
Point - applies a lookup table operation to each pixel in the source image. Point is so named because the output pixel value depends only on the input value at that point.

Domain-based Operations

Geometry - performs a geometric transformation on image data. Special cases include scale, crop, mirror, shear, rotate, and translate. The operation is specified as a mapping of the coordinates of the output image back to the coordinate space of the input image. Re-sampling and antialiasing algorithms can be specified by using technique parameters.
PasteUp - allows multiple image tiles to be combined into a single image.
A tile is an image with an attached offset that specifies where the tile is to be placed in the output image. Overlapping tiles are resolved by a defined stacking order. If the tiles don't completely cover the output image, uncovered regions are filled by a constant pixel value.

Export Elements

As with import elements, export elements may be classified according to the destination of their data, which must be one of: the client, Core X, or XIE resources. Thus:

Client Export Elements

ExportClientHistogram - computes the histogram of an image and makes it available to the client via the GetClientData request.
ExportClientLUT - makes lookup table data available to the client. The client can specify the entire LUT or a sub-range of the LUT.
ExportClientPhoto - makes image data available to the client. The client can specify the format it wants by using an encode technique and its associated parameters.
ExportClientROI - makes rectangle-of-interest data available to the client.

Core X Resource Export Elements

ExportDrawable - writes image data into a pixmap or window. The client must provide a graphics context to direct the insertion of data.
ExportDrawablePlane - writes a bitonal image into a pixmap or window. The graphics context may be used to expand the data to foreground and background colors and to perform stippling.

XIE Resource Export Elements

ExportLUT - exports lookup table data to a LUT resource. ExportLUT obtains its input from a previous element such as ImportClientLUT or ImportLUT.
ExportPhotomap - stores an image in a Photomap resource. Encoding parameters can be supplied to tell the server what format to store the data in. Alternatively, the client may allow the server to choose whatever image compression scheme it deems best.
ExportROI - stores a list of rectangles in a ROI resource. The exported data must come from either ImportClientROI or ImportROI.

Subsetting

The full XIE protocol defines a very powerful, flexible system.
It is recognized, however, that many application writers will wish to use XIE features to perform relatively simple tasks on simple (e.g., bitonal) images. They may never deal with color spaces or processing domains or want to do anything more complex than simply import, scale, and export. They will be interested in using a server that is tuned specifically for the needs of document imaging, ignoring all other concerns. In fact, they can do without many of XIE's capabilities, as long as the server provides the simple operations they require and it obeys the same protocol as the full XIE specification for those operations.

The XIE Document Imaging Subset (DIS) was designed to respond to this demand. DIS drops the concepts of ColorLists and processing domains completely, and supports only two processing elements: Geometry and Point. Geometry is required in order to do scaling and rotation. Since the DIS client is responsible for managing the colormap, Point is needed to convert pixel values into colormap indexes. The only required image compression schemes are those associated with bitonal images.

DIS also provides the ability to import a full-depth drawable, scale it using Geometry, and then export the result back to Core X. This functionality may be useful for those who want to be able to scale images produced by other applications. One could imagine using this feature to resize a drawing in order to include it in a document.

Summary

The X Window System defines a standard architecture that has provided a stable platform-independent display environment for application developers to build upon.
By adhering to the same basic principles, XIE enhances the image display capabilities of the core protocol by providing:

* efficient image transport between client and server,
* efficient caching of images on the server,
* image rendition and enhancement capabilities on the server,
* support for the development and utilization of image-specific hardware,
* consistent policy-free support for image display across all platforms.

References

1. X Window System, Scheifler & Gettys, Digital Press.
2. X Image Extension Protocol Reference Manual, Shelley et al., AGE Logic, Inc.
3. XIElib Specification, Rogers, AGE Logic, Inc.
4. XIE Sample Implementation Architecture, Shelley, Verheiden & Fahy, AGE Logic, Inc.
5. X Image Extension Overview, Fahy & Shelley, The X Resource - Special Issue C, 1/93, O'Reilly & Associates, Inc.

Copyright (c) 1994 AGE Logic, Inc.

Permission to use, copy, modify, distribute, and sell this article for any purpose is hereby granted without fee, provided that the above copyright notice and this permission notice appear in all copies. AGE Logic, Inc. makes no representations about the suitability for any purpose of the information in this paper. This material summarizes a standard of the X Consortium and is provided as is without express or implied warranty.