Introduction

POST++ provides simple and efficient storage for application objects. It is based on the memory mapping mechanism and shadow page transactions, and it eliminates overhead on access to persistent objects. POST++ also supports working with several storages, storing objects with virtual functions, and atomic update of the data file, and it provides a high-performance memory allocator and an optional garbage collector for implicit deallocation of memory. POST++ works correctly with C++ classes that use multiple inheritance and with pointers inside objects.

Describing object class

The POST++ storage manager needs information about persistent object classes to support garbage collection, relocation of references while loading, and initialization of pointers to virtual tables. Unfortunately the C++ language provides no facilities for extracting information about class format at runtime. Since I want to avoid special tools (preprocessors) and "dirty trick" solutions (such as extracting information about classes from debugging information), this information has to be provided to the storage manager by the programmer. Such class registration can be done very easily using special macros provided by POST++.

POST++ uses default constructors to initialize objects while loading them from the storage. The programmer should include macro CLASSINFO(NAME, FIELD_LIST) in the definition of any class whose instances can be saved in the storage. NAME is the name of this class. FIELD_LIST describes the reference fields of this class. There are three macros defined in file classinfo.h for describing references:

REF(x)
Describes single reference field.
REFS(x)
Describes one-dimensional fixed array of references (i.e. array with constant boundaries).
VREFS(x)
Describes varying one-dimensional array of references. A varying array can only be the last component of a class. In the class declaration you specify an array with only one element. The actual number of elements in a concrete object instance is specified at object creation time.

The macros in this list should be separated by spaces: REF(a) REF(b) REFS(c). Macro CLASSINFO defines a default constructor (a constructor without parameters) and declares the class descriptor of this class. The class descriptor is a static component of the class named self_class, so the class descriptor of class foo can be accessed as foo::self_class. Since constructors without arguments are called for base classes and components automatically by the compiler, you need not call them explicitly. But do not forget to include the CLASSINFO macro in the definition of any structure that can be used as a component of a serialized class.

Then you should register your class to make it accessible to the storage manager. This is done with macro REGISTER(NAME). Class names are placed in the storage together with the objects. The mapping between application and storage classes is established when the storage is opened: names of all classes stored in the storage are compared with names of application classes. If some class name is not found among the application classes, or the corresponding application and storage classes have different sizes, a program assertion will fail.

These rules are illustrated by the following example:

struct branch { 
    object* obj;
    int key;

    CLASSINFO(branch, REF(obj));
};

class foo : public object { 
  protected:
    foo*    next;
    foo*    prev;
    object* arr[10];
    branch  branches[8];
    int     x;
    int     y;
    object* childs[1];
  public:
    CLASSINFO(foo, REF(next) REF(prev) REFS(arr) VREFS(childs));
    foo(int x, int y);
};


REGISTER(1, foo);

int main() { 
    storage my_storage("foo.odb");
    if (my_storage.open()) { 
        my_root_class* root = (my_root_class*)my_storage.get_root_object();
        if (root == NULL) { 
            // New storage: create the root object and remember it.
            root = new_in(my_storage, my_root_class)("some parameters for root");
            my_storage.set_root_object(root);
        }
        ...
        int n_childs = ...;
        size_t varying_size = (n_childs-1)*sizeof(object*);
        // We should subtract 1 from n_childs, because one element is already
        // present in fixed part of class.
        foo* fp = new (foo::self_class, my_storage, varying_size) foo(x, y);
        ...
        my_storage.close();
    }
}

Allocating and deallocating objects in storage

POST++ provides a special memory allocator for managing storage memory. This allocator uses two different approaches, one for small and one for large objects. All storage memory is divided into pages (the page size is independent of the operating system page size and is 512 bytes in the current implementation of POST++).

Small objects are objects whose size is less than or equal to 256 bytes (page size/2). These objects are allocated using fixed block chains. Each chain contains a list of blocks of the same size. Sizes of allocated objects are aligned on an 8-byte boundary. The optimal number of fixed block chains for objects no larger than 256 bytes is 14 (the number of different equipartitions of the page). Before each object POST++ allocates an object header, which contains the class identifier of the object and the object size. Since the header is exactly 8 bytes and the size of a C++ object is always greater than 0, the block chain of size 8 can be eliminated. Allocation and deallocation of a small object is usually very fast: it requires only one remove or insert operation on a singly linked (L1) list. If the chain is empty when a new object is allocated, a new page is allocated and used for storing objects of this size (the page is divided into blocks, which are appended to the chain).

Space for a large object (larger than 256 bytes) is allocated from the free page list. The size of large objects is aligned on a page boundary. POST++ uses a first fit, random position algorithm for maintaining the list of free pages (all free segments of pages are sorted by address, and a special pointer tracks the current position in this list). The implementation of the memory manager can be found in file storage.cxx.
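The figure of 14 chains can be checked with a little arithmetic. The sketch below is not POST++ code; it assumes that "equipartition" means splitting a 512-byte page into n equal parts and rounding each part down to the 8-byte alignment. The names block_sizes and chain_block_for are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Constants taken from the text: 512-byte pages, 8-byte alignment.
constexpr std::size_t page_size = 512;
constexpr std::size_t alignment = 8;

// Distinct block sizes obtained by splitting a page into n equal parts
// (n >= 2) and rounding each part down to the alignment boundary.
// This reading of "equipartition" is an assumption of the sketch.
std::set<std::size_t> block_sizes() {
    std::set<std::size_t> sizes;
    for (std::size_t n = 2; n <= page_size / alignment; ++n) {
        sizes.insert((page_size / n) / alignment * alignment);
    }
    return sizes;
}

// Smallest chain block able to hold `size` bytes. Whether `size` includes
// the 8-byte object header is left to the caller (hypothetical helper).
std::size_t chain_block_for(std::size_t size) {
    assert(size > 0 && size <= page_size / 2);
    for (std::size_t block : block_sizes()) {  // std::set iterates in ascending order
        if (block >= size) return block;
    }
    return 0;  // not reached for valid sizes
}
```

Under these assumptions the set contains 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 96, 128, 168 and 256, which gives the 14 chains mentioned above (13 once the 8-byte chain is dropped).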

It is up to the programmer whether to use explicit or implicit memory deallocation. Explicit memory deallocation is faster (especially for small objects), but implicit deallocation (garbage collection) is more reliable. POST++ uses a mark and sweep garbage collection scheme. There is a special object in the storage: the root object. The garbage collector first marks all objects accessible from the root object (i.e. objects that can be reached by starting from the root object and navigating through references). Then all objects that were not marked during the first stage of GC are deallocated. Garbage collection can be performed while loading objects from the file (if you pass the do_garbage_collection attribute to the storage::open() method). It is also possible to invoke garbage collection explicitly during program execution by calling the storage::do_mark_and_sweep() method. But make sure that no program variables point to objects inaccessible from the root object (these objects will be deallocated by GC).
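The two stages described above can be sketched on a toy heap. This is only an illustration of the mark and sweep idea, not the POST++ collector: objects are entries of a vector, references are indices, and the names Node, mark, sweep and collect are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy object: references are indices into the heap vector.
struct Node {
    std::vector<std::size_t> refs;
    bool marked = false;
    bool allocated = true;
};

// Stage 1: mark everything reachable from the root (iterative traversal).
void mark(std::vector<Node>& heap, std::size_t root) {
    assert(root < heap.size());
    std::vector<std::size_t> stack{root};
    while (!stack.empty()) {
        std::size_t i = stack.back();
        stack.pop_back();
        if (heap[i].marked) continue;
        heap[i].marked = true;
        for (std::size_t r : heap[i].refs) stack.push_back(r);
    }
}

// Stage 2: deallocate unmarked objects and clear marks for the next run.
// Returns the number of objects deallocated.
std::size_t sweep(std::vector<Node>& heap) {
    std::size_t freed = 0;
    for (Node& n : heap) {
        if (n.allocated && !n.marked) {
            n.allocated = false;
            n.refs.clear();
            ++freed;
        }
        n.marked = false;
    }
    return freed;
}

std::size_t collect(std::vector<Node>& heap, std::size_t root) {
    mark(heap, root);
    return sweep(heap);
}
```

A program variable that still points to an unreachable node after collect() would be a dangling reference, which is exactly the hazard the paragraph above warns about.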

Because of multiple inheritance, C++ base classes can have a non-zero offset within an object, and references to the inside of an object are possible. That is why a special technique is needed to access the object header. POST++ maintains a page allocation bitmap, each bit of which corresponds to a page in the storage. If a large object occupies several pages, the bits corresponding to all pages occupied by this object except the first one are set to 1. All other pages have their bits in the bitmap cleared. To find the start address of an object, we first align the pointer value on the page size. Then POST++ finds the page in the bitmap that contains the beginning of the object (this page should have a zero bit in the bitmap). Then we extract the object size from the object header placed at the beginning of this page. If the size is greater than half of the page size, we have already found the object header: it is at the beginning of the page. Otherwise we calculate the fixed block size used for this page and round the offset of the pointer within the page down to the block size. This scheme of header location is used by the garbage collector, by operator delete defined in the object class, and by methods extracting information about object size and class from the object header.
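The lookup can be sketched on a toy storage. The sketch is written under several assumptions that are not confirmed by the text: the header is laid out as two 32-bit fields, the size stored in the header covers the whole block occupancy, and the block size for a page is derived from the first header on that page. ToyStorage, Header, block_size and object_start are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t page_size = 512;

// Assumed header layout: 8 bytes, class identifier plus size.
struct Header {
    std::uint32_t class_id;
    std::uint32_t size;
};

struct ToyStorage {
    std::vector<unsigned char> bytes;
    std::vector<bool> continuation;  // 1 for every page of a large object except the first

    explicit ToyStorage(std::size_t pages)
        : bytes(pages * page_size), continuation(pages, false) {}

    Header header_at(std::size_t offset) const {
        Header h;
        std::memcpy(&h, &bytes[offset], sizeof h);
        return h;
    }
    void put_header(std::size_t offset, Header h) {
        std::memcpy(&bytes[offset], &h, sizeof h);
    }
};

// Fixed block size for a small object: align the size to 8 bytes, see how
// many such blocks fit in a page, then take that equal partition of the page.
std::size_t block_size(std::size_t size) {
    std::size_t aligned = (size + 7) / 8 * 8;
    assert(aligned >= 8 && aligned <= page_size / 2);
    std::size_t per_page = page_size / aligned;
    return page_size / per_page / 8 * 8;
}

// Map an offset pointing anywhere inside an object to the offset of its header.
std::size_t object_start(const ToyStorage& s, std::size_t offset) {
    std::size_t page = offset / page_size;
    while (s.continuation[page]) {  // walk back to the first page of a large object
        assert(page > 0);
        --page;
    }
    std::size_t page_start = page * page_size;
    Header first = s.header_at(page_start);
    if (first.size > page_size / 2) return page_start;  // large object
    std::size_t block = block_size(first.size);
    return page_start + (offset - page_start) / block * block;
}
```

With a page of 128-byte blocks, a pointer at offset 300 resolves to the block starting at offset 256; a pointer into the third page of a large object resolves to the start of its first page.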

POST++ provides a special overloaded operator new for allocating objects in the storage. This operator takes as extra parameters the class descriptor of the created object, the storage in which the object should be created and, optionally, the size of the varying part of the object instance. Macro new_in(STORAGE, CLASS) provides "syntax sugar" for persistent object creation. A persistent object can be deleted by the redefined operator delete.
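The general C++ mechanism behind such an operator can be sketched as follows. This is a toy arena, not the POST++ allocator; ClassDescriptor, ToyStorage, Object and Foo are hypothetical stand-ins, and only the parameter order (class descriptor, storage, optional varying size) follows the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical stand-ins for the class descriptor and the storage.
struct ClassDescriptor {
    const char* name;
    std::size_t fixed_size;
};

struct ToyStorage {
    std::vector<unsigned char> arena;
    std::size_t used = 0;
    const ClassDescriptor* last_class = nullptr;

    explicit ToyStorage(std::size_t bytes) : arena(bytes) {}

    // Bump allocation with 8-byte alignment; the toy never frees memory.
    void* allocate(const ClassDescriptor& cls, std::size_t size) {
        std::size_t aligned = (size + 7) / 8 * 8;
        assert(used + aligned <= arena.size());
        void* p = arena.data() + used;
        used += aligned;
        last_class = &cls;
        return p;
    }
};

struct Object {
    // Extra placement parameters: class descriptor, storage, varying size.
    static void* operator new(std::size_t fixed, const ClassDescriptor& cls,
                              ToyStorage& s, std::size_t varying = 0) {
        return s.allocate(cls, fixed + varying);
    }
    // Matching placement delete, used only if a constructor throws.
    static void operator delete(void*, const ClassDescriptor&, ToyStorage&,
                                std::size_t) {}
    static void operator delete(void*) {}  // toy: memory stays in the arena
};

struct Foo : Object {
    int x;
    Object* children[1];  // varying part: one element declared, more allocated
    static const ClassDescriptor self_class;
    explicit Foo(int x_) : x(x_), children{nullptr} {}
};
const ClassDescriptor Foo::self_class{"Foo", sizeof(Foo)};
```

An object with four child slots would then be created with `new (Foo::self_class, st, 3 * sizeof(Object*)) Foo(7)`, since one slot is already part of the fixed size.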

Persistent object protocol

All classes of persistent objects in POST++ should be derived from the object class defined in object.h. This class contains no variables and provides methods for object allocation and deallocation and for obtaining information about object class and size at runtime. It is possible to use the object class as one of multiple bases of inheritance (the order of bases is not significant). Each persistent class should have a default constructor, which is used by the POST++ system (see section Describing object class). That means that you should not use a constructor without parameters for normal object initialization. Even if your class constructor has no meaningful parameters, you should add a dummy one to distinguish your constructor from the constructor created by macro CLASSINFO.

To access objects in persistent storage, the programmer needs some kind of root object from which every other object in the storage can be reached by normal C pointers. POST++ storage provides two methods for specifying and obtaining a reference to the root object:

        void    set_root_object(object* obj);
        object* get_root_object();

When you create a new storage, get_root_object() returns NULL. You should create the root object and store a reference to it with the set_root_object() method. The next time you open the storage, the root object can be retrieved by get_root_object().

Hint: In practice application classes tend to change during program development and support. Unfortunately POST++, due to its simplicity, provides no facilities for automatic object conversion (see for example the lazy object update scheme in GOODS). So to avoid problems with adding new fields to objects, I recommend reserving some free space in objects for future use. This is especially important for the root object, because it is the first candidate for adding new components. You should also avoid reverse references to the root object. If no other object has a reference to the root object, then the root object can simply be changed (by means of the set_root_object method) to an instance of a new class. POST++ storage provides methods for setting and retrieving a storage version identifier. This identifier can be used by the application for updating objects in the storage depending on the storage and application versions.

Storage constructor

You can use several storages in your application simultaneously. The storage constructor takes one mandatory argument: the storage name. This name is used for the storage data file. It is also used for constructing the names of temporary files for the storage: the name prefixed with symbol '~' is used in non-transaction mode for a temporary copy of the file, the name prefixed with symbol '.' is used for the transaction log, and the name prefixed with symbol '#' is used only in Windows 95 in non-transaction mode for a backup version of the original file.
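As a small illustration of this naming scheme, the helper below derives the three auxiliary names from a storage name. It is a hypothetical sketch, not a POST++ function, and it assumes the storage name has no directory component (how POST++ treats a name that includes a path is not stated here).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: file names derived from a storage name.
struct StorageFileNames {
    std::string data;             // the storage data file itself
    std::string temp_copy;        // '~' prefix: temporary copy, non-transaction mode
    std::string transaction_log;  // '.' prefix: transaction log
    std::string backup;           // '#' prefix: backup, Windows 95 non-transaction mode
};

StorageFileNames derive_names(const std::string& name) {
    assert(!name.empty());
    return StorageFileNames{name, "~" + name, "." + name, "#" + name};
}
```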

The two other parameters of the storage constructor have default values. The first of them, max_file_size, limits the extension of the storage file. If the storage file is larger than storage::max_file_size, it will not be truncated, but further extension is not possible. If max_file_size is greater than the file size, the behavior depends on the storage opening mode. In transaction mode, the file is mapped to memory with read-write protection. In this case Windows NT/95 extends the file to max_file_size. The file is truncated by the storage::close() method to the boundary of the last object allocated in the storage. In Windows it is necessary to have at least storage::max_file_size free bytes on disk to open a storage successfully in read-write mode, even if you are not going to add new objects to the storage. In copy_on_write_map mode, the file is mapped to memory with copy on write protection, and an extra segment of memory is allocated beyond the end of the file. When the storage::flush() operation is issued, the file mapping segment and the used portion of the extension segment are copied to a temporary file, which is then renamed to the original one.

The last parameter of the storage constructor is max_locked_objects. This parameter is used only in transaction mode to buffer writes of shadow pages to the transaction log file. To provide data consistency, POST++ must guarantee that a shadow page is saved in the transaction log file before the modified page is flushed to disk. POST++ uses one of two approaches: synchronous log writes (max_locked_objects == 0) or buffered writes with locking of pages in memory. By locking a page in memory, we can guarantee that it will not be swapped out to disk before the transaction log buffers are flushed. Shadow pages are written to the transaction log file in asynchronous mode (with operating system caching enabled). When the number of locked pages exceeds max_locked_objects, the log file buffers are flushed to disk and all locked pages are unlocked. This approach can significantly increase transaction performance (up to 5 times under NT). Unfortunately, different operating systems use different approaches to locking pages in memory.
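The flushing policy can be sketched as a counter. The class below only counts log flushes and does no I/O or page locking; LogBuffer and its members are hypothetical names, and the policy shown is my reading of the paragraph above rather than the POST++ source.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the policy: limit == 0 means synchronous log writes,
// otherwise pages stay locked until the limit is exceeded.
class LogBuffer {
    std::size_t limit_;
    std::size_t locked_ = 0;
    std::size_t flushes_ = 0;

  public:
    explicit LogBuffer(std::size_t max_locked) : limit_(max_locked) {}

    // Called once for every shadow page written to the transaction log.
    void shadow_page_written() {
        if (limit_ == 0) {  // synchronous mode: every write reaches the disk
            ++flushes_;
            return;
        }
        if (++locked_ > limit_) {  // too many locked pages: flush log, unlock all
            ++flushes_;
            locked_ = 0;
        }
    }

    std::size_t flushes() const { return flushes_; }
    std::size_t locked() const { return locked_; }
};
```

With a limit of 3, eight shadow page writes cause two log flushes in this model, against eight in synchronous mode.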

Opening storage

POST++ uses the memory mapping mechanism for accessing data in the file. Two different approaches are used in POST++ to provide storage data consistency. The first and more advanced one is based on a transaction mechanism that uses shadow pages to provide storage recovery after a fault and transaction rollback. Shadow pages are created before write. This algorithm is implemented in the following way: all pages mapped to the file are set to read-only protection. Any write access to such a page causes an access violation exception. This exception is handled by a special handler, which changes the page protection to read-write and places a copy of this page in the transaction log file (the log file name is the original data file name preceded by symbol '.'). All following write accesses to this page will not cause page faults. The storage method commit() flushes all modified pages to disk and truncates the log file. The storage::commit() method is implicitly called by storage::close(). If a fault happens before the storage::commit() operation, all changes are undone by copying the saved pages from the transaction log to the storage data file. All changes can also be undone explicitly by the storage::rollback() method. To choose the transaction based model of data file access, specify the storage::use_transaction_log attribute for the storage::open() method.

Windows 95 specific: In Windows 95 it is not possible to change the protection of pages of a mapped file. That is why on this system the file is loaded into memory and the commit operation saves modified pages to the file.
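The shadow page idea can be illustrated without page protection. In the sketch below an explicit write() call plays the role of the access violation handler and an in-memory map plays the role of the transaction log file; it is a simplified model under those assumptions, not the POST++ implementation, and ShadowedPages is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

constexpr std::size_t page_size = 512;
using Page = std::array<unsigned char, page_size>;

class ShadowedPages {
    std::vector<Page> pages_;
    std::map<std::size_t, Page> log_;  // page number -> contents before first write

  public:
    explicit ShadowedPages(std::size_t n) : pages_(n) {}

    unsigned char read(std::size_t offset) const {
        return pages_[offset / page_size][offset % page_size];
    }

    // Stand-in for the fault handler: the first write to a page saves its
    // shadow copy; later writes to the same page add nothing to the log.
    void write(std::size_t offset, unsigned char value) {
        std::size_t page = offset / page_size;
        assert(page < pages_.size());
        log_.emplace(page, pages_[page]);  // no effect if the page is already logged
        pages_[page][offset % page_size] = value;
    }

    // Commit keeps the modified pages and truncates the log.
    void commit() { log_.clear(); }

    // Rollback copies the shadow pages back and truncates the log.
    void rollback() {
        for (const auto& entry : log_) pages_[entry.first] = entry.second;
        log_.clear();
    }

    std::size_t logged_pages() const { return log_.size(); }
};
```

Recovery after a fault corresponds to running rollback() with a log that survived on disk.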

Another approach to providing data consistency is based on the copy on write mechanism. In this case the original file is not affected. Any attempt to modify a page that is mapped to the file causes the creation of a copy of the page, which is allocated from system swap and has read-write access. The file is updated only by an explicit call of the storage::flush() method. This method writes the data to a temporary file (with symbol ~ before the file name) and then renames this file to the original one. So this operation causes an atomic update of the file (provided, of course, that the operating system guarantees atomicity of the rename() operation).
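The write-then-rename step can be sketched with the standard library. This is an illustration of the general technique, not the storage::flush() code; flush_atomically and read_all are hypothetical helpers, and the sketch assumes a file name without a directory part. On POSIX systems rename() replaces an existing file atomically; on Windows std::rename fails when the target exists, which matches the Windows 95 note in this section.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Write the new contents to "~name", then rename it over "name".
bool flush_atomically(const std::string& name, const std::string& contents) {
    const std::string temp = "~" + name;
    {
        std::ofstream out(temp, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out << contents;
        out.flush();
        if (!out) return false;
    }  // stream is closed here, before the rename
    if (std::rename(temp.c_str(), name.c_str()) != 0) {
        std::remove(temp.c_str());  // leave the original file untouched
        return false;
    }
    return true;
}

// Read a whole file; returns an empty string if it cannot be opened.
std::string read_all(const std::string& name) {
    std::ifstream in(name, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

A reader that opens the file at any moment sees either the old contents or the new contents, never a partially written file.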

Attention: If you are not using transactions, storage::close() method doesn't flush data in the file. So if you don't call storage::flush() method before storage::close() all modifications done since last flush will be lost.

Windows 95 specific: In Windows 95 it is not possible to rename a file to the name of an existing file, so the original file is first renamed to the name prefixed with symbol #, then the temporary file starting with ~ is renamed to the original name, and finally the old copy is removed. So if a fault happens during the flush() operation and afterwards you find no storage file, please do not panic: just look for the file starting with # and rename it to the original name.

Hint: I recommend using transactions if you are planning to save data during program execution. This is also possible with the copy on write approach, but it is much more expensive. Transactions are also always preferable if the storage is large, because creating a temporary copy of the file requires a lot of disk space and time.

There are several attributes, which can be passed to storage open() method:

support_virtual_functions
This attribute should be set if objects with virtual functions are placed in the storage. If this attribute is not set, POST++ assumes that all persistent objects contain references only within the storage (to other objects in the storage). So adjustment of references is needed only if the base address of the data file mapping has changed (this address is stored in the first word of the data file, and POST++ always tries to map the file to the same address to avoid unnecessary reference adjustment). But if an object class contains virtual functions, a pointer to the virtual table is placed inside the object. If you recompile your application, the address of this table can change. That is why each application using POST++ should link the comptime.cxx file, which provides a timestamp of the executable file creation. When opening the storage, POST++ compares this timestamp with the timestamp stored in the data file; if they differ and the support_virtual_functions attribute is specified, all objects are corrected (by calling the default constructor).
read_only
By setting this attribute the programmer requests read-only access to the data file. POST++ will create a read-only view of the data file, and any attempt to change an object in the storage or allocate a new one will cause a protection violation fault. There is one exception: if it is impossible to map the data file to the same address, or the application has changed and support_virtual_functions is specified, then the protection of the region is temporarily changed to copy on write and conversion of loaded objects takes place.
use_transaction_log
Setting this attribute forces the use of transactions for all data file updates. The shadow page strategy is used for implementing transactions. A transaction is opened implicitly when the first modification of the storage is made. It is closed explicitly by either the storage::commit() or the storage::rollback() operation. Method storage::commit() saves all modified pages to disk and truncates the transaction log; method storage::rollback() undoes all changes made within this transaction.
use_copy_on_write_mapping
When this attribute is set, POST++ will map the data file to memory using the copy on write mechanism. Usually two memory mapped segments are created: one mapped to the file (with size equal to the size of the file), and one for future extension (this segment is allocated from swap space, and its size depends on the maximal file size specified by the application). Copy on write protection is used for the first segment and read-write protection for the second. If the application will immediately access most of the data in the file, it can be more efficient to create one region in virtual memory and just read the file into the beginning of this region. In this case we avoid the numerous page faults that take place if the file is mapped. That is why by default the map_data_file flag is not set (for read and write access). If you are primarily interested in reducing application startup time, then mapping the file is always preferable, because storage open time will be significantly shorter in this case.
do_garbage_collection
When this attribute is set, POST++ will perform garbage collection in the storage during opening. Garbage collection is combined with reference adjustment. Using garbage collection is always safer than manual memory deallocation (because of the problem of dangling references), but explicit memory deallocation has less overhead. Garbage collection in POST++ has one more advantage over explicit deallocation: the garbage collector reclaims pages used for small objects. If there are no more allocated small objects on a page, the garbage collector includes this page in the list of free pages. This is not done for explicit deallocation, because free cells for small objects are linked in a chain and it is not so easy to remove them from this chain (in the case of the garbage collector all chains are reconstructed). Even if you are using explicit memory deallocation, I suggest running garbage collection from time to time to check for reference consistency and absence of memory leaks (the garbage_collection method returns the number of deallocated objects, and if you are sure that you have explicitly deallocated all unreachable objects, this number should be zero). Since the garbage collector modifies all objects in the storage (sets the mark bit, relinks free objects in chains), running GC in transaction mode can be a time and disk space consuming operation (all pages from the file will be copied to the transaction log file).
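The reference adjustment mentioned under support_virtual_functions and do_garbage_collection amounts to shifting every stored pointer by the difference between the old and the new mapping address. The sketch below shows that arithmetic on plain integers; relocate and relocate_all are hypothetical names, and the real storage walks object fields described by CLASSINFO rather than a flat vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Adjust one stored reference from the old mapping base to the new one.
// Null references stay null.
std::uintptr_t relocate(std::uintptr_t ref, std::uintptr_t old_base,
                        std::uintptr_t new_base) {
    if (ref == 0) return 0;
    assert(ref >= old_base);  // references point inside the storage
    return ref - old_base + new_base;
}

// Adjust all references; nothing to do when the file is mapped at the
// same address as last time.
void relocate_all(std::vector<std::uintptr_t>& refs, std::uintptr_t old_base,
                  std::uintptr_t new_base) {
    if (old_base == new_base) return;
    for (std::uintptr_t& r : refs) r = relocate(r, old_base, new_base);
}
```

This is why mapping the file at the address recorded in its first word is worthwhile: in that case the whole pass is skipped.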

You can specify the maximal size for storage files with the file::max_file_size variable. If the size of the data file is less than file::max_file_size and the mode is not read_only, then an extra file::max_file_size - size_of_file bytes of virtual space are reserved after the file mapping region. When the storage is extended (because of allocation of new objects), these pages are committed (in Windows NT) and used. If the size of the file is greater than file::max_file_size or read_only mode is used, then the size of the mapped region is exactly the same as the file size. Storage extension is not possible in this case. In Windows I use the GlobalMemoryStatus() function to obtain information about the virtual memory actually available in the system and reduce file::max_file_size to this value. Unfortunately I found no portable call in Unix which can be used for the same purpose (getrlimit doesn't return actual information about the virtual memory available to a user's process).

The interface to the object storage is specified in file storage.h, and the implementation can be found in storage.cxx. The operating system dependent part of the memory mapped file is encapsulated in the file class, whose definition is in file.h and implementation in file.cxx.
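The size of the reserved region is the difference between the size limit and the current file size, or zero when extension is not possible. A minimal sketch of that rule, with a hypothetical function name:

```cpp
#include <cassert>
#include <cstddef>

// Bytes of virtual space reserved after the file mapping region.
// No space is reserved in read-only mode or when the file already
// reaches the limit.
std::size_t reserved_extension(std::size_t file_size, std::size_t max_file_size,
                               bool read_only) {
    if (read_only || file_size >= max_file_size) return 0;
    return max_file_size - file_size;
}
```

For a 4096-byte file and a 65536-byte limit this gives 61440 reserved bytes in read-write mode.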

Installation of POST++

Installation of POST++ is very simple. It has been checked on the following platforms: Digital Unix, Linux, Solaris, Windows NT 4.0, Windows 95. I expect no problems with most other modern Unix dialects (AIX, HP-UX 10, SCO...). Unfortunately I have no access to these systems.

The only thing you need in order to use POST++ is the library (libstorage.a on Unix and storage.lib on Windows). This library can be produced by just issuing the make command (there is a special MAKE.BAT for Microsoft Visual C++, which invokes NMAKE with makefile.mvc as input). On Unix you can place this library in the default library directory by changing the INSTALL_DIR parameter in the makefile and running make install. By default INSTALL_DIR points to /usr/lib.

How to use POST++

There are some examples of classes and applications for POST++. First of all there is the game "Guess an animal". The algorithm of this game is very simple, and the result looks rather impressive (something like artificial intelligence). Moreover, this game illustrates the benefits of a persistent object store very well. The sources of this game are in file guess.cxx. Building this game is included in the default make target. To run it, just execute guess. There are also examples of two useful persistent classes: hash table and AWL tree. The definitions of these classes (files awltree.h, awltree.cxx, hashtab.h and hashtab.cxx) are good examples of creating libraries for POST++. You can see that there is almost no POST++ specific code in the implementation of these classes. To test these classes I created a special test program, testtree.cxx, which helped me find a lot of bugs in POST++. This program is also included in the default make target.

When you link your application with the POST++ library, please do not forget to recompile the comptime.cxx file and include it in the linker's list. This file is necessary for POST++ to provide the executable file timestamp, which is placed in the storage and used to determine when the application has changed and reinitialization of virtual function table pointers in objects is required. Attention! This file should be recompiled each time you relink your application. I suggest letting the compiler call the linker for you and including the comptime.cxx source file in the list of object files for the target that builds the executable image (see makefile).

Specific of debugging POST++ applications

Information in this section is meaningful only for applications using transactions. POST++ uses the page protection mechanism to create a shadow page when the original page is modified. After storage opening or transaction commit, all pages mapped to the file are read-only protected. So any attempt to modify the contents of an object allocated on such a page will cause an access violation exception. This exception is handled by a special POST++ handler. But if you are using a debugger, it will catch this exception first and stop the application. If you want to debug your application you should do some preparations:

Some more information about POST++

POST++ is shareware. It is distributed in the hope that it will be useful. You can do anything you want with it (with no limitation on distributing products using POST++). I will be glad to help you in using POST++ and to receive all kinds of information (bug reports, suggestions...) about POST++. The shareware status of POST++ doesn't mean lack of support. I promise to do my best to fix all reported bugs. E-mail support is also guaranteed. POST++ can be used for various purposes: storing information between sessions, storing an object system in a file, snapshots, information systems... But if you feel that you need a more serious object oriented database for your application, one supporting concurrency, distribution and transactions, please visit the GOODS (Generic Object Oriented Database System) home page.
