(* TBTree16             Copyright (c)  1988,1989       Dean H. Farwell II    *)

unit Btree;

(*****************************************************************************)
(*                                                                           *)
(*                      B T R E E   R O U T I N E S                          *)
(*                                                                           *)
(*****************************************************************************)

(* These routines are responsible for handling the indexes.  The scheme used
   is Knuth's variation of B-Trees discussed on page 64 of "Database Systems
   Volume I Fourth Edition" by C.J. Date.  There are two different types of
   nodes in the tree.  The upper level(s) are index nodes and fill the role of
   the normal B-Tree.  The lowest level nodes are sequence nodes and represent
   a dense index to the file.  The entries in the sequence node are sorted and
   thus provide an alternate means to access information in the file.  This
   allows rapid sorted retrievals without the need to keep the data file
   sorted.

   The internal workings of an index is not really important, but you need to
   be aware of what an index contains and what operations can be performed
   using an index.  An index is primarily used to do retrievals using the
   value of one or more fields of a data record.  An index contains values
   which are the same as values of a field in a data file.  Each value in an
   index contains a corresponding logical record number which shows which
   logical record corresponds to that value.  In many instances, a field will
   not be unique.  This means that more than one record may have the same
   value for a given field.  The index will handle these properly.  However,
   it would not make sense to have multiple values for the same field for a
   single logical record number.  This would imply that a data record field
   could have two or more values at the same time, which it obviously cannot.
   This would occur only if you were updating a value for the given record
   number and failed to delete the old entry from the index.  The
   documentation contains an example of an update which will keep this from
   happening.  It is also important to realize that each index can handle only
   one data type. The valid data types are defined by the ValueType type.  The
   data type of the values in the index is specified at the time that the
   index is created and is never changed.

   The first step in using an index is to create one.  There is a routine
   (CreateIndexFile) especially for this purpose.  It will create the root and
   first sequence nodes as well as the internal bitmap.  This bitmap is used
   to keep track of which index nodes (physical records) are currently in use
   and which are available.  These bitmaps are maintained interally as part of
   the index file itself.  This alleviates the requirement to maintain
   separate bitmap files and greatly simplifies things.  When creating a file,
   you must supply the file name, the data type and the size of the entries.
   The path and the extension for the file name is optional.  For example, the
   path can be used to ensure that the files will always be in one place even
   though the program is run from other places, etc.  The extension can help
   in your own bookkeeping.  For example, all data files could use an
   extension of '.dat' and all index files could use '.idx' etc.  This is
   completely up to you and your application.

   Once you have created an index, you will want to put value/logical record
   number pairs into the index and retrieve them back out.  One routine
   (InsertValueInBTree) is supplied to insert a value and corresponding
   logical record number.  Likewise, there is one routine supplied to delete a
   value/logical record number pair.  There is no routine for performing an
   update.  Instead, you will perform an update by doing a delete followed by
   an insert.

   Retrieving the logical records number(s) corresponding to values is the
   primary purpose of the indexes and is discussed now.  There are actually
   two distinct ways to do this.  The more powerful method is to use the
   routines available in BTREE2.INC.  Each of these routines will perform an
   index search and will result in the creation of a logical record list. A
   logical record list is nothing more than a list of logical record numbers.
   These lists are fully explained in the LRECLIST.PAS file and are key to the
   power provided by TBTREE.  You will want to fully understand how to create
   and access these lists.

   A second way to access the index is to use an internal (to the index)
   cursor.  This cursor should not be confused with the cursor which is kept
   as part of the logical record lists discussed above.  These index cursors
   are discussed indepth in BTREE3.INC.

   To understand either method of retrieval, you need to have the proper
   framework for understanding the index.  Even though the indexes are
   maintained as BTREEs, you should view them as a sequential file of
   values/logical record number pairs.  This sequential file is ordered
   by value in ascending order.  Think of the pairs laid out horizontally with
   the lower values on the left.  Moving forward through the index means that
   larger values are accessed.  The first entry is the lowest value, etc.
   Hopefully, you get the picture.                                           *)


(*\*)
(* Version Information

   Version 1.1 - Corrected error in InsertValueInBTree routine.   I
                 was not checking for duplicate entries.  This has
                 been resolved.

               - Added GetRangeFromBTree routine

               - Added GetSubstringFromBTree routine

               - Divided the source code into 3 source files - BTREE.PAS
                                                               BTREE1.INC
                                                               BTREE2.INC
   Version 1.2 - No Changes

   Version 1.3 - Added NumberOfBTreeLevels routine

   Version 1.4 - HUGE CHANGE - I no longer keep the data file name as part of
                 the index file parameter record.  It was not used, and I
                 foresee no real use for it in the future.  I thought that I
                 would need it for my TBASE product, but I won't.  This will
                 mean a change to anyone who has applications that they have
                 written using TBTREE version 1.3 or earlier.  Any calls to
                 the old CreateIndex routine will need to be changed.
                 dataFName is no longer a parameter for that routine.
                 Hopefully, this will not create a great inconvenience.  I
                 just was tired of carrying around the extra baggage.

               - HUGE CHANGE - The bitmap files are a thing of the past.  The
                 bitmaps are now maintained as part of the index files
                 themselves.  This has caused a complete restructuring of the
                 index file parameter record and the index file themselves.
                 This means that any index files (BTREEs) created using
 !!!!! - >>>     versions prior to 1.4 will need to be rebuilt.  This is
 IMPORTANT       discussed in the documentation under the version information.
 !!!!! - >>>     I apologize for the inconvenience. However, the sooner I make
                 these kinds of changes, the better.  This will simplify your
                 use of the routines and cut in half the number of files to be
                 dealt with (since each index and data file previously had a
                 bitmap).

               - Corrected an error related to maximum value size (size of an
                 entry) for an index.  The documentation states that the
                 maximum size for an entry in an index is 494 bytes.  However,
                 an error limited this to 255 bytes.  This is because only one
                 byte was allocated in the parameter record for the value
                 size.  This has been corrected.  However, in practical terms,
                 you want to keep the index entries small.  This change
                 required a change to the parameter record definition.

               - Added a second way to retrieve information from the B-Tree
                 indexes.  Rather than always creating a logical record list
                 when retrievals are required, you can now work with the index
                 more directly.  The BTREE3.INC contains many routines which
                 manipulate an internal cursor which can be used to access
                 logical record numbers in the index.  The newly added
                 BTREE3.INC file has more details.

               - Added GetSubstringAtPositionFromBTree routine.  This routine
                 allows more flexibility when doing partial string matches on
                 an index.  It is exceptionally handy if you have an index on
                 a string field which really is made of several sub-fields.
                 Something like phone number is a good example.  You might be
                 interested in only phone numbers which have a seven as the
                 second digit, or whatever.

               - Made several changes internally to increase performance.  One
                 primary change was to use a Binary Search technique to find
                 entries in nodes at all levels.  Also, VAR parameters are
                 used internally when a parameter is large.  For instance,
                 when a page is passed to a routine as a parameter, it is
                 passed as a VAR parameter.  Therefore, the entire 512 bytes
                 do not have to be placed on the stack.  This will also reduce
                 the stack requirements.

               - Changed name of the CreateIndex routine to CreateIndexFile to
                 emphasize the change to the parameters for the routine.
                 Also, it clashed with a name which I want to use with TBASE.

               - Added DeleteIndexFile routine.  This routine should always be
                 used when deleting an index file.

   Version 1.5 - Changed code internally to use Inc and Dec where practical

               - Changed code internally to use newly added FastMove unit

               - Added UsingCursorAndGEValueGetLr routine

               - Changed the way records are added as the data file grows on
                 inserts.  As the file becomes larger,  the number of records
                 added at one time increases.  This will speed up the insert
                 process.

               - Added FindLrNumInBTree routine.  It may prove useful for
                 debugging or ensuring that an index is still in good shape.

               - Fixed error in GetEqualsFromBTree routine which would cause
                 random failures on queries where the Condition was EQ

               - Fixed error in GetRangeFromBTree routine which caused
                 problems when the upper range and lower range were the same

   Version 1.6 - Fixed an error which caused the wrong record(s) to be deleted
                 when using the DeleteValueFromBTree routine.  Error only
                 occurred under a rare set of circumstances, but will not
                 occur at all now.

               - I undid the third change of version 1.4 above.  In essence,
                 MAXVALSIZE is now equal to 245.  This represents the largest
                 number of bytes for an index entry.  The reasons for this are
                 somewhat complex, but each index node must be capable of
                 holding at least two entries.  There are 498 bytes available
                 (after 14 are used for overhead).  Any entry consists of the
                 value plus a 4 byte record number.  (245 + 4) * 2 = 498.
                 This is well past the practical limit anyway.  Keep the
                 entries as small as possible.  Anything over about 30 bytes
                 is not very desirable.  Remember, in a btree the number of
                 levels is related to the size of the entries as well as the
                 number of entries.  Large entries will cause the tree to be
                 extremely deep.  This will greatly reduce the performance.
                 To check the number of levels, use the NumberOfBTreeLevels
                 routine provided.                                           *)

(*\*)
(*////////////////////////// I N T E R F A C E //////////////////////////////*)

interface

uses
    Compare,
    FastMove,
    FileDecs,
    Files,
    Math,
    Numbers,
    LRecList,
    Page,
    Strings;

const
    MAXVALSIZE    = 245;                          (* max value size in index *)

type
    VSizeType     = 1 .. MAXVALSIZE;         (* size range for index entries *)


(*\*)
(* This routine will create an index file with the file name as specified
   by iFName.  The valSize parameter specifies the size of the index
   entries.  The easiest way to determine this is to use the SizeOf
   function.  The valType parameter specifies the type for the index
   entries.  The types supported are those enumerated by the ValueType
   enumerated type.

   note - Extremely important - WARNING - for STRINGVALUE indexes only - the
   valSize must be 1 greater than the number of characters of the longest
   string.  This will allow 1 byte for the string length to be stored.
   for example - if 'abc' is the longest string then valSize = 4.            *)

procedure CreateIndexFile(iFName : FnString;
                          valSize : VSizeType;
                          valType : ValueType);


(* This routine will delete an index file.                                   *)

procedure DeleteIndexFile(iFName : FnString);


(* This routine will insert a value and its associated logical record number
   into the given index file.  This routine will guard against duplicate
   entries. An index should have no more than one occurence of any
   lrNum,paramValue pair (no two entries match on paramValue and lrNum).  This
   routine assures this by calling DeleteValueFromBTree prior to performing
   the insert.  This will get rid of a previous occurence if it exists.      *)

procedure InsertValueInBTree(iFName : FnString;
                             lrNum : LRNumber;
                             var paramValue);


(* This routine will delete a value and its associated logical record number
   from a given index file.  If the logical record number passed in is zero
   then all records with a matching paramValue will be deleted.  Otherwise,
   only the entry with the matching paramValue and the matching logical record
   number will be deleted.                                                   *)

procedure DeleteValueFromBTree(iFName : FnString;
                               lrNum : LrNumber;
                               var paramValue);

(*\*)
(* This routine will start at the root node and return the number of levels
that exist in a BTree.  The index file name is the only required input.      *)

function NumberOfBTreeLevels(iFName : FnString) : Byte;


(* This routine will search an index and determine whether the given logical
   record number is in the index.  If it is, TRUE is returned in found and the
   value associated with the logical record number is returned in paramValue.
   If it is not found, found will be returned as FALSE and paramValue will
   remain unchanged.  This is primarily used for debugging or determining if
   an index has somehow been damaged.                                        *)

procedure FindLrNumInBTree(iFName : FnString;
                           lrNum : LrNumber;
                           var paramValue;
                           var found : Boolean);


(* This routine locates all values within the index which meet a condition
   (cond) and returns a list of logical record numbers corresponding to these
   values.

   note - In the case of  cond = EX (exists) all entries in the index will be
   returned.  paramValue is not really used in this case so anything (a dummy)
   can be passed in as the parameter corresponding to paramValue.            *)

procedure GetValuesFromBTree(iFName : FnString;
                             var paramValue;
                             cond : Condition;
                             var lrLst : LrList);


(* This routine locates all values within the index within a given range and
   returns a list of logical record numbers corresponding to values within
   that range.  The range is determined by paramValue1 and paramValue2.
   paramValue1 must be less than paramValue2.  cond1 and cond2 are used to
   determine if the range is inclusive or exclusive.  cond1 must be either
   GE or GT and cond2 must be either LE or LT for this to work.  If any of
   the above conditions is not true then an empty list will be returned.     *)

procedure GetRangeFromBTree(iFName : FnString;
                            var paramValue1;
                            cond1 : Condition;
                            var paramValue2;
                            cond2 : Condition;
                            var lrLst : LrList);

(*\*)
(* This routine locates partial string matches in an index of type
   STRINGVALUE.  A partial string match occurs when the string passed in as
   paramValue is contained within a string entry in the index.  You must
   specify where in the string the match must occur.  You accomplish this by
   using cond.  cond can be CO which stands for contains.  In this case, a
   match occurs if the string passed in as paramValue is anywhere in the
   string in the index.  Another option for cond is ST which stands for
   starts.  A match will occur in this instance if paramValue is located at
   the start of the string in the index.  The last option for cond is EN which
   stands for ends.  A match here occurs if paramValue matches the last
   characters in the index string.  Be aware that if paramValue and the string
   being checked are the same length and they match exactly, then a match
   would occur if any of the three options were selected (not earth shattering
   but true nonetheless).

   Otherwise, this works like any of the other retrieval routines.  A list of
   logical record numbers will be returned.

   This is exceptionally useful for many applications.  For example, a field
   might be a 7 digit code stored as a string (STRINGVALUE).  The last two
   digits might mean something in particular (part category, state, or
   whatever).  Using this you can look for all the matches on only the last
   two digits.

   One reality note here!! - For cond <> ST I have to look at every entry in
   the index for matches.  Why this is true should be reasonably obvious.
   Anyway, for cond = EN or cond = CO it will be somewhat slower than for cond
   = ST.  How much slower depends on the size of the index.  It boils down to
   the difference between a O(LOG(n)) algorithm and a O(n) algorithm, the
   former being much faster for n = a large number.  If any of the above is
   confusing ignore it and experiment.  It is still better to use this routine
   than to grovel through all of the data records yourself!!                 *)

procedure GetSubstringFromBTree(iFName : FnString;
                                var paramValue;      (* must be a string var *)
                                cond : StringCondition;
                                var lrLst : LrList);

(*\*)
(* This routine is exactly like GetSubstringFromBTree except that matches only
   occur if the substring passed in as paramValue is located at the exact
   location specified by the position parameter.  Also, the cond parameter is
   omitted since it only makes sense to look for a partial string match at a
   particular position if cond is CO.  In other works, you will get a logical
   record list with entries for all index values which contain paramValue at
   the character position specified by position.                             *)

procedure GetSubstringAtPositionFromBTree(iFName : FnString;
                                          var paramValue;   (* must be a string
                                                                         var *)
                                          position : StringLengthRange;
                                          var lrLst : LrList);


(* This routine will set the tree cursor to the front of the index.  In
   other words, it will point to the first entry in the index.  Remember, the
   index is ordered by the value of each entry.  It will also return the
   logical record associated with the first entry in the index.  It will
   return 0 only if there is no first entry (the index is empty).  This
   routine should be called if you want to start at the beginning of an index
   and want to retrieve logical record numbers in order of entry.            *)

function UsingCursorGetFirstLr(iFName : FnString) : LrNumber;


(* This routine will set the tree cursor to the end of the index.  In other
   words, it will point to the last entry in the index.  Remember, the index
   is ordered by the value of each entry.  It will also return the logical
   record associated with the last entry in the index.  It will return 0 only
   if there is no first entry (the index is empty).  This routine should be
   called if you want to start at the end of an index and want to retrieve
   logical record numbers in order of entry.                                 *)

function UsingCursorGetLastLr(iFName : FnString) : LrNumber;


(* This routine will set the tree cursor to the location in the index where
   the first occurence of the desired value (paramValue) is located.  It will
   also return the logical record associated with this entry. It will return 0
   if there is no entry associated with this value.  This routine should be
   called if you want to start at a certain location (at a certain value)
   within the index and want to retrieve logical record numbers in forward or
   reverse order.                                                            *)

function UsingCursorAndValueGetLr(iFName : FnString;
                                  var paramValue) : LrNumber;

(*\*)
(* This routine is the same as UsingCursorAndValueGetLr except that this
   routine will set the tree cursor to the location of the first value in the
   index which is greater than or equal to paramValue.  It will also return
   the logical record associated with this entry.  It will return 0 if there
   is no entry which is greater than or equal to this value.                 *)

function UsingCursorAndGEValueGetLr(iFName : FnString;
                                    var paramValue) : LrNumber;

(* This routine will move the cursor to the right one entry and return the
   value associated with this entry.  It will return 0 if the cursor was not
   valid (not pointing to an entry) or if there is no next entry (you are at
   end of index).  This routine should be called if you want to move the
   cursor to the next larger entry from the present cursor position and
   retrieve the associated logical record number.  This routine should not
   normally be used until the cursor has been positioned using one of the
   three previous positioning routines.                                      *)

function UsingCursorGetNextLr(iFName : FnString) : LrNumber;


(* This routine will move the cursor to the left one entry and return the
   value associated with this entry.  It will return 0 if the cursor was not
   valid (not pointing to an entry) or if there is no previous entry (you are
   at beginning of the index).  This routine should be called if you want to
   move the cursor to the next smaller entry from the present cursor position
   and retrieve the associated logical record number.  This routine should not
   normally be used until the cursor has been positioned using one of the
   three previous positioning routines.                                      *)

function UsingCursorGetPrevLr(iFName : FnString) : LrNumber;


(* This routine will move the cursor to the right.  It will move the cursor
   to the next entry in which the value is not equal to the current entry and
   return the associated logical record number.  In other words, it will skip
   the cursor over all matching values.  It will return 0 if the cursor was
   not valid (not pointing to an entry) or if there is no next entry (you are
   at beginning of the index).  This routine should be used if you only want
   to process the first entry of a given value etc.  This routine should not
   normally be used until the cursor has been positioned using one of the
   three previous positioning routines.                                      *)

function UsingCursorSkipAndGetNextLr(iFName : FnString) : LrNumber;

(*\*)
(* This routine will move the cursor to the left.  It will move the cursor to
   the previous entry in which the value is not equal to the current entry and
   return the associated logical record number.  In other words, it will skip
   the cursor over all matching values.  It will return 0 if the cursor was
   not valid (not pointing to an entry) or if there is no previous entry (you
   are at beginning of the index).  This routine should be used if you only
   want to process the first entry of a given value etc.  This routine should
   not normally be used until the cursor has been positioned using one of the
   three previous positioning routines.                                      *)

function UsingCursorSkipAndGetPrevLr(iFName : FnString) : LrNumber;


(* This routine will not move the cursor.  It will return the logical record
   number asociated with the current cursor position.  It will return 0 only
   if the current cursor position is not valid.                              *)

function UsingCursorGetCurrLr(iFName : FnString) : LrNumber;


(* This routine will set the cursor to invalid.  This is never required,
   but can be used once the cursor use is completed and the cursor won't be
   used until it is repositioned using one of the three positioning
   routines. Using this routine will slightly speed up inserts and deletes.
   This is because, on an insert or delete, the cursor position must be
   kept correct if the cursor is valid.  This requires a small amount of
   extra processing.  This processing is extraneous if you don't care about
   the cursor position.                                                      *)

procedure UsingCursorMakeCursorInvalid(iFName : FnString);



(*!*)
(*\*)
(*///////////////////// I M P L E M E N T A T I O N /////////////////////////*)

implementation

{$I btree1.inc}         (* most of the implementation part of the btree unit *)

{$I btree2.inc}         (* the rest of the implementation of the btree unit  *)

{$I btree3.inc}         (* the newly added cursor routines                   *)

begin
mustMoveCursor := FALSE;
end.                                                    (* end of BTree unit *)
