OSE - C++ Library User Guide

Graham Dumpleton
Dumpleton Software Consulting Pty Limited
PO BOX 3150
Parramatta, 2124
N.S.W, Australia
email: grahamd@nms.otc.com.au

Table of Contents

Strings and Symbols

Strings and Symbols

1 Replacements for C Strings

Although C and C++ allow strings of characters to be stored using the `char*' type, this falls a long way short of a string type directly supported by the language. With C++ we can solve many of the problems which occur when using the `char*' type, or C strings as they are often called, by encapsulating the operations we would like to perform on character strings in a class.

There are two principal classes provided for use in place of C strings: OTC_String and OTC_Symbol. Both classes allow you to store character strings containing embedded null characters. Both also place a null guard byte just beyond the end of the string, in case there is no explicit null terminator. This null terminator allows the string to be used where a C string is expected.

The OTC_String class provides you with a range of editing and query operations, and utilises a number of techniques, such as delayed copying, to improve performance and reduce memory usage. The OTC_Symbol class is tailored to applications where a fairly constant number of strings are used as identifiers for objects. To cut down on the number of strings, the OTC_Symbol class maintains an internal database of all symbols which have been created. This ensures there will only ever be one copy of a symbol, allowing for reuse and a potential reduction in the amount of memory used. Keeping only one copy of the string associated with the symbol also allows efficient equality tests for symbols.

2 Creating Strings and Symbols

Strings and symbols can be declared as static or automatic variables, as member variables of other classes, or created on the free store using `operator new()'. If a string or symbol is not initialised at the time of creation, its initial value is that of an empty string. The OTC_String and OTC_Symbol classes may be initialised using a single character, a character pointer, an instance of the same type, an instance of the other type, or any expression yielding one of these types. For example:

  char const* p = "abc";

  OTC_String a;           // a is ""

  OTC_String b = "abc";   // b is "abc"
  OTC_String c = p;       // c is "abc"
  OTC_String d(p);        // d is "abc"

  OTC_String e(p,2);      // e is "ab"
  OTC_String f(p,4);      // f is "abc\0"

  OTC_String g = d;       // g is "abc"
  OTC_String h(g,2);      // h is "ab"

  OTC_String i = 'a';     // i is "a"
  OTC_String j('a');      // j is "a"
  OTC_String k('a',2);    // k is "aa"

  OTC_Symbol l = d;       // l is "abc"
  OTC_String m = l;       // m is "abc"

When C strings are used, a null pointer is often used to mean an undefined string, or as an indication that no more data is available. This is quite different from a pointer to a valid but empty string. An instance of OTC_String involves no pointers, so this distinction cannot be made in the same way. To preserve the concept of an undefined string, a string object which is created but not initialised is both an empty string and identified as being undefined. Whether a string is undefined can be determined using the `isUndefined()' member function. To determine if a string is empty, the `isEmpty()' member function is used.

  OTC_String a;
  res = a.isEmpty();      // yields OTCLIB_TRUE
  res = a.isUndefined();  // yields OTCLIB_TRUE

  OTC_String b = "";
  res = b.isEmpty();      // yields OTCLIB_TRUE
  res = b.isUndefined();  // yields OTCLIB_FALSE

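For readers who know standard C++17, the undefined/empty distinction can be modelled with `std::optional<std::string>'. The sketch below is only an analogy and is not how OTC_String is implemented; the free functions `isUndefined()' and `isEmpty()' are hypothetical stand-ins for the member functions of the same names.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical analogy: an absent value plays the role of an "undefined"
// string, a present but zero-length value that of a defined, empty string.
using MaybeString = std::optional<std::string>;

// Undefined: no value has ever been supplied.
bool isUndefined(MaybeString const& s) { return !s.has_value(); }

// Empty: either undefined, or defined with zero length. This mirrors the
// guide's rule that an undefined string is also an empty string.
bool isEmpty(MaybeString const& s) { return !s || s->empty(); }
```
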
If you require an undefined string but cannot create one within the context in which you require it, you can obtain one using the static member function `OTC_String::undefinedString()'. The value returned from this function can also be assigned to a string to put it back into the undefined state. For example:

  OTC_String a = "";
  res = a.isEmpty();      // yields OTCLIB_TRUE
  res = a.isUndefined();  // yields OTCLIB_FALSE

  a = OTC_String::undefinedString();
  res = a.isEmpty();      // yields OTCLIB_TRUE
  res = a.isUndefined();  // yields OTCLIB_TRUE

Counterparts to `OTC_String::undefinedString()' are provided in the OTC_String and OTC_Symbol classes for obtaining defined but empty strings. These are `OTC_String::nullString()' and `OTC_Symbol::nullSymbol()'.

3 String Length and Capacity

When a string is created and initialised, its length is governed by the data used to initialise it. The length of the string can be determined at any time using the `length()' member function. To create a string of a particular length whose contents are left uninitialised, a special constructor must be used. The syntax necessary to invoke this constructor is:

  OTC_String a(OTC_Length(128));

If it is necessary to change the length of the string after it has been created, an overloaded version of the `length()' member function accepting a single size argument should be used. This member function allows you to either increase or decrease the length of the string. When the length is increased, the newly available area of the string is not initialised. The `length()' function differs from the `truncate()' function in that the latter only allows you to decrease the length of the string.

A common use of this feature is where a string, immediately upon creation, is to be filled with data read from a file or socket. This is illustrated below.

  OTC_String a(OTC_Length(128));
  res = read(fd,a.buffer(),a.length());
  if (res != -1)
    a.length(res);
  else
    a.length(0);

When the string class allocates memory for holding a character string, it uses a buffering mechanism which allocates more space than is actually required. This limits the number of memory reallocations needed when the string is extended in small increments. If you know in advance the longest length the string will grow to in a series of operations, you can force the string to preallocate this capacity, avoiding any reallocations as the string grows.

To set the capacity of the string at the time of creation, another special constructor is used. The syntax necessary to invoke this constructor is:

  OTC_String a(OTC_Capacity(1024));

To determine the capacity of the string at any time, the `capacity()' member function can be used. To change the capacity of the string after it has been created, an overloaded version of the `capacity()' member function accepting a single size argument is used. Note that the current capacity reported will not always be the same as the size you specified. This is because the size you specify is only a recommendation, and the implementation may allocate more space than you specified. In addition, if the capacity of the string is already greater than what you indicated you needed, the string is not reallocated and the memory already retained is used.

A variation on the above constructors allows you to specify both the length and the capacity of the string at the time of creation. As before, when the length is specified at creation, the string will be uninitialised. The capacity you specify is assumed to be greater than the length; if you specify a capacity less than the length, there will be an immediate reallocation. The syntax for specifying both the length and capacity when creating a string is:

  OTC_String a(OTC_Length(128),OTC_Capacity(1024));

If you are creating a string which you know will never be modified, you can force the capacity of the string to be only that necessary to hold the character string. If the string is long lived, this reduces the amount of memory which is unavailable for use elsewhere in the program. Forcing the capacity to only that which is necessary is done using the syntax:

  static OTC_String const a(OTC_CString("abcd"));

  res = a.length();    // yields 4
  res = a.capacity();  // yields 4

The constructor of the OTC_CString class accepts either a pointer to a character string or a single character, and as appropriate, an optional string length or character count.

4 Access to the Character String

In order to allow instances of the OTC_String and OTC_Symbol classes to be used where a C string is expected, it is possible to access the underlying character string associated with an instance of either class. In case there is no explicit null terminator to indicate the end of the string, the implementation supplies one. This null terminator lies just beyond the end of the string and is not counted in the length of the string.

To access the underlying character string for an instance of either class, the `string()' member function is used. This member function can be used on either const or non-const instances of each class. The return type of the function is `char const*'. If the string or symbol represents an empty string, a pointer to an empty, null terminated character string is returned.

In addition to the `string()' function, the OTC_String class provides an automatic conversion operator for the `char const*' type. This allows an instance of the OTC_String class to be used directly in situations where the compiler can deduce the need for a conversion to the `char const*' type. For example:

  void print(char const*) { ... }
  int printf(char const*, ...);   // as declared in <stdio.h>

  OTC_String a = "abcd";
  print(a);
  printf("%s",a.string());

One case where you need to be careful is shown above: when passing an instance of OTC_String to a function accepting a variable number of arguments, you must use the `string()' member function. The compiler cannot deduce that it should apply the conversion operator for an argument matching `...', so if you do not call `string()', the string object itself, rather than a pointer to its character string, is passed to the function, with unpredictable results.

For the OTC_String class, as the return type of the `string()' member function is `char const*', it is not possible to directly modify the contents of the underlying character string through the pointer which that function returns. If you need to modify the underlying character string directly, you should instead use the `buffer()' member function. This member function behaves differently to the `string()' member function in a number of important ways.

The most important difference between the `buffer()' member function and the `string()' member function is that the `buffer()' member function will ensure that the underlying character string is only in use by that instance. That is, if the underlying character string is being shared due to the delayed copy mechanism, the sharing of the character string will be broken before the pointer to the character string is returned.

The second difference in the behaviour of the `buffer()' member function is that when the length of the string is zero, a null pointer is returned, whereas the `string()' member function would return a pointer to an empty, null terminated string. For a const instance of OTC_String, the return type of the `buffer()' member function is `char const*', so you cannot modify the underlying character string. A null pointer is still returned, though, if the string has zero length.
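
The delayed copy behaviour of `string()' and `buffer()' can be sketched in standard C++ with a reference-counted buffer. This is a minimal illustration under the assumption that sharing is tracked with `std::shared_ptr'; the class `CowString' is hypothetical, is not the OTC_String or OTC_RString implementation, and is not thread safe.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical illustration of delayed copying (copy-on-write). Copies share
// one buffer until a writer asks for mutable access through buffer().
class CowString
{
  public:
    CowString(char const* s, std::size_t n)
      : rep_(std::make_shared<std::string>(s, n)) {}

    // Read-only access never breaks sharing, and always yields a valid,
    // null-terminated string, even when the length is zero.
    char const* string() const { return rep_->c_str(); }

    // Mutable access: first give this instance a private copy if the buffer
    // is shared, then return a pointer to it. A zero-length string yields a
    // null pointer, mirroring the buffer() behaviour described above.
    char* buffer()
    {
      if (rep_.use_count() > 1)
        rep_ = std::make_shared<std::string>(*rep_);
      return rep_->empty() ? nullptr : &(*rep_)[0];
    }

    std::size_t length() const { return rep_->size(); }

    // True while another copy still shares this buffer.
    bool isShared() const { return rep_.use_count() > 1; }

  private:
    std::shared_ptr<std::string> rep_;
};
```

Copying a `CowString' only copies the shared pointer; the character data is duplicated the first time `buffer()' is called on a shared instance.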

The existence of the two functions derives from when OTC_String did not allow embedded null characters and a separate class, OTC_Buffer, existed to fulfil that role. To allow easy conversion to the new OTC_String class from the older classes, the existing names and behaviour were preserved. Note that it is not possible to modify the underlying character string of an instance of OTC_Symbol.

In the case of OTC_String, you should not retain the pointer returned by either the `string()' or `buffer()' functions for longer than is necessary for the specific operation you require it for. A subsequent alteration of the string through any of its member functions may cause the underlying character string to be reallocated, in which case the pointer you hold would no longer be valid.

In addition to the problem of the character string being disposed of through an increase in the length of the string, there is also the danger of the string object being destroyed while you still have a pointer to the underlying character string. For example:

  OTC_String function() { return "abcd"; }

  char const* p = 0;

  // ...

  if (p == 0)
  {
    p = function();
  }

  // p is invalid

Here a function returns a string object. Because the result is assigned to a variable of type `char const*', the conversion operator is applied to the temporary string object holding the return value. That temporary is destroyed at the end of the statement containing the assignment, so `p' is invalid from that point on, as the underlying character string will have been destroyed with it.

If you need to ensure that a pointer to the underlying character string will be valid, you should take a copy of the character string and assume responsibility for deleting it. A copy of the underlying character string can be obtained using the `duplicate()' member function.

  OTC_String function() { ... }

  char const* p = 0;

  // ...

  if (p == 0)
  {
    p = function().duplicate();
  }

  // p remains valid here, until it is deleted

  delete [] p;

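A `duplicate()'-style copy has to allocate an array the caller owns and copy every character, including any embedded nulls, before adding the terminator. The helper `duplicateChars()' below is a hypothetical sketch of that step, not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy `length' characters into a new array owned by the
// caller. memcpy is used rather than strcpy so embedded null characters are
// copied too; a null terminator is then added after the last character.
// The caller must release the result with delete [].
char* duplicateChars(char const* s, std::size_t length)
{
  char* copy = new char[length + 1];
  std::memcpy(copy, s, length);
  copy[length] = '\0';
  return copy;
}
```
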
When writing directly to the underlying character string, it is your responsibility to ensure you do not write beyond the valid length of the string. If you were to write beyond the valid length, the null terminator could be overwritten. The null terminator is only updated when the underlying character string needs to be reallocated. In other words, no attempt is made to replace the null terminator if you destroy it, nor are any checks made to ensure it exists when the underlying character string is passed to some context where a C string is required.

5 Access to Individual Characters

In both the OTC_String and OTC_Symbol classes, individual characters can be accessed using conventional array access notation. For example:

  OTC_String a = "abcd";

  int len = a.length();
  for (int i=0; i<len; i++)
    cout << a[i];
  cout << endl;

For a non-const instance of the OTC_String class, a character accessed using the array notation can appear on the left hand side of an assignment. This is achieved by `operator[]()' returning a reference to the actual location of the character within the underlying character string. Because a specific location in the character string could be assigned to in this way, any delayed copy the string is participating in is broken before the reference to the location is returned. For example:

  OTC_String a = "abcd";
  OTC_String b = a;

  int len = b.length();
  for (int i=0; i<(len/2); i++)
  {
    char c = b[i];
    b[i] = b[len-i-1];
    b[len-i-1] = c;
  }

  cout << a << endl; // yields "abcd"
  cout << b << endl; // yields "dcba"

For the same reasons as with the `string()' and `buffer()' member functions, you should not keep the reference beyond the immediate operation, as the underlying character string may be reallocated by a subsequent operation. You should also note that if you are only going to use the array access notation to read characters, you should ensure the string is accessed as a const object. Doing so avoids additional memory being allocated due to the need to break a delayed copy. This is illustrated below, where `tmpString' shares its character string with `theString', and the non-const `operator[]()' would force a new copy of the character string to be allocated, because an assignment could be made into the string using the array notation.

  void print(OTC_String const& theString)
  {
    OTC_String tmpString = theString;
    int len = tmpString.length();
    for (int i=0; i<len; i++)
      cout << tmpString[i];
  }

The version of the function below ensures that the string is accessed as a const object, thus avoiding the possibility of additional memory having to be allocated.

  void print(OTC_String const& theString)
  {
    OTC_String const tmpString = theString;
    int len = tmpString.length();
    for (int i=0; i<len; i++)
      cout << tmpString[i];
  }

When accessing individual characters in the array, it is only possible to access characters within the valid length of the string. That is, it is not possible to access the null terminator which is placed immediately after the active length of the string. An attempt to access beyond the valid length of the string will result in an exception being raised.
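
A minimal sketch of such bounds checking follows, assuming `std::out_of_range' as the exception type; the guide does not name the exception OTC_String raises, and the class `CheckedString' is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical sketch of bounds-checked character access. Valid indices are
// 0 .. length()-1, so the null terminator after the last character cannot be
// reached through operator[]().
class CheckedString
{
  public:
    explicit CheckedString(std::string s) : data_(std::move(s)) {}

    std::size_t length() const { return data_.size(); }

    // Read access for const objects.
    char operator[](std::size_t i) const { check(i); return data_[i]; }

    // Write access for non-const objects: returns a reference into the string.
    char& operator[](std::size_t i) { check(i); return data_[i]; }

  private:
    void check(std::size_t i) const
    {
      if (i >= data_.size())
        throw std::out_of_range("CheckedString: index beyond length");
    }

    std::string data_;
};
```

The check on every access is the overhead that makes `operator[]()' slower than indexing a plain character pointer.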

Because of the bounds checking performed against the active length of the string, using `operator[]()' for the OTC_String and OTC_Symbol classes is not as efficient as accessing a normal character string. If you need to perform a number of operations by accessing individual characters, and access needs to be efficient, you should get a pointer to the underlying character string and access the characters through that pointer. For example:

  OTC_String a = "abcd";

  char* s = a.buffer();
  int len = a.length();
  for (int i=0; i<(len/2); i++)
  {
    char c = s[i];
    s[i] = s[len-i-1];
    s[len-i-1] = c;
  }

If you were only going to be reading characters, use the `string()' member function instead of the `buffer()' member function. Again, this is to avoid the possibility that additional memory will need to be allocated.

6 Strings as Function Arguments

As OTC_String is not a trivial type, you should not pass it as an argument to a function in the same way as you would a built-in type. That is, rather than writing functions which accept string objects, you should write them to take a reference to a string object. For example:

  void print(OTC_String const& theString) { ... }

The presence of the `const' qualifier says that the function will not modify the contents of the string. If the function is intended to modify the string, with the change reflected back in the string object passed as argument, the `const' qualifier is dropped. A reference is still necessary in that case.

As the compiler will apply any necessary conversions, or invoke an appropriate constructor for OTC_String to get the right type for the function, it is possible to pass to the function a normal C string, or any single argument for which a conversion to OTC_String exists, or for which OTC_String has a constructor. As an example, the following are all legal arguments for the `print()' function.

  OTC_String a = "abcd";
  OTC_Symbol b = a;
  char const* c = "abcd";

  print(a);
  print(b);
  print(c);
  print("abcd");
  print(a+b);

7 Strings as Return Values

When it is necessary to return a string from a function, as with function arguments, you should attempt to use a reference to a string object, as opposed to an actual object. This however is not always possible and in some circumstances you may need to return an actual string object.

A reference can be used when a member variable of a class, or a global or static string object, is being returned. This is okay as the string object to which you are returning a reference has a longer lifetime than the scope of the function returning it. The most common scenario is that of an accessor function.

  class Item
  {
    public:
      OTC_String const& string() const
        { return myString; }

    private:
      OTC_String myString;
  };

A reference cannot be used where the string object being referred to would have been destroyed by the time the reference is used. The primary cases where this occurs are where the string object to be returned is located on the stack of the function it is being returned from, or exists as a temporary object within that function. These two cases are illustrated below.

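  // Wrong: returns a reference to a local object destroyed on return
  OTC_String const& function()
  {
    OTC_String theString = "abcd";
    return theString;
  }

  // Wrong: returns a reference to a temporary OTC_String
  OTC_String const& function()
  {
    char const* theString = "abcd";
    return theString;
  }
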
Both cases can be resolved by changing the return type of the function to that of a string object. For example:

  OTC_String function()
  {
    OTC_String theString = "abcd";
    return theString;
  }

For the case of literal strings being returned, if the literal strings are predefined and do not change, local static string objects could instead be used.

  OTC_String const& function()
  {
    static OTC_String const theString(OTC_CString("abcd"));
    return theString;
  }

A further scenario which often arises is a function which can return a valid reference to a string object in all but one case. In that one case an empty string must be returned instead. This may happen where a string found by a search is being returned from a collection, but an empty string must be returned when the search fails. For this case, the static member function `OTC_String::undefinedString()' can be used to obtain an empty, undefined string.

  OTC_String const& search(int key)
  {
    if (map.contains(key))
      return map.item(key);

    return OTC_String::undefinedString();
  }

If an empty string is a valid value for the strings being searched, the caller would need to check explicitly for an undefined string using the `isUndefined()' function. If an empty string is not valid, a simple check for an empty string would be adequate. Where an empty string is valid, the routines entering strings into the collection being searched would need to ensure that an undefined string is not added, replacing it with a defined but empty string.

  void add(int key, OTC_String const& theString)
  {
    if (!map.contains(key))
    {
      if (theString.isUndefined())
        map.add(key,OTC_String::nullString());
      else
        map.add(key,theString);
    }
  }

The issues above for using strings as function arguments and return values do not arise for symbols. This is because copying an OTC_Symbol is inexpensive, so a symbol can always be passed as an object. References can still be used if desired, in which case the above issues for return values again apply. Note though that for OTC_Symbol there is no equivalent of the static member function `OTC_String::undefinedString()'.

8 C++ Streams Input and Output

Both the OTC_String and OTC_Symbol classes can be output to an ostream using the insertor `operator<<()'.

  OTC_String a = "abcd";
  cout << a << endl;

If the underlying character string has embedded null characters, these will be correctly displayed. All the options of ostream for setting field widths and justification to format the output are honoured.

When reading input from an istream, it is only possible to read into an instance of the OTC_String class. If you need to initialise an instance of OTC_Symbol using input from an istream, you will first need to read into an instance of OTC_String or a character string and initialise the instance of OTC_Symbol through either assignment or a constructor.

One means of reading input from an istream into an instance of OTC_String is the extractor `operator>>()'.

  OTC_String a;
  cin >> a;

The extractor `operator>>()' will read in characters from an istream up till the point at which whitespace occurs. Provided that skipping of whitespace is not disabled, any leading whitespace will first be skipped. If a field width is set for the istream, that will be the maximum number of characters which are read. The string will expand as necessary to accommodate whatever number of characters are read.

Functions equivalent to the `get()' and `getline()' member functions of istream are provided in the form of the static member functions `OTC_String::get()' and `OTC_String::getline()'. As with the istream versions, the default delimiter is the end of line character. Unlike the functions of the istream class, it is not necessary to specify an upper limit on how many characters should be read; the string will automatically grow to accommodate whatever characters are read. The `get()' and `getline()' functions for OTC_String come in two varieties.

The first variant of these functions returns a new string object. For example:

  OTC_String a = "LINE: ";
  a += OTC_String::getline(cin);

The string object returned should be assigned to an existing string, used to initialise a string at the point of creation, used to initialise a reference, or used in any operation where a string is acceptable.

If you are repeatedly appending the new string object to another string, the second variant of these functions may be more appropriate. These functions take as first argument a string object to which the result should be appended. The functions read directly into the string you supply, avoiding the need to create a separate string.

  OTC_String a = "LINE";
  OTC_String::getline(a,cin);

The value returned from these functions is a reference to the string object passed as the first argument.
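
The appending form can be sketched with `std::string' as below. The function `getlineAppend()' is hypothetical; it reads characters straight into the string supplied, consumes but does not store the delimiter, and sets the stream state the way `std::getline()' does, which is the behaviour the surrounding text relies on.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>   // std::istringstream, for trying the function out
#include <string>

// Hypothetical appending getline(): characters are added directly to
// theString until `delim' or end of input. The delimiter is consumed but not
// stored. A reference to the string passed in is returned.
std::string& getlineAppend(std::string& theString, std::istream& in,
                           char delim = '\n')
{
  std::istream::sentry guard(in, true);  // true: keep leading whitespace
  if (!guard)
    return theString;                    // sentry has already set failbit

  typedef std::char_traits<char> Traits;
  std::streambuf* buf = in.rdbuf();
  bool extracted = false;
  for (;;)
  {
    Traits::int_type c = buf->sbumpc();
    if (Traits::eq_int_type(c, Traits::eof()))
    {
      // As with std::getline(), failure is only reported if nothing was read.
      in.setstate(extracted ? std::ios::eofbit
                            : std::ios::eofbit | std::ios::failbit);
      break;
    }
    extracted = true;
    char const ch = Traits::to_char_type(c);
    if (ch == delim)
      break;
    theString += ch;
  }
  return theString;
}
```

Because a final line without a trailing delimiter sets only the end-of-file state, a loop which tests `fail()' after each call still processes that last line.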

As for the equivalent member functions of istream, the difference between `get()' and `getline()' is that `getline()' removes the delimiter from the input stream. In accordance with conventions for writing functions manipulating an istream, the state of the stream will be set by the functions to indicate whether the end of input or bad input was encountered. Using the state flags, code which reads all lines of input would be written as:

  while (cin.good())
  {
    OTC_String aString = OTC_String::getline(cin);
    if (!cin.fail())
      cout << aString << endl;
  }

If instead of reading input up to a set delimiter you need to read a set number of characters, the static member function `OTC_String::read()' can be used. This also comes in the two forms described above. When using this function, you will need to check the length of the string after the call to see whether the number of characters you expected was actually read. For example:

  OTC_String a;
  OTC_String::read(a,cin,128);
  if (a.length() != 128)
    ...
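
The same length check can be sketched with standard streams. The function `readAppend()' is hypothetical; it appends up to `count' characters using `std::istream::read()' and then trims the string to the number actually read, as reported by `gcount()'.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, for trying the function out
#include <string>

// Hypothetical sketch: append up to `count' characters from `in' to
// `theString'. The caller checks the length afterwards, as described for
// OTC_String::read(), to see how many characters actually arrived.
std::string& readAppend(std::string& theString, std::istream& in,
                        std::size_t count)
{
  std::size_t const start = theString.size();
  theString.resize(start + count);   // make room for the new characters
  in.read(&theString[start], static_cast<std::streamsize>(count));
  // Drop any space that was not filled because input ran out.
  theString.resize(start + static_cast<std::size_t>(in.gcount()));
  return theString;
}
```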

9 Hash Values and Ranking

If a hash value is needed for an instance of either the OTC_String or OTC_Symbol classes, the `hash()' member function should be used. The hash value generated for the OTC_String class uses the contents of the underlying character string. Therefore the time taken to calculate the hash value is proportional to the length of the string.

As the underlying character string for equivalent instances of OTC_Symbol will be the same, the hash value is generated from the location in memory of the underlying character string. As the time taken to calculate the hash value for an instance of OTC_Symbol is constant, it is a better choice than OTC_String for use as a key in OTC_Map, OTC_UniqMap, or as an item in an OTC_Set or OTC_Bag. You will however have to remember that for OTC_Symbol an internal database of symbols is kept, thus if the range of symbols were unbounded, the database would keep growing and memory would not be reclaimed.

To allow OTC_String and OTC_Symbol to be used as keys in an OTC_Map, OTC_UniqMap, or as items in an OTC_Set or OTC_Bag, override versions of the OTC_HashActions class are provided. The override versions of OTC_HashActions are equivalent to calling the `hash()' function for the respective classes and you do not need to do anything in order to use OTC_String and OTC_Symbol in the collection classes where OTC_HashActions is used.

To determine the order or rank of two instances of either OTC_String or OTC_Symbol, the `rank()' member function should be used. The order of instances of OTC_String is based on a lexicographic comparison and thus can take time proportional to the length of the string. The order of instances of OTC_Symbol is based on the location in memory of the underlying character string, and takes constant time to determine. Override versions of OTC_RankActions are provided for each class to facilitate use of these classes in collection classes which need to rank items. The override versions of OTC_RankActions have the same effect as using the `rank()' member functions.
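
To make the cost difference concrete, here is a sketch of content-based hashing and ranking for strings which may contain embedded nulls. FNV-1a is used only as an example hash; the guide does not say which algorithm OTC_String uses, and the functions `contentHash()' and `contentRank()' are hypothetical. Both visit the characters of the string, which is why their cost grows with its length.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Content-based hash: every character contributes, so the cost is
// proportional to the string's length. FNV-1a is only an example algorithm.
std::uint32_t contentHash(std::string const& s)
{
  std::uint32_t h = 2166136261u;   // FNV-1a 32-bit offset basis
  for (unsigned char c : s)
  {
    h ^= c;
    h *= 16777619u;                // FNV-1a 32-bit prime
  }
  return h;
}

// Lexicographic rank: -1, 0 or 1 as `a' sorts before, equal to, or after `b'.
// std::string::compare handles embedded null characters correctly.
int contentRank(std::string const& a, std::string const& b)
{
  int const r = a.compare(b);
  return (r > 0) - (r < 0);
}
```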

10 Underlying Raw String

Both OTC_String and OTC_Symbol use the class OTC_RString in their internal implementations. It is in this class that the delayed copy and buffering mechanisms are implemented. As both OTC_String and OTC_Symbol use this class, the delayed copy mechanism can be applied in some circumstances when assignments occur between instances of the two classes. It can be applied where the initial object is an instance of OTC_Symbol. For example, the delayed copy mechanism would be used in the following assignments.

  OTC_Symbol a = "abcd";
  OTC_String b = a;
  OTC_Symbol c = b;

The delayed copy mechanism can be used when going from OTC_String back to OTC_Symbol in this case because the OTC_RString class keeps a flag indicating whether it represents a symbol. Because of the flag, there is no need to look the string up a second time in the symbol database. Overall, the delayed copy mechanism reduces the amount of copying which needs to occur when strings and symbols are initialised from or assigned to each other.

The underlying raw string for an instance of OTC_String can be accessed using the `rawString()' member function. The only reason to access the underlying raw string is when using OTC_SObject. At other times it is best to use the high level member functions of the OTC_String class.

The default buffering mechanism used by OTC_RString is to always allocate memory in multiples of 16 bytes by rounding up the required size. The intention of this is to reduce the number of allocations which need to occur when minor changes, such as single character additions, are being made to strings. As has been described you can get some measure of control over the buffering using the `capacity()' member function of the OTC_String class.
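
The rounding rule amounts to a single calculation. This sketch assumes the rounding applies to the number of bytes requested; the guide does not say whether the null terminator is counted, and `roundedCapacity()' is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Round a requested size up to the next multiple of 16 bytes, as in the
// default buffering rule described above.
std::size_t roundedCapacity(std::size_t required)
{
  std::size_t const block = 16;
  return (required + block - 1) / block * block;
}
```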

If you wish to turn off the buffering mechanism, so that only enough memory for the string itself is ever allocated, you can set the environment variable OTCLIB_NOSTRINGBUFFERING prior to executing your application. It is advisable to disable the buffering mechanism when using memory diagnostic tools such as Purify and Sentinel, as this allows a memory overrun to be detected more easily.

In the default buffering mechanism, there will never be more than 16 additional bytes in the character string to allow for expansion. For some applications this strategy may not be suitable. An alternate buffering mechanism can be enabled by defining the environment variable OTCLIB_MOWBRAYBUFFERING before executing your application. When this mechanism is enabled, the amount of memory allocated will be either 16, 64, 256 or 1024 bytes. Beyond 1024 bytes, a set percentage of the required memory is allocated in addition to that required.
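
The tiered rule can be sketched as below. The guide does not give the percentage used beyond 1024 bytes, so it is taken as a parameter, `extraPercent'; both it and the function `tieredCapacity()' are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Round a request up to the first tier that can hold it. Beyond the largest
// tier, add a percentage of the request; the real percentage is not stated
// in the guide, so the caller supplies it here.
std::size_t tieredCapacity(std::size_t required, std::size_t extraPercent)
{
  std::size_t const tiers[] = { 16, 64, 256, 1024 };
  for (std::size_t tier : tiers)
  {
    if (required <= tier)
      return tier;
  }
  return required + required * extraPercent / 100;
}
```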

The OTC_RString class provides various member functions to implement the above facilities and to allow access to the character string it holds. Except for its use as a handle in conjunction with OTC_SObject, the OTC_RString class should be regarded as an implementation class. That is, you shouldn't rely on the naming or functionality of the member functions it provides.

11 Symbol Database

Each time that an instance of OTC_Symbol is created, a central database of symbols is consulted to see if that symbol had already been used. If the symbol has already been used, the new instance is made to refer to the existing entry for that symbol. If the symbol had not previously existed, a new entry is added to the symbol database and the new instance made to point at this new entry.

Once an entry for a particular symbol has been created in the symbol database, the entry will remain in the symbol database until the application terminates. When ObjectStore is being used the symbol database is kept in transient memory. This means that you cannot store instances of OTC_Symbol as persistent objects and expect them to be valid in a subsequent invocation of the application with that database. If using ObjectStore, you would need to convert the symbol back to a string and store the string in the database.

Because the string which a symbol represents is kept for the life of the program's execution, you should only use the OTC_Symbol class where the number of different symbols is known in advance and is not excessive. If you use OTC_Symbol for arbitrary strings whose range is unbounded, the symbol database could keep growing in size and you may eventually exhaust the available memory.

If you need to know whether a symbol has ever been used, the static member function `OTC_Symbol::exists()' can be used. The function accepts a pointer to a C string and an optional length. If no length is supplied, the characters up to but not including the first null character are used.
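
The lookup behaviour described for `OTC_Symbol::exists()' can be sketched against a simple table. `SymbolTable', its `add()' function and the use of `npos' to mean "no length supplied" are assumptions of this example, not the library's interface. Note how an explicit length allows strings with embedded nulls to be looked up.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the symbol database, showing only lookup.
class SymbolTable
{
  public:
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    static void add(std::string const& text) { entries().insert(text); }

    // True if the string has previously been entered. Without a length,
    // characters up to (not including) the first null are used; with a
    // length, embedded nulls are part of the key.
    static bool exists(char const* text, std::size_t length = npos)
    {
      if (length == npos)
        length = std::strlen(text);
      return entries().count(std::string(text, length)) != 0;
    }

  private:
    static std::unordered_set<std::string>& entries()
    {
      static std::unordered_set<std::string> db;
      return db;
    }
};
```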

12 String Type Objects

Often you will have a class which is essentially a synonym for a string, and you will want to be able to pass an instance of this class wherever an instance of OTC_String is accepted. This can be achieved by deriving your class from OTC_String. This is not an adequate solution, however, where the string should not be modifiable, or where there are constraints on what the string can contain. For a non-const instance of the class, a user would be able to change the string arbitrarily through the member functions inherited from OTC_String.

An alternative solution is to make an instance of OTC_String a member variable of your class. You can then provide a restricted set of member functions for accessing or modifying the string. In order to pass an instance of your class as an argument wherever an instance of OTC_String is accepted, you must now supply a conversion operator for OTC_String. For example:

  class Item
  {
    public:
      operator OTC_String const&() const
        { return myString; }

    private:
      OTC_String myString;
  };

What has been lost in this solution is the ability, provided by OTC_String, to use an instance of the class where the type `char const*' is required.

  OTC_String a = "abcd";
  Item b = "abcd"; // assumes Item can be constructed from `char const*'
  char const* p;

  p = a; // okay
  p = b; // invalid

This ability cannot be added back into your class, as adding a second conversion operator for the type `char const*' would introduce ambiguities wherever an instance of your class is used in a place where either an OTC_String or the type `char const*' is acceptable.

To achieve the desired effect, the class OTC_SObject can be used as a base class of your class. OTC_SObject is a special abstract base class providing a pure virtual function for obtaining the string representation of a type. Because OTC_String has a constructor accepting an OTC_SObject, deriving your class from OTC_SObject allows you to pass an instance of your class as an argument in nearly all circumstances where OTC_String is acceptable.

If you need to be able to use your class where the type `char const*' is accepted, a conversion operator for that type can now be added to your class. For example:

  class Item : public OTC_SObject
  {
    public:
      operator char const*() const
        { return myString.string(); }

    private:
      OTC_RString rawString() const
        { return myString.rawString(); }

      OTC_String myString;
  };

The virtual function of OTC_SObject being overridden is `rawString()'. In implementing this, it is necessary to go directly to the OTC_RString type which is used in implementing the OTC_String class.
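
The pattern can be reproduced with standard types to show why it works. In this sketch `StringObject' plays the role of OTC_SObject and `String' the role of OTC_String; both, along with `lengthOf()' and `firstChar()', are hypothetical stand-ins. The essential piece is the `String' constructor taking a `StringObject const&', which accepts any derived class through a derived-to-base conversion.

```cpp
#include <cstddef>
#include <string>

// Hypothetical analogue of OTC_SObject.
class StringObject
{
  public:
    virtual ~StringObject() {}
    virtual std::string rawString() const = 0;
};

// Hypothetical analogue of OTC_String: the StringObject constructor is
// what lets any derived class be passed where a String is expected.
class String
{
  public:
    String(char const* s) : value_(s) {}
    String(StringObject const& obj) : value_(obj.rawString()) {}
    std::string const& value() const { return value_; }

  private:
    std::string value_;
};

class Item : public StringObject
{
  public:
    explicit Item(char const* s) : myString(s) {}
    operator char const*() const { return myString.c_str(); }

  private:
    std::string rawString() const override { return myString; }
    std::string myString;
};

// Separate names, so each call has exactly one way to convert an Item.
std::size_t lengthOf(String const& s) { return s.value().size(); }
char firstChar(char const* p) { return p[0]; }
```

An `Item' can now be passed both to `lengthOf()', via the base-class constructor of `String', and to `firstChar()', via its own conversion operator.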

Conversion to both OTC_String and the type `char const*' is now possible. Where both conversions are possible, the compiler should use the conversion to the base class in preference to the explicit conversion operator. That is, the function accepting an instance of OTC_String should be the one used.

13 Searching of Strings

Individual characters or substrings may be searched for within an instance of OTC_String. A search for an individual character may start from either the front or the end of the string, and a particular occurrence may be nominated. For example, you may ask for the second occurrence of the character `:' when searching backwards from the end of the string. When searching forwards for either characters or substrings, a starting index may also be specified.

  int res;

  OTC_String a = "abcd";
  res = a.index('b');   // yields 1
  res = a.index('z');   // yields -1
  res = a.index(2,'b'); // yields -1, search starts at index 2

  OTC_String b = "aabbccdd";
  res = b.index('c',2); // yields 5, the second occurrence of 'c'
  res = b.index("bb");  // yields 2

When a search fails, a value of `-1' is returned.
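
The search rules above can be expressed with std::string::find() and std::string::rfind(). The helpers `indexOf()', `rindexOf()' and `findSubstring()' are hypothetical and only mimic the documented behaviour, including the use of `-1' for a failed search; they are not the OTC_String member functions.

```cpp
#include <cstddef>
#include <string>

// Position of the n-th occurrence (n >= 1) of ch, searching forwards
// from index start; -1 if there is no such occurrence.
int indexOf(std::string const& s, char ch, unsigned n = 1, std::size_t start = 0)
{
  unsigned found = 0;
  for (std::size_t pos = s.find(ch, start); pos != std::string::npos;
       pos = s.find(ch, pos + 1))
  {
    if (++found == n)
      return static_cast<int>(pos);
  }
  return -1;
}

// Position of the n-th occurrence (n >= 1) of ch, searching backwards
// from the end of the string; -1 if there is no such occurrence.
int rindexOf(std::string const& s, char ch, unsigned n = 1)
{
  unsigned found = 0;
  for (std::size_t pos = s.rfind(ch); pos != std::string::npos;
       pos = (pos == 0) ? std::string::npos : s.rfind(ch, pos - 1))
  {
    if (++found == n)
      return static_cast<int>(pos);
  }
  return -1;
}

// Position of the substring sub, searching forwards from index start.
int findSubstring(std::string const& s, std::string const& sub, std::size_t start = 0)
{
  std::size_t pos = s.find(sub, start);
  return pos == std::string::npos ? -1 : static_cast<int>(pos);
}
```

For example, `rindexOf("a:b:c", ':', 2)' finds the second `:' from the end, at index 1.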

If you wish to perform searching or pattern matching based on regular expressions, you should use one of the OTC_Globex, OTC_Regex or OTC_Regexp classes in conjunction with OTC_String. A current limitation of these classes, though, is that they cannot handle null characters embedded within a string.

14 Extracting Sections of Strings

Creating a string which is a subsection of another string can be achieved using the constructors of OTC_String. For example:

  OTC_String a = "abcd";
  OTC_String b(a.string()+1,2); // yields "bc"

An easier approach is to use member functions of the OTC_String class. The first such function is `section()'. This works in a similar manner to the above example.

  OTC_String a = "abcd";
  OTC_String b = a.section(1,2); // yields "bc"
  OTC_Range r(1,2);
  OTC_String c = a.section(r); // yields "bc"

The arguments to the `section()' member function represent the range of the string you are after. The range can be specified using a starting index and length, or using the OTC_Range class.

The inverse of the `section()' member function is `except()'. This member function returns everything in the string except the characters in the specified range.

  OTC_String a = "abcd";
  OTC_String b = a.except(1,2); // yields "ad"

Sections of strings can also be obtained by giving a position in the string. Depending on the function used, either the section before or the section after the position specified is returned. The particular member function used also determines whether the character at that position is included in the returned string.

  OTC_String a = "abcd";

  OTC_String b = a.after(1);   // yields "cd"
  OTC_String c = a.from(1);    // yields "bcd"
  OTC_String d = a.before(1);  // yields "a"
  OTC_String e = a.through(1); // yields "ab"

Finally, the region of a string lying between two indices can be obtained using the `between()' member function. Both arguments are indices: they do not describe a range, but define the bounds of the region you want.

  OTC_String a = "abcd";
  OTC_String b = a.between(1,3); // yields "c"

The characters at the positions given are not included in the string which is returned.

Note that all the member functions described above return a new string; the original string is not modified. Also be aware that no delayed copying mechanism is used when substrings are created.
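
Each extraction function corresponds to a single std::string::substr() call, which makes the differences between them easy to compare. The free functions below are hypothetical analogues of the OTC_String members of the same names. Bounds handling here follows std::string::substr(), which throws std::out_of_range when the start position is past the end and shortens an over-long length; OTC_String's own checks may differ.

```cpp
#include <cstddef>
#include <string>

// Range given as a start index and a length.
std::string section(std::string const& s, std::size_t start, std::size_t len)
{ return s.substr(start, len); }

// Everything except the given range.
std::string except(std::string const& s, std::size_t start, std::size_t len)
{ return s.substr(0, start) + s.substr(start + len); }

// Relative to a position: after/before exclude it, from/through include it.
std::string after(std::string const& s, std::size_t i)   { return s.substr(i + 1); }
std::string from(std::string const& s, std::size_t i)    { return s.substr(i); }
std::string before(std::string const& s, std::size_t i)  { return s.substr(0, i); }
std::string through(std::string const& s, std::size_t i) { return s.substr(0, i + 1); }

// Both arguments are bounds (assumes lo < hi); the characters at
// those positions are excluded.
std::string between(std::string const& s, std::size_t lo, std::size_t hi)
{ return s.substr(lo + 1, hi - lo - 1); }
```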

15 String Editing

The OTC_String class provides a range of editing functions. All these functions utilise the functionality of the `replace()' member function. This member function accepts arguments representing a range within a string and what that part of the string should be replaced with. Ranges may be specified in two ways. A range may be specified by a starting index and a length, or by using the OTC_Range class. For example:

  OTC_String a = "abcd";
  a.replace(1,2,"cb"); // yields "acbd"
  OTC_Range r(1,2);
  a.replace(r,"bc"); // yields "abcd"

All the editing operations for OTC_String use the `replace()' member function, each supplying a different range. Adding to the beginning of a string is indicated by setting the range to start at location `0' with length `0'. This functionality is provided by the `prepend()' member function.

  OTC_String a = "abcd";
  a.replace(0,0,"cd"); // yields "cdabcd"
  a.prepend("ab"); // yields "abcdabcd"

Appending to the end of a string is indicated by setting the range to start at the location given by the length of the string, with length `0'. This functionality is provided by the `append()' member function.

  OTC_String a = "abcd";
  a.replace(a.length(),0,"ab"); // yields "abcdab"
  a.append("cd"); // yields "abcdabcd"

The `append()' member function is also available in the form of the overloaded operator `operator+=()'.

  OTC_String a = "abcd";
  a += "abcd"; // yields "abcdabcd"

Insertion into the middle of a string is indicated by setting the range to start at the appropriate location, with a length of `0'. This functionality is provided by the `insert()' member function.

  OTC_String a = "abcd";
  a.replace(2,0,"cd"); // yields "abcdcd"
  a.insert(4,"ab"); // yields "abcdabcd"

Assigning a new value to the string is indicated by setting the range to encompass the whole string. This functionality is provided by the `assign()' member function.

  OTC_String a = "abcd";
  a.replace(0,a.length(),"dcba"); // yields "dcba"
  a.assign("abcd"); // yields "abcd"

To improve performance, specific versions of `assign()' are provided for those cases where the delayed copy mechanism can be used on the value to which the string is being set, for example where the argument to `assign()' is an OTC_String.

The `assign()' member function is also available in the form of the overloaded operator `operator=()'.

  OTC_String a = "abcd";
  a = "dcba"; // yields "dcba"

Removal of a section of a string is indicated by specifying the appropriate range and replacing that part of the string with a zero length string. This functionality is provided by the `remove()' member function.

  OTC_String a = "abcd";
  a.replace(1,1,""); // yields "acd"
  a.remove(1,1); // yields "ad"

A related member function to `remove()' is `truncate()'. The `truncate()' member function is provided in two forms. The first form takes as argument an index into the string. This will result in the character at that position and any characters following it being removed from the string. The second form of the `truncate()' function takes no arguments and sets the string to being empty.

  OTC_String a = "abcd";
  a.truncate(1); // yields "a"
  a.truncate(); // yields ""

For all the editing operations described, an exception will be raised if an invalid range or index is provided. You should therefore check that a range is valid before performing an operation.

  OTC_String a = "....";
  // i and l are the start index and length of the range
  if (i+l <= a.length())
    a.replace(i,l,"....");

When specifying a replacement string for the above functions, either an instance of OTC_String, a character or C string may be used. Where appropriate, a character count or string length may also be specified.
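
Since every editing operation is a particular call to `replace()', the whole family can be sketched as a thin wrapper around std::string::replace(). `EditString' is a hypothetical class, and its range check throws std::out_of_range only to stand in for the exception OTC_String raises, whose actual type is not described here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical sketch: every edit is a range-checked replace().
class EditString
{
  public:
    EditString(char const* s) : value_(s) {}

    EditString& replace(std::size_t start, std::size_t len, std::string const& with)
    {
      // Invalid ranges raise an exception rather than being clamped.
      if (start > value_.size() || len > value_.size() - start)
        throw std::out_of_range("EditString::replace: invalid range");
      value_.replace(start, len, with);
      return *this;
    }

    EditString& prepend(std::string const& s)               { return replace(0, 0, s); }
    EditString& append(std::string const& s)                { return replace(value_.size(), 0, s); }
    EditString& insert(std::size_t i, std::string const& s) { return replace(i, 0, s); }
    EditString& assign(std::string const& s)                { return replace(0, value_.size(), s); }
    EditString& remove(std::size_t start, std::size_t len)  { return replace(start, len, ""); }
    EditString& truncate(std::size_t i)                     { return replace(i, value_.size() - i, ""); }
    EditString& truncate()                                  { return replace(0, value_.size(), ""); }

    std::string const& value() const { return value_; }

  private:
    std::string value_;
};
```

In `truncate(i)' the start index is validated first, so an index past the end throws instead of producing a wrapped-around length.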

16 Addition of Strings

The addition of strings using the overloaded operator `operator+()' is complicated by the presence of the automatic conversion operator to the type `char const*'. Because `operator+()' must return a new string object, if it were to return an instance of OTC_String it would be possible to assign the result to a variable of type `char const*'. The temporary string object returned by `operator+()' could then be destroyed before the value held by the variable is used. For example:

  OTC_String a = "abcd";
  OTC_String b = "efgh";
  char const* p = 0;

  if (p == 0)
  {
    p = a + b;
    // the temporary string object is destroyed once
    // this statement completes, leaving p dangling
  }
  // p is invalid here

To avoid this problem, `operator+()' does not return an object of type OTC_String. Instead, an object of type OTC_TString is returned. The OTC_TString class does not provide a conversion operator for the type `char const*', thereby avoiding the problem: if you attempt such an assignment, you will get a compile-time error indicating that the result of `operator+()' cannot be assigned directly to a variable of type `char const*'. Although an object of type OTC_TString is returned, the result can still be used wherever you could have passed an instance of OTC_String. This is made possible by a constructor of OTC_String which accepts an instance of OTC_TString. For example:

  void print(OTC_String const&) {}

  OTC_String a = "abcd";
  OTC_String b = a + "efgh";
  print(a + "efgh");

Where a conversion exists from OTC_String to another type, and you need to pass the result of `operator+()' where that other type is expected, you will need to perform an explicit cast. This is necessary because the compiler will not apply two user-defined conversions in succession, in this case from OTC_TString to OTC_String and then from OTC_String to the other type.

  class Item
  {
    public:
      Item(OTC_String const&);
  };

  void print(Item const&) {}

  OTC_String a = "abcd";
  print(a + "efgh"); // compiler error
  print(OTC_String(a + "efgh")); // okay
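
The reasoning behind OTC_TString can be checked with a small model in standard C++. `Str', `TempString', `show()', `describe()' and this `Item' are hypothetical stand-ins for OTC_String, OTC_TString and the example functions above. Because `operator+()' returns a `TempString', which has no conversion to `char const*', a line such as `char const* p = a + b;' fails to compile, while passing the sum to a function taking `Str const&' needs only one user-defined conversion.

```cpp
#include <string>
#include <utility>

// Stand-in for OTC_TString: holds a sum, with no conversion to char const*.
class TempString
{
  public:
    explicit TempString(std::string s) : value_(std::move(s)) {}
    std::string const& value() const { return value_; }

  private:
    std::string value_;
};

// Stand-in for OTC_String: converts implicitly to char const* and can be
// constructed from a TempString.
class Str
{
  public:
    Str(char const* s) : value_(s) {}
    Str(TempString const& t) : value_(t.value()) {}
    operator char const*() const { return value_.c_str(); }
    std::string const& value() const { return value_; }

  private:
    std::string value_;
};

// Returning TempString makes `char const* p = a + b;' a compile error.
TempString operator+(Str const& lhs, Str const& rhs)
{
  return TempString(lhs.value() + rhs.value());
}

// One user-defined conversion (TempString -> Str) is enough here.
std::string show(Str const& s) { return s.value(); }

// A type constructible from Str, like Item in the example above.
class Item
{
  public:
    Item(Str const& s) : text(s.value()) {}
    std::string text;
};

// `describe(a + "efgh")' would need TempString -> Str -> Item, two
// user-defined conversions, so an explicit Str(...) is required.
std::string describe(Item const& item) { return item.text; }
```

Usage mirrors the example: `show(a + "efgh")' compiles directly, whereas `describe()' needs `describe(Str(a + "efgh"))'.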

17 String Modifiers

In addition to the editing functions, a number of modifier functions are provided to perform specific editing tasks on an instance of OTC_String. The first of these, `upper()' and `lower()', change the case of alphabetic characters within a string. These member functions are overloaded so that you can specify whether the whole string, a range within the string, or a given number of characters from the start of the string should be affected. For example:

  OTC_String a = "abcd";
  a.upper(); // yields "ABCD"
  a.lower(1,2); // yields "AbcD"
  a.upper(2); // yields "ABcD"

Note that these functions modify the string to which they are applied. If you want the result to be a new string, either copy the string first, or use the `clone()' member function to get a copy and modify that.

  OTC_String a = "abcd";

  OTC_String b = a;
  b.upper();

  OTC_String c = a.clone().upper();

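
In-place case changes over a whole string, a range, or a leading count of characters can be sketched with std::toupper() and std::tolower(). The free functions below are hypothetical analogues of the member functions; as in the example above, copying the string first is how you would obtain a modified copy while leaving the original intact.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Change `len' characters starting at `start' (characters past the end
// are ignored). The string is modified in place.
std::string& upper(std::string& s, std::size_t start, std::size_t len)
{
  for (std::size_t i = start; i < s.size() && i - start < len; ++i)
    s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
  return s;
}

std::string& lower(std::string& s, std::size_t start, std::size_t len)
{
  for (std::size_t i = start; i < s.size() && i - start < len; ++i)
    s[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
  return s;
}

// The whole string, or only the first n characters.
std::string& upper(std::string& s)                { return upper(s, 0, s.size()); }
std::string& upper(std::string& s, std::size_t n) { return upper(s, 0, n); }
std::string& lower(std::string& s)                { return lower(s, 0, s.size()); }
std::string& lower(std::string& s, std::size_t n) { return lower(s, 0, n); }
```
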
To remove leading or trailing white space, or a sequence of a specific character from a string, the set of `trim()' member functions can be used. The `trim()' member function will remove both trailing and leading white space characters. The `ltrim()' member function removes characters from the front of the string. The `rtrim()' member function removes characters from the end of the string. If no argument is supplied to `ltrim()' or `rtrim()', white space characters will be removed.

  OTC_String a = "  aabcdd  ";
  a.trim(); // yields "aabcdd"
  a.ltrim('a'); // yields "bcdd"
  a.rtrim('d'); // yields "bc"

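
The trimming rules can be sketched with std::string::find_first_not_of() and std::string::find_last_not_of(). These free functions are hypothetical analogues of the OTC_String members and, like them, modify the string in place; white space is identified with std::isspace() here, which may not match the library's exact definition.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Remove leading white space.
std::string& ltrim(std::string& s)
{
  std::size_t i = 0;
  while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i])))
    ++i;
  s.erase(0, i);
  return s;
}

// Remove a leading run of the character ch.
std::string& ltrim(std::string& s, char ch)
{
  s.erase(0, s.find_first_not_of(ch));
  return s;
}

// Remove trailing white space.
std::string& rtrim(std::string& s)
{
  while (!s.empty() && std::isspace(static_cast<unsigned char>(s.back())))
    s.pop_back();
  return s;
}

// Remove a trailing run of the character ch.
std::string& rtrim(std::string& s, char ch)
{
  std::size_t last = s.find_last_not_of(ch);
  s.erase(last == std::string::npos ? 0 : last + 1);
  return s;
}

// Remove both leading and trailing white space.
std::string& trim(std::string& s) { return rtrim(ltrim(s)); }
```
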
To remove a set number of characters from either end of the string, the `lchop()' and `rchop()' member functions can be used. Both member functions take as argument the number of characters to remove. The `lchop()' member function will remove the characters from the front of the string. The `rchop()' member function will remove the characters from the end of the string.

  OTC_String a = "abcd";
  a.rchop(3); // yields "a"

To pad out a string to a certain width, the `ljustify()' and `rjustify()' member functions can be used. Both functions take as arguments the desired width and an optional fill character. If the fill character is not specified, the space character is used. If the string is already longer than the width specified, no change is made to the string.

With the `ljustify()' member function, padding, if required, is added at the end of the string. That is, the string will be left justified within the field width specified. The `rjustify()' member function will right justify the string within the field width specified. That is, padding will be added at the start of the string if it is required.

  OTC_String a = "abcd";
  a.ljustify(6,'.'); // yields "abcd.."
  a.rjustify(8,'.'); // yields "..abcd.."

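
Justification amounts to adding `width - length' fill characters at one end when the string is short, and doing nothing otherwise. The functions below are hypothetical std::string analogues of `ljustify()' and `rjustify()'.

```cpp
#include <cstddef>
#include <string>

// Pad at the end (left justify) up to `width'.
std::string& ljustify(std::string& s, std::size_t width, char fill = ' ')
{
  if (s.size() < width)
    s.append(width - s.size(), fill);
  return s;
}

// Pad at the start (right justify) up to `width'.
std::string& rjustify(std::string& s, std::size_t width, char fill = ' ')
{
  if (s.size() < width)
    s.insert(s.begin(), width - s.size(), fill);
  return s;
}
```
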
The final modifier member function is `reverse()'. As the name suggests, this reverses the contents of the string.

  OTC_String a = "abcd";
  a.reverse(); // yields "dcba"

18 Comparison of Strings

All the standard relational operators can be applied to instances of OTC_String. When the relational operators are used, the case of letters is significant. For operators where ordering is being calculated, a lexicographic ranking is used.

Where a comparison is being made between an instance of OTC_String and a pointer of type `char const*', if the pointer is a null pointer, a false value will always be returned. That is, a null pointer does not map to any valid string when comparisons are being performed. It is not even regarded as being the same as an empty string. For example:

  OTC_String a = "";

  if (a == (char const*)0)
    ... // will never execute

  if (a == "")
    ... // will execute

In addition to the relational operators, the member function `compare()' is provided. It exists because the equality and inequality operators only allow you to compare complete strings, whereas `compare()' allows comparisons between sections of strings. You can specify an index into the string at which the comparison should start and how many characters should be compared. The `compare()' member function also accepts an optional argument indicating whether the comparison should be case sensitive; this should be either OTCLIB_EXACTMATCH or OTCLIB_IGNORECASE, the latter giving a case-insensitive comparison.

  OTC_String a = "abcd";

  if (a.compare(1,"bc"))
    ...
  if (a.compare("ABcd",OTCLIB_IGNORECASE))
    ...

  OTC_String b = "abcd";

  if (a.compare(1,b,2))
    ...

If a single value is necessary for representing the relative ordering of strings, the `rank()' member function should be used instead of the `compare()' member function.
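
The comparison rules in this section, including the treatment of null pointers, can be sketched as free functions over std::string. `equals()', `compareSection()' and the `Match' enumeration (standing in for OTCLIB_EXACTMATCH and OTCLIB_IGNORECASE) are hypothetical, and the handling of out-of-range sections is an assumption of this sketch rather than the behaviour of `OTC_String::compare()'.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Stands in for OTCLIB_EXACTMATCH / OTCLIB_IGNORECASE.
enum class Match { Exact, IgnoreCase };

// A null pointer never compares equal, not even to an empty string.
bool equals(std::string const& s, char const* p)
{
  return p != nullptr && s == p;
}

// Compare `count' characters of s, starting at `start', with the first
// `count' characters of other. Out-of-range sections compare unequal
// (an assumption of this sketch).
bool compareSection(std::string const& s, std::size_t start,
                    std::string const& other, std::size_t count,
                    Match mode = Match::Exact)
{
  if (start > s.size() || count > s.size() - start || count > other.size())
    return false;
  for (std::size_t i = 0; i < count; ++i)
  {
    unsigned char a = static_cast<unsigned char>(s[start + i]);
    unsigned char b = static_cast<unsigned char>(other[i]);
    if (mode == Match::IgnoreCase)
    {
      a = static_cast<unsigned char>(std::tolower(a));
      b = static_cast<unsigned char>(std::tolower(b));
    }
    if (a != b)
      return false;
  }
  return true;
}
```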

19 Comparison of Symbols

To compare instances of OTC_Symbol, equality and inequality operators are provided. When two instances of OTC_Symbol are compared, the test is a pointer comparison and is therefore more efficient than comparing two strings. When an instance of OTC_String or a C string is compared against an instance of OTC_Symbol, a full comparison based on the contents of the string is performed. This prevents the strings against which a symbol is compared from being entered into the symbol database.

Instances of OTC_Symbol cannot be used with those relational operators where ordering is being assessed. To obtain an arbitrary ordering, you should use the `rank()' member function. To obtain a lexicographic ordering, you should convert the symbols to strings and then perform the comparison.
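
The difference between comparing two symbols and comparing a symbol with a string can be sketched as follows. `Sym' is a hypothetical interned-string class, not OTC_Symbol: symbol-to-symbol equality compares pointers, symbol-to-string equality compares characters without inserting anything into the table, and `rank()' (returning -1, 0 or 1 by assumption) orders by entry address, which is consistent but arbitrary. A lexicographic order comes from comparing the underlying strings.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

class Sym
{
  public:
    explicit Sym(std::string const& text)
      : entry_(&*table().insert(text).first) {}

    // Symbol against symbol: a pointer comparison.
    bool operator==(Sym const& o) const { return entry_ == o.entry_; }

    // Symbol against string: a character comparison; nothing is added.
    bool operator==(std::string const& s) const { return *entry_ == s; }

    // Arbitrary but consistent ordering by entry address.
    // std::less gives a total order even for unrelated pointers.
    int rank(Sym const& o) const
    {
      std::less<std::string const*> less;
      if (less(entry_, o.entry_)) return -1;
      if (less(o.entry_, entry_)) return 1;
      return 0;
    }

    std::string const& string() const { return *entry_; }
    static std::size_t count() { return table().size(); }

  private:
    static std::unordered_set<std::string>& table()
    {
      static std::unordered_set<std::string> t;
      return t;
    }

    std::string const* entry_;
};
```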