There are two principal classes provided for use in place of C strings. These classes are OTC_String and OTC_Symbol. Both classes provide you with the ability to store character strings containing embedded null characters. Both will also place a null guard byte at the end of the string, in case there is no explicit null terminator. The addition of the null terminator allows the string to be used where C strings would have been used.
The OTC_String class provides you with a range of editing and query operations, and utilises a number of techniques, such as delayed copying, to improve performance and reduce memory usage. The OTC_Symbol class is tailored to applications where a fairly constant number of strings are used as identifiers for objects. To cut down on the number of strings, the OTC_Symbol class maintains an internal database of all symbols which have been created. This ensures there will only ever be one copy of a symbol, allowing for reuse and a potential reduction in the amount of memory used. Keeping only one copy of the string associated with the symbol also allows efficient equality tests for symbols.
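The idea of a symbol database can be pictured with a small sketch in standard C++. This is an illustration only, not the OTC_Symbol implementation: the `SymbolTable` class and its `intern()` function are hypothetical names. Each distinct string is stored once, so two symbols are equal exactly when they refer to the same stored copy.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a symbol database; not the OTC_Symbol code.
class SymbolTable
{
  public:
    // Returns the address of the single stored copy of theText.
    std::string const* intern(std::string const& theText)
    {
      // insert() leaves the set unchanged if an equal string is present.
      return &*mySymbols.insert(theText).first;
    }

    std::size_t size() const { return mySymbols.size(); }

  private:
    // Node-based container: element addresses stay valid after a rehash.
    std::unordered_set<std::string> mySymbols;
};
```

With such a table, an equality test between two symbols reduces to a pointer comparison, regardless of how long the strings are.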

    char const* p = "abc";
    OTC_String a; // a is ""
    OTC_String b = "abc"; // b is "abc"
    OTC_String c = p; // c is "abc"
    OTC_String d(p); // d is "abc"
    OTC_String e(p,2); // e is "ab"
    OTC_String f(p,4); // f is "abc\0"
    OTC_String g = d; // g is "abc"
    OTC_String h(g,2); // h is "ab"
    OTC_String i = 'a'; // i is "a"
    OTC_String j('a'); // j is "a"
    OTC_String k('a',2); // k is "aa"
    OTC_Symbol l = d; // l is "abc"
    OTC_String m = l; // m is "abc"

When C strings are used, a null pointer is often used to mean an undefined string or as an indication that there is no more data available. This meaning for a null pointer can be quite different to that of a pointer to a valid, but empty string. When using an instance of OTC_String no pointers are involved, thus it is not possible to make this distinction in the same way. So as to still allow for the concept of an undefined string, when a string object is created but not initialised, as well as being an empty string, it is also identified as being undefined. Whether a string is undefined can be determined using the `isUndefined()' member function. To determine if a string is empty, the `isEmpty()' member function is used.

    OTC_String a;
    res = a.isEmpty(); // yields OTCLIB_TRUE
    res = a.isUndefined(); // yields OTCLIB_TRUE
    OTC_String b = "";
    res = b.isEmpty(); // yields OTCLIB_TRUE
    res = b.isUndefined(); // yields OTCLIB_FALSE

If you require an empty string, which also needs to be identified as being an undefined string, but you cannot create one within the context you require it, you can obtain one using the static member function `OTC_String::undefinedString()'. The value returned from this function can also be assigned to a string to put it back into the undefined state. For example:

    OTC_String a = "";
    res = a.isEmpty(); // yields OTCLIB_TRUE
    res = a.isUndefined(); // yields OTCLIB_FALSE
    a = OTC_String::undefinedString();
    res = a.isEmpty(); // yields OTCLIB_TRUE
    res = a.isUndefined(); // yields OTCLIB_TRUE

Similar functions to `OTC_String::undefinedString()' are provided in both the OTC_String and OTC_Symbol class for creating defined, but empty strings. These are `OTC_String::nullString()' and `OTC_Symbol::nullSymbol()'.
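The distinction between an undefined string and a defined but empty string can be modelled with a flag kept alongside the text. The following is a minimal sketch in standard C++, assuming a hypothetical `MaybeString` class; it mirrors the behaviour described above but says nothing about how OTC_String is actually implemented.

```cpp
#include <string>

// Hypothetical illustration type; not part of the library.
class MaybeString
{
  public:
    // A default constructed object is empty and undefined.
    MaybeString() : myDefined(false) {}

    // An object created from text is defined, even if the text is "".
    MaybeString(char const* theText) : myText(theText), myDefined(true) {}

    bool isEmpty() const { return myText.empty(); }
    bool isUndefined() const { return !myDefined; }

    // Shared undefined value, usable where no local object can be created.
    static MaybeString const& undefinedString()
    {
      static MaybeString const theUndefined;
      return theUndefined;
    }

  private:
    std::string myText;
    bool myDefined;
};
```

Assigning the value returned by `undefinedString()` to an existing object copies the flag as well as the text, which is what puts the object back into the undefined state.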

    OTC_String a(OTC_Length(128));

If it is necessary to change the length of the string after it has been created, an overloaded version of the `length()' member function accepting a single size argument should be used. This member function will allow you to either increase or decrease the length of the string. In the case of the length of the string being increased, the new area of the string made available will not be initialised. The `length()' function is different to the `truncate()' function in that the latter function only allows you to decrease the length of the string.
A common idiom for use of this feature, is where immediately upon creation, the string is to be filled with data which is being read from a file or socket. This is illustrated below.

    OTC_String a(OTC_Length(128));
    res = read(fd,a.buffer(),a.length());
    if (res != -1)
      a.length(res);
    else
      a.length(0);

When the string class allocates memory for holding a character string, it uses a buffering mechanism such that more space than what is actually required is allocated. This approach is used so as to limit the number of reallocations of memory which have to be made when the string is extended in increments. If you know in advance what the longest length is that the string will grow to in a series of operations, you can force the string to preallocate this capacity so as to avoid any reallocations as the string grows in length.
To set the capacity of the string at the time of creation, another special constructor is used. The syntax necessary to invoke this constructor is:

    OTC_String a(OTC_Capacity(1024));

To determine the capacity of the string at any time, the `capacity()' member function can be used. To change the capacity of a string after it has been created, an overloaded version of the `capacity()' member function accepting a single size argument is used. Note that when requesting the current capacity of the string, it will not always indicate a size the same as you specified. This is because the size you specify is only a recommendation. The implementation may allocate more space than what you have specified. In addition, if the capacity of the string was already greater than what you had indicated that you needed, the string is not reallocated, but the memory already retained is used.
A variation on the above constructors allows you to specify both the length of the string and also the capacity of the string at the time of creation. As before, when specifying the length at creation, the string will be uninitialised. It is assumed that the capacity you specify will be greater than the length. If you specify a capacity less than the length there will be an immediate reallocation. The syntax for specifying both the length and capacity when creating a string is:

    OTC_String a(OTC_Length(128),OTC_Capacity(1024));

If you are creating a string which you know will never be modified, you can force the capacity of the string to be only that necessary to hold the character string. If the string is long lived, this will reduce the amount of memory which will be unavailable for use elsewhere in the program. Forcing the capacity to only that which is necessary is performed using the syntax:

    static OTC_String const a(OTC_CString("abcd"));
    res = a.length(); // yields 4
    res = a.capacity(); // yields 4

The constructor of the OTC_CString class accepts either a pointer to a character string or a single character, and as appropriate, an optional string length or character count.
To access the underlying character string for an instance of either class, the `string()' member function is used. This member function can be used on either const or non-const instances of each class. The return type of the function is `char const*'. If the string or symbol represents an empty string, a pointer to an empty, null terminated character string is returned.
In addition to the `string()' function, the OTC_String class provides an automatic conversion operator for the `char const*' type. This allows an instance of the OTC_String class to be used implicitly in situations where the need for a conversion to the `char const*' type can be deduced by the compiler. For example:

    void print(char const*) { ... }
    void printf(char const*, ...) { ... }
    OTC_String a = "abcd";
    print(a);
    printf("%s",a.string());

One case where you need to be careful is shown above. That is, when passing an instance of OTC_String to a function accepting a variable number of arguments, you must use the `string()' member function. If you do not do this, the address in memory of the string or symbol object will be passed to the function. This will occur as the compiler cannot deduce that it should apply the conversion operator.
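The difference between the two call sites can be reproduced with any class that has a conversion operator. The sketch below uses a hypothetical `Text` class in standard C++ in place of OTC_String; `countChars()` and `format()` are illustrative names. Where the parameter type is declared the conversion is applied automatically, whereas for the variadic call the pointer has to be requested explicitly.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical wrapper with a conversion operator; stands in for a
// string class in this illustration.
class Text
{
  public:
    Text(char const* theText) : myText(theText) {}
    char const* string() const { return myText.c_str(); }
    operator char const*() const { return string(); }
  private:
    std::string myText;
};

// The parameter type is known, so a Text argument is converted implicitly.
inline std::size_t countChars(char const* theText)
{
  std::size_t n = 0;
  while (theText[n] != '\0')
    ++n;
  return n;
}

// The arguments after the format are variadic, so the compiler has no
// target type to convert to; string() must be called explicitly.
inline std::string format(Text const& theText)
{
  char buf[32];
  std::snprintf(buf, sizeof buf, "[%s]", theText.string());
  return buf;
}
```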
For the OTC_String class, as the return type of the `string()' member function is `char const*', it is not possible to directly modify the contents of the underlying character string through the pointer which that function returns. If you need to modify the underlying character string directly, you should instead use the `buffer()' member function. This member function behaves differently to the `string()' member function in a number of important ways.
The most important difference between the `buffer()' member function and the `string()' member function is that the `buffer()' member function will ensure that the underlying character string is only in use by that instance. That is, if the underlying character string is being shared due to the delayed copy mechanism, the sharing of the character string will be broken before the pointer to the character string is returned.
The second difference in the behaviour of the `buffer()' member function, is that when the length of the string is zero, a null pointer is returned. For the `string()' member function, a pointer to an empty, null terminated string, would be returned. For a const instance of OTC_String, the return type of the `buffer()' member function is `char const*' and thus you cannot modify the underlying character string. A null pointer is still returned though, in the case that the string has zero length.
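The two behaviours of `buffer()' described above can be sketched with a reference counted holder. This is an illustration in standard C++ under stated assumptions, not the library's code: `SharedText` and `sharesWith()` are hypothetical names, and `std::shared_ptr` stands in for whatever the delayed copy mechanism uses internally.

```cpp
#include <memory>
#include <string>

// Hypothetical copy-on-write holder; not the OTC_String implementation.
class SharedText
{
  public:
    SharedText(char const* theText)
      : myData(std::make_shared<std::string>(theText)) {}

    // Read-only access never breaks sharing and never returns null.
    char const* string() const { return myData->c_str(); }

    // Writable access: null for a zero length string, otherwise a
    // pointer to data used by this instance only.
    char* buffer()
    {
      if (myData->empty())
        return nullptr;
      if (myData.use_count() > 1)
        myData = std::make_shared<std::string>(*myData);
      return &(*myData)[0];
    }

    // True while both objects refer to the same underlying data.
    bool sharesWith(SharedText const& theOther) const
      { return myData == theOther.myData; }

  private:
    std::shared_ptr<std::string> myData;
};
```

Copying a `SharedText` only copies the pointer. The character data is duplicated the first time `buffer()` is called on an object whose data is shared, which is why read-only code is better served by `string()`.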
The existence of the two functions derives from when OTC_String did not allow embedded null characters and a separate class, OTC_Buffer, existed to fulfil that role. To allow easy conversion to the new OTC_String class from the older classes, the existing names and behaviour were preserved. Note that it is not possible to modify the underlying character string of an instance of OTC_Symbol.
In the case of OTC_String, you should not retain the pointer returned by either the `string()' or `buffer()' functions for longer than is necessary for the specific operation you require it. This is necessary, as a subsequent alteration of the string through any of its member functions, may necessitate the underlying character string being reallocated. Such an operation would result in the pointer you hold no longer being valid.
In addition to the problem of the character string being disposed of through an increase in the length of the string, there is also the danger of the string object being destroyed while you still have a pointer to the underlying character string. For example:

    OTC_String function() { return "abcd"; }
    char const* p = 0;
    // ...
    if (p == 0)
    {
      p = function();
    }
    // p is invalid

Here a function is returning a string object. As there is an assignment to a variable of type `char const*', a temporary is created to hold the string object so that the conversion operator can be applied. The string object is destroyed at the end of the `if' statement, meaning that `p' will subsequently be invalid as the underlying character string would also have been destroyed.
If you need to ensure that a pointer to the underlying character string will be valid, you should take a copy of the character string and assume responsibility for deleting it. A copy of the underlying character string can be obtained using the `duplicate()' member function.

    OTC_String function() { ... }
    char const* p = 0;
    // ...
    if (p == 0)
    {
      p = function().duplicate();
    }
    delete [] p;

When writing directly to the underlying character string, it is your responsibility to ensure you do not write beyond the valid length of the string. If you were to write beyond the valid length, the null terminator could be overwritten. The null terminator is only updated when it is necessary to reallocate the underlying character string. In other words, no attempt is made to replace the null terminator if you were to destroy it, neither are any checks made to ensure it exists when the underlying character string is accessed in order to pass it to some context where a C string is required.

    OTC_String a = "abcd";
    int len = a.length();
    for (int i=0; i<len; i++)
      cout << a[i];
    cout << endl;

In the case of a non const instance of the OTC_String class, it is possible for the reference to a character within the array to appear on the left hand side of an assignment. This is achieved through `operator[]()' returning a reference to the actual location for the character within the underlying character string. Due to the possibility of assigning into a specific location of the character string by using the array notation on the left hand side of an assignment, any delayed copy the string is participating in will be broken before the reference to the location in the array is returned. For example:

    OTC_String a = "abcd";
    OTC_String b = a;
    int len = b.length();
    for (int i=0; i<(len/2); i++)
    {
      char c = b[i];
      b[i] = b[len-i-1];
      b[len-i-1] = c;
    }
    cout << a << endl; // yields "abcd"
    cout << b << endl; // yields "dcba"

For the same reasons as with the `string()' and `buffer()' member functions, you should not keep the reference beyond the immediate operation due to the possibility of the underlying character string being reallocated by a subsequent operation. You should also note that if you are only going to use the array access notation to read characters, you should ensure the string is accessed as a const object. Doing so will avoid additional memory being allocated due to the need to break a delayed copy. This is illustrated below, where the underlying character string for `tmpString' would be reallocated due to the possibility that assignment could be made into the string using the array notation.

    void print(OTC_String const& theString)
    {
      OTC_String tmpString = theString;
      int len = tmpString.length();
      for (int i=0; i<len; i++)
        cout << tmpString[i];
    }

The version of the function below ensures that the string is accessed as a const object, thus avoiding the possibility of additional memory having to be allocated.

    void print(OTC_String const& theString)
    {
      OTC_String const tmpString = theString;
      int len = tmpString.length();
      for (int i=0; i<len; i++)
        cout << tmpString[i];
    }

When accessing individual characters in the array, it is only possible to access characters within the valid length of the string. That is, it is not possible to access the null terminator which is placed immediately after the active length of the string. An attempt to access beyond the valid length of the string will result in an exception being raised.
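The bounds rule described above, where the position holding the null terminator is already out of range, can be written down as a small checked accessor. This is a sketch in standard C++ only: `checkedAt()` is a hypothetical helper, and the use of `std::out_of_range` is an assumption, since the text does not say which exception the library raises.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical checked accessor; the exception type is an assumption.
inline char checkedAt(std::string const& theText, std::size_t theIndex)
{
  // Only indices inside the active length are valid. The terminator,
  // which sits at index length(), is deliberately out of range.
  if (theIndex >= theText.length())
    throw std::out_of_range("index beyond valid length of string");
  return theText[theIndex];
}
```

The comparison on every access is the cost that makes indexed access slower than working through a raw pointer.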
Because of the bounds checking which is performed against the active length of the string, using `operator[]()' for both the OTC_String and OTC_Symbol class is not as efficient as accessing a normal character string. If you need to perform a number of operations by accessing individual characters, and access needs to be efficient, you should get a pointer to the underlying character buffer and access the characters through that pointer. For example:

    OTC_String a = "abcd";
    char* s = a.buffer();
    int len = a.length();
    for (int i=0; i<(len/2); i++)
    {
      char c = s[i];
      s[i] = s[len-i-1];
      s[len-i-1] = c;
    }

If you were only going to be reading characters, use the `string()' member function instead of the `buffer()' member function. Again, this is to avoid the possibility that additional memory will need to be allocated.

    void print(OTC_String const& theString) { ... }

The presence of the `const' qualifier says that the function will not modify the contents of the string. If it was intended that the function should modify the string and the change reflected back in the string object passed as argument, the `const' qualifier would be dropped. Although the `const' qualifier would be dropped in this latter case, a reference would still be necessary.
As the compiler will apply any necessary conversions, or invoke an appropriate constructor for OTC_String to get the right type for the function, it is possible to pass to the function a normal C string, or any single argument for which a conversion to OTC_String exists, or for which OTC_String has a constructor. As an example, the following are all legal arguments for the `print()' function.

    OTC_String a = "abcd";
    OTC_Symbol b = a;
    char const* c = "abcd";
    print(a);
    print(b);
    print(c);
    print("abcd");
    print(a+b);

Cases where a reference can be used, is where a copy of a member variable of a class, or a global or static string object is being returned. This is okay as the string object to which you are returning a reference, has a longer lifetime than the scope of the function which is returning it. The most common scenario is that of an accessor function.

    class Item
    {
      public:
        OTC_String const& string() const
          { return myString; }
      private:
        OTC_String myString;
    };

Where a reference cannot be used is where the string object being referred to would have been deleted by the time it was used. The primary case where this occurs is where the string object to be returned is located on the stack of the function it is being returned from, or exists as a temporary object within that function. These two cases are illustrated below.

    OTC_String const& function()
    {
      OTC_String theString = "abcd";
      return theString;
    }

    OTC_String const& function()
    {
      char const* theString = "abcd";
      return theString;
    }

Both cases can be resolved by changing the return type of the function to that of a string object. For example:

    OTC_String function()
    {
      OTC_String theString = "abcd";
      return theString;
    }

For the case of literal strings being returned, if the literal strings are predefined and do not change, local static string objects could instead be used.

    OTC_String const& function()
    {
      static OTC_String const theString(OTC_CString("abcd"));
      return theString;
    }

A further scenario which often arises is where for all but one case a valid reference to a string object can be returned. For the one case where a valid reference cannot be returned, an empty string must be returned. This may happen where a string, after a search, is being returned from a collection, but when the search fails an empty string must be returned. For this case, the static member function `OTC_String::undefinedString()' can be used to obtain an empty string.

    OTC_String const& search(int key)
    {
      if (map.contains(key))
        return map.item(key);
      return OTC_String::undefinedString();
    }

If an empty string was a valid string for those being searched, the caller would need to check explicitly for an undefined string by using the `isUndefined()' function. If an empty string was not valid, a simple check for an empty string would be adequate. In the case where an empty string was valid, the routines entering strings into the collection being searched would need to ensure an undefined string was not added and that it was replaced with a defined but empty string.

    void add(int key, OTC_String const& theString)
    {
      if (!map.contains(key))
      {
        if (theString.isUndefined())
          map.add(key,OTC_String::nullString());
        else
          map.add(key,theString);
      }
    }

The issues above for using strings as function arguments and return values do not arise for symbols. This is because the copy constructor for OTC_Symbol is trivial and it can always be passed as an object. References can still be used if desired and if done the above issues for return values will again apply. Note though that for OTC_Symbol, there is no equivalent of the static member function `OTC_String::undefinedString()'.

    OTC_String a = "abcd";
    cout << a << endl;

If the underlying character string has embedded null characters, these will be correctly displayed. All the options of ostream for setting field widths and justification to format the output are honoured.
When reading input from an istream, it is only possible to read into an instance of the OTC_String class. If you need to initialise an instance of OTC_Symbol using input from an istream, you will first need to read into an instance of OTC_String or a character string and initialise the instance of OTC_Symbol through either assignment or a constructor.
One means of reading input from an istream into an instance of OTC_String is the extractor `operator>>()'.

    OTC_String a;
    cin >> a;

The extractor `operator>>()' will read in characters from an istream up till the point at which whitespace occurs. Provided that skipping of whitespace is not disabled, any leading whitespace will first be skipped. If a field width is set for the istream, that will be the maximum number of characters which are read. The string will expand as necessary to accommodate whatever number of characters are read.
Functions equivalent to the `get()' and `getline()' member functions of istream are provided in the form of the static member functions `OTC_String::get()' and `OTC_String::getline()'. As with the istream versions, the default delimiter is the end of line character. Unlike the functions of the istream class, it is not necessary to specify an upper limit as to how many characters should be read, the string will automatically grow to accommodate whatever characters are read. The `get()' and `getline()' functions for OTC_String come in two varieties.
The first variant of these functions, returns a new string object. For example:

    OTC_String a = "LINE: ";
    a += OTC_String::getline(cin);

The string object returned should be assigned to an existing string, used to initialise a string at the point of creation, used to initialise a reference, or used in any operation where a string is acceptable.
If you are repetitively appending the new string object to another string, the second variant of these functions may be more appropriate. These functions take as first argument a string object to which the result should be appended. The functions will read directly into the string you supply, avoiding the need to create a separate string.

    OTC_String a = "LINE";
    OTC_String::getline(a,cin);

The value returned from these functions is a reference to the string object passed as the first argument.
As for the equivalent member functions of istream, the difference between `get()' and `getline()' is that `getline()' removes the delimiter from the input stream. In accordance with conventions for writing functions manipulating an istream, the state of the stream will be set by the functions to indicate whether the end of input or bad input was encountered. Using the state flags, code which reads all lines of input would be written as:

    while (cin.good())
    {
      OTC_String aString = OTC_String::getline(cin);
      if (!cin.fail())
        cout << aString << endl;
    }

If instead of reading in input up till a set delimiter, you need to read in a set number of characters, the static member function `OTC_String::read()' can be used. This also comes in the two forms described above. When using this function, you will need to check the length of the string after the call to see if the number of characters which you expected to be read in were actually read in. For example:

    OTC_String a;
    OTC_String::read(a,cin,128);
    if (a.length() != 128)
      ...
As the underlying character string for equivalent instances of OTC_Symbol will be the same, the hash value is generated from the location in memory of the underlying character string. As the time taken to calculate the hash value for an instance of OTC_Symbol is constant, it is a better choice than OTC_String for use as a key in OTC_Map, OTC_UniqMap, or as an item in an OTC_Set or OTC_Bag. You will however have to remember that for OTC_Symbol an internal database of symbols is kept, thus if the range of symbols were unbounded, the database would keep growing and memory would not be reclaimed.
To allow OTC_String and OTC_Symbol to be used as keys in an OTC_Map, OTC_UniqMap, or as items in an OTC_Set or OTC_Bag, override versions of the OTC_HashActions class are provided. The override versions of OTC_HashActions are equivalent to calling the `hash()' function for the respective classes and you do not need to do anything in order to use OTC_String and OTC_Symbol in the collection classes where OTC_HashActions is used.
To determine the order or rank of two instances of either OTC_String or OTC_Symbol, the `rank()' member function should be used. The order of instances of OTC_String is based on a lexicographic comparison and thus can take time proportional to the length of the string. The order of instances of OTC_Symbol is based on the location in memory of the underlying character string, with the time taken to perform this being constant. Override versions of OTC_RankActions are provided for each class to facilitate use of these classes in collection classes which need to rank items. The override versions of OTC_RankActions have the same effect as using the `rank()' member functions.

    OTC_Symbol a = "abcd";
    OTC_String b = a;
    OTC_Symbol c = b;

The delayed copy mechanism can be used when going from OTC_String to OTC_Symbol in this case as the OTC_RString class keeps a flag indicating if it represents a symbol. Because of the flag, there is no need to look the string up a second time in the symbol database. Overall, the delayed copy mechanism reduces the amount of copying which needs to occur when strings and symbols are being initialised from or assigned to each other.
The underlying raw string can be accessed for an instance of OTC_String by using the `rawString()' member function. The only reason to access the underlying raw string is when using OTC_SObject. At other times it is best to use the high level member functions of the OTC_String class.
The default buffering mechanism used by OTC_RString is to always allocate memory in multiples of 16 bytes by rounding up the required size. The intention of this is to reduce the number of allocations which need to occur when minor changes, such as single character additions, are being made to strings. As has been described you can get some measure of control over the buffering using the `capacity()' member function of the OTC_String class.
If you wish to turn off the buffering mechanism which is used such that only enough memory is ever allocated for a string, you can set the environment variable OTCLIB_NOSTRINGBUFFERING prior to executing your application. It is advisable to disable the buffering mechanism when using memory diagnostic tools such as Purify and Sentinel, as this will allow a memory overrun to be more easily detected.
In the default buffering mechanism, there will never be more than 16 additional bytes in the character string to allow for expansion. For some applications, this strategy may not be suitable. An alternate buffering mechanism can be enabled by defining the environment variable OTCLIB_MOWBRAYBUFFERING before executing your application. When this mechanism is enabled the amount of memory allocated will be either 16, 64, 256 or 1024 bytes. Beyond 1024 bytes, a set percentage of the required memory is allocated in addition to that required.
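The two sizing policies can be expressed as simple functions of the required size. The following is a sketch only, in standard C++ with hypothetical function names. It assumes that a required size exactly equal to a step is served by that step, and because the text does not state the percentage applied beyond 1024 bytes, that value is left as a parameter rather than guessed.

```cpp
#include <cstddef>

// Default policy as described: round the required size up to the
// next multiple of 16.
inline std::size_t defaultCapacity(std::size_t theRequired)
{
  return ((theRequired + 15) / 16) * 16;
}

// Alternate policy as described: allocate 16, 64, 256 or 1024 bytes.
// Beyond 1024 bytes, thePercent of the required size is added on top;
// the actual percentage is not given in the text, so it is a parameter.
inline std::size_t steppedCapacity(std::size_t theRequired,
                                   std::size_t thePercent)
{
  std::size_t const steps[] = { 16, 64, 256, 1024 };
  for (std::size_t i = 0; i < 4; ++i)
  {
    if (theRequired <= steps[i])
      return steps[i];
  }
  return theRequired + (theRequired * thePercent) / 100;
}
```

The stepped policy trades memory for fewer reallocations: a string growing one character at a time from empty to 1024 bytes would be resized at most four times, compared with once every 16 characters under the default policy.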
The OTC_RString class provides various member functions to implement the above facilities and to allow access to the character string it holds. Except for use as a handle in conjunction with OTC_SObject, the OTC_RString class should be regarded as an implementation class. That is, you shouldn't rely on the naming or functionality of the member functions it provides.
Once an entry for a particular symbol has been created in the symbol database, the entry will remain in the symbol database until the application terminates. When ObjectStore is being used the symbol database is kept in transient memory. This means that you cannot store instances of OTC_Symbol as persistent objects and expect them to be valid in a subsequent invocation of the application with that database. If using ObjectStore, you would need to convert the symbol back to a string and store the string in the database.
Because the string which a symbol represents is kept for the life of the program's execution, you should only use the OTC_Symbol class where the number of different symbols is known in advance and is not excessive. If you use OTC_Symbol for arbitrary strings, the range of which is unbounded, the symbol database could keep growing in size and you may eventually exhaust the available memory.
If you need to know if a symbol has ever been used, the static member function `OTC_Symbol::exists()' can be used. The function accepts a pointer to a C string, and an optional length. If no length is supplied, the characters up to but not including a null character are used.
An alternative solution is to have an instance of OTC_String as a member variable of your class. You can now provide a restricted range of member functions for accessing or modifying the string. In order to be able to pass an instance of your class as argument wherever an instance of OTC_String is accepted, you must now supply a conversion operator for OTC_String. For example:

    class Item
    {
      public:
        operator OTC_String const&() const
          { return myString; }
      private:
        OTC_String myString;
    };

An ability which has been lost in this solution is that you cannot make use of the feature of OTC_String which allows an instance of OTC_String to be used where the type `char const*' is required.

    OTC_String a = "abcd";
    Item b = "abcd";
    char const* p;
    p = a; // okay
    p = b; // invalid

This ability cannot be added back into your class as the addition of a second conversion operator for the type `char const*' will introduce ambiguities where an instance of your class is used in places where either an OTC_String or the type `char const*' is acceptable.
To achieve the desired effect, the class OTC_SObject can be used as a base class to your class. The class OTC_SObject is a special abstract base class providing a pure virtual function for getting at the string representation of a type. By virtue of a constructor of OTC_String, deriving your class from OTC_SObject will allow you to pass an instance of your class as argument in nearly all circumstances where OTC_String is acceptable.
If the ability to use your class where the type `char const*' is accepted is necessary, a conversion operator for that type can now be added to your class. For example:

    class Item : public OTC_SObject
    {
      public:
        operator char const*() const
          { return myString.string(); }
      private:
        OTC_RString rawString() const
          { return myString.rawString(); }
        OTC_String myString;
    };

The virtual function of OTC_SObject being overridden is `rawString()'. In implementing this, it is necessary to go directly to the OTC_RString type which is used in implementing the OTC_String class.
Conversion to both OTC_String and the type `char const*' are now possible. In cases where the possibility of both conversions exist, the compiler should use the conversion to the base class in preference to the explicit conversion operator. That is, where both conversions are possible, the function accepting an instance of OTC_String should be that which is used.

    OTC_String a = "abcd";
    res = a.index('b'); // yields 1
    res = a.index('z'); // yields -1
    res = a.index(2,'b'); // yields -1
    OTC_String b = "aabbccdd";
    res = b.index('c',2); // yields 4
    res = b.index("bb"); // yields 2

When a search fails, a value of `-1' is returned.
If you wish to perform searching or pattern matching based on regular expressions, you should use any of the OTC_Globex, OTC_Regex or OTC_Regexp classes in conjunction with OTC_String. A current limitation of these classes, though, is that they cannot handle null characters embedded within a string.

    OTC_String a = "abcd";
    OTC_String b(a.string()+1,2); // yields "bc"

An easier approach is to use member functions of the OTC_String class. The first such function is `section()'. This works in a similar manner to the above example.

    OTC_String a = "abcd";
    OTC_String b = a.section(1,2); // yields "bc"
    OTC_Range r(1,2);
    OTC_String c = a.section(r); // yields "bc"

The arguments to the `section()' member function represent the range of the string you are after. The range can be specified using a starting index and length, or using the OTC_Range class.
The inverse of the `section()' member function is `except()'. This member function returns everything in the string except the characters in the specified range.

    OTC_String a = "abcd";
    OTC_String b = a.except(1,2); // yields "ad"

Sections of strings can also be obtained by giving a position in the string. Depending on the function used, either the section before or after the position specified is returned. Whether the returned string should or should not include the position specified is also determined by the particular member function which is used.
OTC_String a = "abcd";
OTC_String b = a.after(1); // yields "cd"
OTC_String c = a.from(1); // yields "bcd"
OTC_String d = a.before(1); // yields "a"
OTC_String e = a.through(1); // yields "ab"
Finally, the region of a string lying between two indices can be obtained using the `between()' member function. That is, both arguments are indices. They do not describe a range, but define the bounds of the region you want.
OTC_String a = "abcd";
OTC_String b = a.between(1,3); // yields "c"
The characters at the positions given are not included in the string which is returned.
Note that all the member functions described above return a new string. The original string is not modified. Also be aware that no delayed copying mechanism is used when substrings are being created.
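The index arithmetic behind these member functions can be summarised in a sketch which uses std::string as a stand-in. The free functions below are hypothetical. They reuse the member function names for readability only and are not the library's implementation. Rejecting an invalid range with an exception is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

namespace sketch {

// Characters in the range starting at start and of length len.
inline std::string section(std::string const& s,
    std::size_t start, std::size_t len)
{
    if (start > s.size() || len > s.size() - start)
        throw std::out_of_range("section: invalid range");
    return s.substr(start, len);
}

// Everything except the characters in the range.
inline std::string except(std::string const& s,
    std::size_t start, std::size_t len)
{
    if (start > s.size() || len > s.size() - start)
        throw std::out_of_range("except: invalid range");
    return s.substr(0, start) + s.substr(start + len);
}

// Position based forms, each expressed as a range.
inline std::string after(std::string const& s, std::size_t i)
    { return section(s, i + 1, s.size() - (i + 1)); }
inline std::string from(std::string const& s, std::size_t i)
    { return section(s, i, s.size() - i); }
inline std::string before(std::string const& s, std::size_t i)
    { return section(s, 0, i); }
inline std::string through(std::string const& s, std::size_t i)
    { return section(s, 0, i + 1); }

// Region strictly between two indices; neither end is included.
inline std::string between(std::string const& s,
    std::size_t i, std::size_t j)
    { return section(s, i + 1, j - i - 1); }

inline void demo_sections()
{
    std::string a = "abcd";
    assert(section(a, 1, 2) == "bc");
    assert(except(a, 1, 2) == "ad");
    assert(after(a, 1) == "cd");
    assert(from(a, 1) == "bcd");
    assert(before(a, 1) == "a");
    assert(through(a, 1) == "ab");
    assert(between(a, 1, 3) == "c");
    assert(a == "abcd"); // the original is never modified
}

} // namespace sketch
```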
OTC_String a = "abcd";
a.replace(1,2,"cb"); // yields "acbd"
OTC_Range r(1,2);
a.replace(r,"bc"); // yields "abcd"
All the editing operations for OTC_String use the `replace()' member function, each supplying a different value for the range. Adding to the beginning of a string is indicated by setting the range to start at location `0' and have length `0'. This functionality is provided by the `prepend()' member function.
OTC_String a = "abcd";
a.replace(0,0,"cd"); // yields "cdabcd"
a.prepend("ab"); // yields "abcdabcd"
Appending to the end of a string is indicated by setting the range to start at the location given by the length of the string and have length `0'. This functionality is provided by the `append()' member function.
OTC_String a = "abcd";
a.replace(a.length(),0,"ab"); // yields "abcdab"
a.append("cd"); // yields "abcdabcd"
The `append()' member function is also available in the form of the overloaded operator `operator+=()'.
OTC_String a = "abcd";
a += "abcd"; // yields "abcdabcd"
Insertion into the middle of a string is indicated by setting the range to the appropriate location and with a length of `0'. This functionality is provided by the `insert()' member function.
OTC_String a = "abcd";
a.replace(2,0,"cd"); // yields "abcdcd"
a.insert(4,"ab"); // yields "abcdabcd"
Assigning a new value to the string is indicated by setting the range to encompass the whole string. This functionality is provided by the `assign()' member function.
OTC_String a = "abcd";
a.replace(0,a.length(),"dcba"); // yields "dcba"
a.assign("abcd"); // yields "abcd"
To improve performance, specific versions of `assign()' are provided for those cases where the string can use the delayed copy mechanism on the value to which the string is being set. For example, where the argument to `assign()' is an instance of OTC_String.
The `assign()' member function is also available in the form of the overloaded operator `operator=()'.
OTC_String a = "abcd";
a = "dcba"; // yields "dcba"
Removal of a section of a string is indicated by specifying the appropriate range and replacing that part of the string with a zero length string. This functionality is provided by the `remove()' member function.
OTC_String a = "abcd";
a.replace(1,1,""); // yields "acd"
a.remove(1,1); // yields "ad"
A related member function to `remove()' is `truncate()'. The `truncate()' member function is provided in two forms. The first form takes as argument an index into the string. This will result in the character at that position and any characters following it being removed from the string. The second form of the `truncate()' function takes no arguments and sets the string to being empty.
OTC_String a = "abcd";
a.truncate(1); // yields "a"
a.truncate(); // yields ""
For all editing operations described, if an invalid range or index is provided, an exception will be raised. Therefore, check that ranges are valid before performing an operation.
OTC_String a = "....";
if (i+l <= a.length())
  a.replace(i,l,"....");
When specifying a replacement string for the above functions, an instance of OTC_String, a character or a C string may be used. Where appropriate, a character count or string length may also be specified.
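The way each editing operation reduces to a single range replacement can be sketched using std::string as a stand-in. The free functions below are hypothetical and only model the range each operation supplies. They are not the library's implementation, and the exception type used for an invalid range is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

namespace sketch {

// The one primitive: replace the range (start, len) with text.
inline void replace(std::string& s, std::size_t start,
    std::size_t len, std::string const& text)
{
    if (start > s.size() || len > s.size() - start)
        throw std::out_of_range("replace: invalid range");
    s.replace(start, len, text);
}

// Every other operation differs only in the range it supplies.
inline void prepend(std::string& s, std::string const& t)
    { replace(s, 0, 0, t); }
inline void append(std::string& s, std::string const& t)
    { replace(s, s.size(), 0, t); }
inline void insert(std::string& s, std::size_t i, std::string const& t)
    { replace(s, i, 0, t); }
inline void assign(std::string& s, std::string const& t)
    { replace(s, 0, s.size(), t); }
inline void remove(std::string& s, std::size_t start, std::size_t len)
    { replace(s, start, len, ""); }
inline void truncate(std::string& s, std::size_t i)
    { replace(s, i, s.size() - i, ""); }

inline void demo_editing()
{
    std::string a = "abcd";
    prepend(a, "ab");   assert(a == "ababcd");
    append(a, "cd");    assert(a == "ababcdcd");
    insert(a, 2, "cd"); assert(a == "abcdabcdcd");
    remove(a, 8, 2);    assert(a == "abcdabcd");
    truncate(a, 4);     assert(a == "abcd");
    assign(a, "dcba");  assert(a == "dcba");
}

} // namespace sketch
```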
OTC_String a = "abcd";
OTC_String b = "efgh";
char const* p;
if (p == 0)
{
  p = a + b;
  // temporary string object is destroyed
  // at the end of this scope
}
// p is invalid here
To avoid this problem, `operator+()' does not return an object of type OTC_String. Instead, an object of type OTC_TString is returned. The OTC_TString class does not provide a conversion operator for the type `char const*', thereby avoiding the problem. You will know that you have done something which would not have been safe, as you will get a compile time error indicating you cannot assign the result of `operator+()' directly to a variable of type `char const*'. Although an object of type OTC_TString is returned, the result can still be used wherever you could have passed an instance of OTC_String. This is facilitated by a constructor of OTC_String which accepts an instance of OTC_TString. For example:
void print(OTC_String const&) {}
OTC_String a = "abcd";
OTC_String b = a + "efgh";
print(a + "efgh");
In circumstances where a conversion exists between an instance of OTC_String and another type, and you need to pass the result of `operator+()' where that other type is expected, you will need to perform an explicit cast. This is necessary, as the compiler will not apply two automatic casts in succession.
class Item
{
public:
Item(OTC_String const&);
};
void print(Item const&) {}
OTC_String a = "abcd";
print(a + "efgh"); // compiler error
print(OTC_String(a + "efgh")); // okay
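The technique of returning a separate temporary type from `operator+()' can be sketched in isolation. The classes Str, TStr and Item below are hypothetical stand-ins for OTC_String, OTC_TString and a user class, built on std::string. They model only the conversions discussed above and make no claim about how the library itself is written.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

namespace sketch {

class TStr;

// Stand-in for the string class, convertible to char const*.
class Str
{
  public:
    Str(char const* s) : data_(s) {}
    Str(TStr const& t); // accepts the temporary type
    operator char const*() const { return data_.c_str(); }
    std::string const& data() const { return data_; }
  private:
    std::string data_;
};

// Stand-in for the temporary type. It deliberately provides
// no conversion to char const*.
class TStr
{
  public:
    explicit TStr(std::string const& s) : data_(s) {}
    std::string const& data() const { return data_; }
  private:
    std::string data_;
};

inline Str::Str(TStr const& t) : data_(t.data()) {}

// Concatenation yields the temporary type, not Str.
inline TStr operator+(Str const& a, Str const& b)
    { return TStr(a.data() + b.data()); }

// A user class constructible from Str only.
class Item
{
  public:
    Item(Str const&) {}
};

inline std::size_t length_of(Str const& s) { return s.data().size(); }
inline bool accepts_item(Item const&) { return true; }

// The unsafe assignment to char const* no longer compiles.
static_assert(!std::is_convertible<TStr, char const*>::value,
    "temporary type must not convert to char const*");
// One conversion (TStr to Str) is applied automatically.
static_assert(std::is_convertible<TStr, Str>::value,
    "temporary type converts to the string type");
// Two conversions in succession (TStr to Str to Item) are not.
static_assert(!std::is_convertible<TStr, Item>::value,
    "two user-defined conversions are never chained");

inline void demo_tstring()
{
    Str a("abcd");
    Str b = a + "efgh";
    assert(b.data() == "abcdefgh");
    assert(length_of(a + "efgh") == 8);
    assert(accepts_item(Str(a + "efgh"))); // explicit cast required
}

} // namespace sketch
```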
OTC_String a = "abcd";
a.upper(); // yields "ABCD"
a.lower(1,2); // yields "AbcD"
a.upper(2); // yields "ABcD"
Note that these functions modify the string to which they are applied. If you want the result to be a new string, either create a separate string first, or use the `clone()' member function to get a copy and modify it.
OTC_String a = "abcd";
OTC_String b = a;
b.upper();
OTC_String c = a.clone().upper();
To remove leading or trailing white space, or a sequence of a specific character from a string, the set of `trim()' member functions can be used. The `trim()' member function will remove both trailing and leading white space characters. The `ltrim()' member function removes characters from the front of the string. The `rtrim()' member function removes characters from the end of the string. If no argument is supplied to `ltrim()' or `rtrim()', white space characters will be removed.
OTC_String a = " aabcdd ";
a.trim(); // yields "aabcdd"
a.ltrim('a'); // yields "bcdd"
a.rtrim('d'); // yields "bc"
To remove a set number of characters from either end of the string, the `lchop()' and `rchop()' member functions can be used. Both member functions take as argument the number of characters to remove. The `lchop()' member function will remove the characters from the front of the string. The `rchop()' member function will remove the characters from the end of the string.
OTC_String a = "abcd";
a.rchop(3); // yields "a"
To pad out a string to a certain width, the `ljustify()' and `rjustify()' member functions can be used. Both functions take as arguments the desired width and an optional fill character. If the fill character is not specified, the space character is used. If the string is already greater in length than the width specified, no change is made to the string.
With the `ljustify()' member function, padding, if required, will be added at the end of the string. That is, the string will be left justified within the field width specified. The `rjustify()' member function will right justify the string within the field width specified. That is, padding will be added at the start of the string if it is required.
OTC_String a = "abcd";
a.ljustify(6,'.'); // yields "abcd.."
a.rjustify(8,'.'); // yields "..abcd.."
The final modifier member function is `reverse()'. As the name suggests, this reverses the contents of the string.
OTC_String a = "abcd";
a.reverse(); // yields "dcba"
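The chopping and padding behaviour described above can be modelled with std::string. The free functions below are hypothetical stand-ins which reuse the member function names. What OTC_String does when asked to chop more characters than the string holds is not stated in the text, so raising an exception in that case is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

namespace sketch {

// Remove n characters from the front of the string.
inline void lchop(std::string& s, std::size_t n)
{
    if (n > s.size())
        throw std::out_of_range("lchop: count exceeds length");
    s.erase(0, n);
}

// Remove n characters from the end of the string.
inline void rchop(std::string& s, std::size_t n)
{
    if (n > s.size())
        throw std::out_of_range("rchop: count exceeds length");
    s.erase(s.size() - n);
}

// Left justify: pad at the end until the width is reached.
inline void ljustify(std::string& s, std::size_t width, char fill = ' ')
{
    if (s.size() < width)
        s.append(width - s.size(), fill);
}

// Right justify: pad at the start until the width is reached.
inline void rjustify(std::string& s, std::size_t width, char fill = ' ')
{
    if (s.size() < width)
        s.insert(s.begin(), width - s.size(), fill);
}

inline void demo_modifiers()
{
    std::string a = "abcd";
    rchop(a, 3);
    assert(a == "a");

    std::string b = "abcd";
    ljustify(b, 6, '.');
    assert(b == "abcd..");
    rjustify(b, 8, '.');
    assert(b == "..abcd..");
    ljustify(b, 4); // already wider than 4, so unchanged
    assert(b == "..abcd..");
}

} // namespace sketch
```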
Where a comparison is being made between an instance of OTC_String and a pointer of type `char const*', if the pointer is a null pointer, a false value will always be returned. That is, a null pointer does not map to any valid string when comparisons are being performed. It is not even regarded as being the same as an empty string. For example:
OTC_String a = "";
if (a == (char const*)0)
  ... // will never execute
if (a == "")
  ... // will execute
In addition to the relational operators, the member function `compare()' is provided. This member function exists as the equality and inequality operators only allow you to compare complete strings. The `compare()' member function allows you to perform comparisons between sections of strings. This includes being able to specify an index into the string at which the comparison should start and how many characters should be compared. The `compare()' member function will also accept an optional argument to indicate whether comparisons should be case sensitive. The optional argument should be either OTCLIB_EXACTMATCH or OTCLIB_IGNORECASE, the latter being for a case insensitive comparison.
OTC_String a = "abcd";
if (a.compare(1,"bc"))
  ...
if (a.compare("ABcd",OTCLIB_IGNORECASE))
  ...
OTC_String b = "abcd";
if (a.compare(1,b,2))
  ...
If a single value is necessary for representing the relative ordering of strings, the `rank()' member function should be used instead of the `compare()' member function.
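A single value representing relative ordering is commonly encoded as a negative, zero or positive integer. The sketch below shows that idea using std::string. The free function rank() here is a hypothetical stand-in, and the choice of exactly `-1', `0' and `1' as return values is an assumption of this sketch, not something stated for the `rank()' member function.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Returns -1 if a orders before b, 0 if they are equal,
// and 1 if a orders after b.
inline int rank(std::string const& a, std::string const& b)
{
    int r = a.compare(b);
    return (r > 0) - (r < 0);
}

inline void demo_rank()
{
    assert(rank("abcd", "abcd") == 0);
    assert(rank("abcd", "abce") == -1);
    assert(rank("abd", "abcd") == 1);
}

} // namespace sketch
```

A value of this form can be used directly by sorting and searching code which needs a three way result rather than a true or false answer.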
Instances of OTC_Symbol cannot be used with those relational operators where ordering is being assessed. To obtain an arbitrary ordering, you should use the `rank()' member function. To obtain a lexicographic ordering, you should convert the symbols to strings and then perform the comparison.