Sav Associative Processor library
Version 1.3
1 Introduction
The Sav Associative Processor library is a software package for direct data access by means of text sentences. The Associative Processor provides developers with a low-level Java API (Application Programming Interface) for building an electronic dictionary, a knowledge base, or a data indexer as part of a larger text, database, or voice processor. The Sav Processor makes it possible to associate quickly from one set of objects to another.
2 Operations of the Set Theory
The Sav Associative Processor is easy to describe in terms of set theory. The Processor's objects belong to two classes: 1. the Association class represents a set of elements, 2. the Concept class represents an element of the set. Three base methods define the Association class: 1. set() forms the union of concept associations, 2. get() forms the intersection of associations, 3. clear() forms the difference of associations. The parameter of each method may be an association, a concept, or a string concept name. The following Java program demonstrates the use of the Processor's methods.
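As a rough illustration of the three set operations only, the sketch below models an association as a java.util.TreeSet of concept names. It does not use the Sav API: the class SetOperationsSketch and its methods union(), intersection(), and difference() are hypothetical stand-ins for the roles the text gives to set(), get(), and clear().

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for illustration; none of these names belong to the Sav API.
class SetOperationsSketch {

    // Union: every name found in a or in b (the role the text assigns to set()).
    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // Intersection: only the names found in both a and b (the role of get()).
    static Set<String> intersection(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Difference: the names of a that are not in b (the role of clear()).
    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<String> a = new TreeSet<>(Arrays.asList("cat", "dog", "fox"));
        Set<String> b = new TreeSet<>(Arrays.asList("dog", "fox", "owl"));

        System.out.println(union(a, b));        // [cat, dog, fox, owl]
        System.out.println(intersection(a, b)); // [dog, fox]
        System.out.println(difference(a, b));   // [cat]
    }
}
```

Each helper copies its first argument, so the inputs stay unchanged; whether the Processor's own methods modify the association in place is not something this sketch tries to model.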
3 Extraction
One association can be extracted from another with the see() method. The see() method declares the first symbols of the names of the concepts in the target association. The method takes one parameter of the Concept or String type. To extract every concept from an association, call getFirst() and then getNext() in a loop.
Consider a complete example program that groups words according to their first letters. Notice that all the Processor's classes are imported from the Sav package. Every concept has a string name, which can be obtained with the getName() method.
The Association class has two further useful methods: getLast() - gets the last concept, and has() - quickly detects whether the association has any concepts.
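The idea of extracting by first symbols and then walking the result can be sketched with a sorted set from java.util. This is an analogy, not the Sav API: the class PrefixExtractionSketch and the method startingWith() are hypothetical, and the standard first(), last(), and isEmpty() calls only play roles similar to those described for getFirst(), getLast(), and has().

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// Illustrative stand-in built on java.util only; none of these names belong to the Sav API.
class PrefixExtractionSketch {

    // Collects the words that begin with the given prefix.
    // In a sorted set they form one contiguous run that starts at the prefix itself.
    static NavigableSet<String> startingWith(NavigableSet<String> words, String prefix) {
        NavigableSet<String> result = new TreeSet<>();
        for (String word : words.tailSet(prefix, true)) {
            if (!word.startsWith(prefix)) {
                break; // past the run of matching words
            }
            result.add(word);
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableSet<String> words = new TreeSet<>(
                Arrays.asList("banana", "apple", "cherry", "avocado", "blueberry"));

        // Group the words by first letter, skipping letters with no words.
        for (char letter = 'a'; letter <= 'z'; letter++) {
            NavigableSet<String> group = startingWith(words, String.valueOf(letter));
            if (group.isEmpty()) {
                continue;
            }
            System.out.println(letter + ": " + group
                    + " first=" + group.first() + " last=" + group.last());
        }
    }
}
```

The early break relies on the sorted order: once a word without the prefix appears, no later word can have it.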
The PN class (PN stands for Processor's Notation) holds a useful collection of constants defined as final static variables. To extract concepts whose first character lies in a given code range, pass one of the following Processor constants (concept notations) to see(): PN.NATURAL for an English lowercase letter in the 96 - 127 code range, PN.CAPITAL for a letter in the 64 - 95 code range, PN.ARITHMETICAL for a sign in the 32 - 63 code range. The PN.NUMERICAL constant denotes an integer in the 0 - 1073741823 numerical range. See this fragment.
a.set("9").set("10").set("#10").set("#9"); //a = {"10", "9", "#9", "#10"}
a.see(PN.NUMERICAL);    //a = {"#9", "#10"}
c = a.getFirst();       //c = "#9", because 9 numeric < 10 numeric
a.see(PN.ARITHMETICAL); //a = {"10", "9"}
c = a.getFirst();       //c = "10", because code of the '1' < code of the '9'
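The two orderings noted in the fragment's comments can be reproduced with standard Java comparators. The sketch below is an illustration under assumptions, not Sav code: the class OrderingSketch and its methods are hypothetical, and it assumes, following the fragment, that a numeric name has the form "#" followed by digits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of the two orderings; not part of the Sav API.
class OrderingSketch {

    // Orders names by the character codes of their symbols (plain string order).
    static List<String> byCharacterCode(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Orders names of the assumed form "#<digits>" by their integer value.
    static List<String> byNumericValue(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt(
                (String name) -> Integer.parseInt(name.substring(1))));
        return sorted;
    }

    public static void main(String[] args) {
        // '1' has code 49 and '9' has code 57, so "10" sorts before "9".
        System.out.println(byCharacterCode(Arrays.asList("9", "10")));  // [10, 9]
        // As integers, 9 < 10, so "#9" sorts before "#10".
        System.out.println(byNumericValue(Arrays.asList("#10", "#9"))); // [#9, #10]
    }
}
```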
Concepts with non-English letters can also be extracted, for example Russian letters in the Unicode range 1072 - 1103 through PN.RUSSIAN_N and in the 1040 - 1071 range through PN.RUSSIAN_C.
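The code ranges listed above can be checked with a small classifier over the first character of a name. This is a sketch for orientation only: the class CodeRangeSketch and the method rangeOf() are hypothetical, the returned labels are plain strings rather than the PN constants, and the numeric notation (names such as "#9") is deliberately not handled.

```java
// Hypothetical helper for illustration; it does not use or reproduce the PN class.
class CodeRangeSketch {

    // Returns a label for the code range that contains the first character of the name.
    static String rangeOf(String name) {
        if (name.isEmpty()) {
            return "OTHER";
        }
        int code = name.charAt(0);
        if (code >= 32 && code <= 63) {
            return "ARITHMETICAL"; // signs and digits
        }
        if (code >= 64 && code <= 95) {
            return "CAPITAL";      // includes 'A'..'Z'
        }
        if (code >= 96 && code <= 127) {
            return "NATURAL";      // includes 'a'..'z'
        }
        if (code >= 1040 && code <= 1071) {
            return "RUSSIAN_C";    // Russian capital letters
        }
        if (code >= 1072 && code <= 1103) {
            return "RUSSIAN_N";    // Russian small letters
        }
        return "OTHER";
    }

    public static void main(String[] args) {
        System.out.println(rangeOf("java"));   // NATURAL
        System.out.println(rangeOf("Java"));   // CAPITAL
        System.out.println(rangeOf("+1"));     // ARITHMETICAL
        System.out.println(rangeOf("\u0410")); // RUSSIAN_C (code 1040)
        System.out.println(rangeOf("\u044F")); // RUSSIAN_N (code 1103)
    }
}
```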
4 Connection
The ability to connect concepts is very important for systems that handle grammar checking, translation, database work, and speech recognition. The Sav Processor works with connections of four base types defined by the constants PN.IDENTITY, PN.RELATION, PN.IDENTITY_DEF, and PN.RELATION_DEF. The PN.IDENTITY_DEF connection is opposite in direction to PN.IDENTITY, and PN.RELATION_DEF is opposite in direction to PN.RELATION. These constants are used as the parameter of the con() method. A concept or string parameter of con() implies a CONCEPTION type connection. Note that only the CONCEPTION's sign forms the conception name.
Consider a complete example that uses con() in combination with the following methods: clone() - clones an object, fix() - fixes a connection pass, regain() - regains a fixed connection pass, and store() - stores an association in a file. The next program indexes a file if it contains "JDK", "VisualJ++" or "Jbuilder" instances of a "Java tool" class.
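The notion of paired, oppositely directed connection types can be sketched as a small labeled graph. Everything here is an assumption made for illustration and is not the Sav API: the class ConnectionSketch, its Type enum, and the methods connect(), follow(), and mentionsInstanceOf() are hypothetical, and so is the choice to point IDENTITY from an instance to its class.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of directed, typed connections; not the Sav API.
class ConnectionSketch {

    enum Type {
        IDENTITY, RELATION, IDENTITY_DEF, RELATION_DEF;

        // Each type is paired with the type that runs in the opposite direction.
        Type opposite() {
            switch (this) {
                case IDENTITY:     return IDENTITY_DEF;
                case IDENTITY_DEF: return IDENTITY;
                case RELATION:     return RELATION_DEF;
                default:           return RELATION;
            }
        }
    }

    // source name -> connection type -> target names
    private final Map<String, Map<Type, Set<String>>> edges = new HashMap<>();

    // Stores the connection and, automatically, its opposite-direction twin.
    void connect(String from, Type type, String to) {
        link(from, type, to);
        link(to, type.opposite(), from);
    }

    private void link(String from, Type type, String to) {
        edges.computeIfAbsent(from, k -> new EnumMap<>(Type.class))
             .computeIfAbsent(type, k -> new TreeSet<>())
             .add(to);
    }

    // Returns the names reachable from the source over one connection of the given type.
    Set<String> follow(String from, Type type) {
        Map<Type, Set<String>> byType = edges.get(from);
        if (byType == null || !byType.containsKey(type)) {
            return Collections.<String>emptySet();
        }
        return byType.get(type);
    }

    // True if the text contains the name of at least one instance of the class.
    // Assumption: instances point to their class with IDENTITY.
    boolean mentionsInstanceOf(String text, String className) {
        for (String instance : follow(className, Type.IDENTITY_DEF)) {
            if (text.contains(instance)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ConnectionSketch graph = new ConnectionSketch();
        graph.connect("JDK", Type.IDENTITY, "Java tool");
        graph.connect("VisualJ++", Type.IDENTITY, "Java tool");
        graph.connect("Jbuilder", Type.IDENTITY, "Java tool");

        System.out.println(graph.follow("Java tool", Type.IDENTITY_DEF));
        System.out.println(graph.mentionsInstanceOf("Built with JDK 1.1", "Java tool")); // true
        System.out.println(graph.mentionsInstanceOf("Written in C", "Java tool"));       // false
    }
}
```

Storing both directions at connect() time keeps lookups cheap in either direction, at the cost of doubling the stored edges; the sketch does not attempt to model cloning, fixing and regaining a connection pass, or file storage.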
5 Database Indexing Sample
Although the Sav Processor is intended primarily for work with knowledge bases, it can serve as a powerful core in database systems. Capability tests showed that the Processor is highly competitive with quality relational database systems, for example Microsoft Access, in access time, indexing speed, and required index area. The high speed is attained by compiling the Java source code with the Symantec compiler. The next example demonstrates table data indexing and subsequent access to a set of indexes. Tab characters separate the fields of the input table. Newline and/or carriage-return characters separate rows. You can create the Reference.txt input file with a text editor, for example as follows.