MANUAL
Please do not delete, rename or modify the files located in the
CR folder.
Run CR.exe from within the CR directory.
Each time the program starts, it creates the Default project together
with a query named Untitled.
Each thematic project uses a separate textual database, to which an
unlimited number of queries can be addressed. Under each query name you
save query lines, processing mode, language of data processing and results.
The Project's Manager button lets you switch easily between
existing projects and their queries, as well as create new ones and
rename or delete queries.
Choose the directory containing the data files to be processed.
By default it is the $DATBASE folder in the directory of the current project.
You can copy the files you need into this folder, or browse the directories
to select any other folder on your hard disk.
The program processes files in the .html and .txt formats.
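The file selection described above can be illustrated with a minimal Python sketch. It is not part of CrossReader; the function name is hypothetical, and it assumes that extensions are matched without regard to case and that subfolders are not scanned, neither of which the manual states.

```python
from pathlib import Path

# The two formats the manual says are processed.
SUPPORTED_EXTENSIONS = {".html", ".txt"}


def list_processable_files(database_dir):
    """Return the .html and .txt files found directly in database_dir, sorted by name.

    Hypothetical helper: case-insensitive matching and the lack of
    recursion into subfolders are assumptions of this sketch.
    """
    folder = Path(database_dir)
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

A check of this kind shows which files in a folder would be skipped (for example, Word documents that have not yet been converted).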
Note: If you need to work with a large database of MS Word files,
please use the conversion macro included in the CR folder. This macro
converts Word documents into the .txt format.
To use it, copy the CONVERT.DOT file into the MS Word templates directory.
Open a new Word file and choose the Convert template. The macro will then
run automatically.
Be sure to indicate the path to your files correctly,
for example: C:\My Documents\*.txt.
Before it starts analysing texts and compiling the output material, the
program preprocesses the database.
The purpose of this step is to apply a unified standard that clearly
separates paragraphs (information units), regardless of the
original text formatting.
Preprocessing does not modify the files in your original database
folder.
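To make the idea of format-independent paragraph separation concrete, here is a minimal Python sketch. It does not reproduce CrossReader's actual preprocessing, which the manual does not describe; it simply assumes that paragraphs are separated by one or more blank lines, and the function name is hypothetical.

```python
import re


def split_into_paragraphs(text):
    """Split raw text into paragraphs, independent of line endings and wrapping.

    Assumption of this sketch: a paragraph boundary is one or more blank lines.
    """
    # Unify Windows and old Mac line endings.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # A blank line (possibly containing spaces) ends a paragraph.
    blocks = re.split(r"\n\s*\n", normalized)
    # Collapse internal line breaks and repeated spaces inside each paragraph.
    return [" ".join(block.split()) for block in blocks if block.strip()]
```

The function returns new strings only; like the preprocessing described above, it leaves the original input untouched.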
Preprocessing can take a considerable amount of time. When new files are added to the database directory, only those files are preprocessed. If you need to preprocess the whole database again, choose the Clear Preprocessed Data option under Edit in the pop-up menu. The next time you click the Start button, the program will rebuild the preprocessed database.
A forthcoming version of the program will let the user edit the database
of preprocessed files by indicating text fragments (paragraphs) that
should be eliminated during preprocessing of the original files. For
example, it can be reasonable to delete fragments of low informational
value when they occur repeatedly in different analyzed texts.
Before pressing the Start button, please make sure that the main
parameters are set properly.
These include Language, Current Stop Words List, Using
Context, and Setting the Processing Mode.
See the corresponding comments below.
The Initial Query should contain a set of words describing your theme
of interest.
These words do not necessarily have to be present in the text fragments
of the output material.
In the course of database processing, CrossReader™ will
find words that describe the theme in more detail and add them to your
initial query.
Type the query words in a column.
When certain words convey the meaning you intend only when they are
used together (e.g. "US" and "economy"), put them on one line separated
by spaces.
The required number of query lines depends on the processing mode (usually
not less than 3).
When a combination of words forms a set expression, use "_" to join
them.
If you consider words to be synonyms or equally appropriate for
describing your theme, use "&" to separate them on a line.
Each query line can be preceded by a marker indicating that its presence
is obligatory:
"#" - obligatory in every selected paragraph;
"%" - obligatory in the starting group of paragraphs (usually 3-7
text fragments, depending on the processing mode);
"+" - obligatory in the first selected paragraph.
After processing starts, this window will display the new queries
automatically created by the system in addition to your initial query.
A line can contain up to 5 words. There can be up to 30 lines.
Important: the system treats the order of the lines as reflecting
their relative importance in the user's query.
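The query notation described above can be summarised with a small Python sketch of a parser. This is an illustration only, not CrossReader's own code: all names are hypothetical, and it assumes that "&" may be typed with or without surrounding spaces, that a set expression joined by "_" counts as one word, and that each synonym counts towards the limit of 5 words per line.

```python
import re
from dataclasses import dataclass

# Markers that may precede a query line, as listed in the manual.
PREFIXES = {"#": "every_paragraph", "%": "starting_group", "+": "first_paragraph"}
MAX_WORDS_PER_LINE = 5
MAX_LINES = 30


@dataclass
class QueryLine:
    obligation: str  # "none" or one of the PREFIXES values
    terms: list      # each term is a list of alternative words or expressions


def parse_query_line(line):
    """Parse one query line into its obligation marker and its terms."""
    text = line.strip()
    obligation = "none"
    if text and text[0] in PREFIXES:
        obligation = PREFIXES[text[0]]
        text = text[1:].strip()
    # Assumption: "a & b" and "a&b" are equivalent ways to write synonyms.
    text = re.sub(r"\s*&\s*", "&", text)
    terms = []
    for token in text.split():
        # "_" joins the words of a set expression; "&" separates synonyms.
        alternatives = [alt.replace("_", " ") for alt in token.split("&") if alt]
        if alternatives:
            terms.append(alternatives)
    if sum(len(alts) for alts in terms) > MAX_WORDS_PER_LINE:
        raise ValueError("a query line can contain up to 5 words")
    return QueryLine(obligation, terms)


def parse_query(text):
    """Parse a whole query, keeping the line order (earlier lines matter more)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) > MAX_LINES:
        raise ValueError("a query can contain up to 30 lines")
    return [parse_query_line(ln) for ln in lines]
```

For example, the line `#US economy` yields an obligation of "every_paragraph" with two terms that must occur together, while `stock_market & exchange` yields a single term with the alternatives "stock market" and "exchange".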
The context list is an optional list of words (terms) that occur at high frequency within a given thematic field. You can use it to increase the relevance and informativeness of the output material. For example, if the query theme is related to information technologies, the context list should probably include such terms as "computers", "software", "www", "Internet", etc.
Pressing the Edit Context button opens a text window in which you can enter the context words. Type the terms in a column.
Choosing the Use Context option lets you increase the relevance of the output material. The system will treat the presence of context words as an additional criterion for including a text fragment in the output material. This is particularly helpful when a word has several different meanings.
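The effect of context words on an ambiguous query word can be illustrated with a short Python sketch. The manual does not say how CrossReader weighs context words, so the rule used here (a fragment must contain at least one query word and at least one context word) is only an assumption, and the function names are hypothetical.

```python
import re


def tokenize(text):
    """Return the set of lowercase alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def passes_context_filter(paragraph, query_words, context_words, use_context=True):
    """Decide whether a paragraph qualifies for the output material.

    Assumed rule for this sketch: with use_context on, a paragraph needs
    at least one query word and at least one context word.
    """
    words = tokenize(paragraph)
    matches_query = any(w.lower() in words for w in query_words)
    if not use_context:
        return matches_query
    has_context = any(w.lower() in words for w in context_words)
    return matches_query and has_context
```

With the query word "java" and the context list from the example above, a paragraph about server software would pass, while a paragraph about the island of Java would be rejected unless the context check is switched off.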
With the Include in Query option you indicate whether
the context terms should be included in newly created queries. The presence
of common terms in queries generally increases the amount of
output material but decreases its informativeness. Hence,
in most cases choosing this option is not advantageous.
The thesaurus is a list of domain-specific concepts which can be
used for indexing the processed texts, eliminating the need for
the stop words list. This option is not available in the current version.
The stop words list contains commonly used, auxiliary and syntactic words of a given language, which the system should not consider meaningful.
Edit Basic opens an editable list of stop words that are ignored in every project.
Edit Current displays an editable list of stop words that are
ignored in the current project because they are considered
uninformative for the given theme.
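The relation between the two stop word lists can be sketched in Python as follows. This is an illustration under the assumption that the basic and current lists are simply combined and compared without regard to case; the function name is hypothetical and the word lists in the usage note are made-up samples.

```python
def remove_stop_words(words, basic_stop_words, current_stop_words):
    """Drop every word that appears in the basic or the current stop words list.

    Assumption of this sketch: both lists are merged and matching is
    case-insensitive.
    """
    ignored = {w.lower() for w in basic_stop_words} | {w.lower() for w in current_stop_words}
    return [w for w in words if w.lower() not in ignored]
```

For instance, with a basic list of "the" and "of" and a current (project-specific) list containing "market", the words "The economy of the US market" reduce to "economy" and "US".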
The processing mode tunes the system according to the amount and quality
of information.
CrossReader™ processes information
according to a set of rules that define certain features of the
output material: relevance to the theme, the degree of shift towards new
aspects of the theme, and the degree of logical integrity. The user sets
how strict the requirements for these features are on the Search Mode
scale.
Shifting to the right on the Search Mode scale makes the system apply stricter requirements for including and arranging information items in the output material. This increases relevance but reduces the amount of output information. This mode also requires more processing time. It is mainly recommended when you expect the database to contain a lot of relevant information.
Shifting to the left relaxes the requirements for data processing. This
is worth choosing when the database contains little information relevant
to your query.
Choosing settings further to the left on the scale will also accumulate
information items that are only indirectly related to the initial theme.
This mode can be especially useful for analytical purposes.
A profile is a set of data processing parameters optimized for individual needs; it can be selected from the available choices. For example, one may prefer a data processing mode that produces output material such as a "review" or "report", "an informational context of a given event", or "revealing of hidden connections between subjects mentioned in the query".
The "Standard" profile available in the current settings of the system is optimized for creating review-type material from political, economic or business information.
Upon selecting the "Additional" profile, the user is presented with a Processing Mode Scale optimized for working with a "monothematic" database containing a lot of information on the subject described in the query.
Please note that, as a registered user, you can apply for
a modified profile that will optimally correspond to your tasks.
This option automatically decreases the requirements for data processing.
Use it when you expect that data processing would otherwise take too long.
In this mode the system tests the database content: if there is insufficient
information relevant to the query, the program gradually decreases the
requirements for selecting and arranging the information items.
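The gradual relaxation described above can be pictured with a minimal Python sketch. The manual does not specify how CrossReader measures "insufficient information" or how many strictness levels exist, so the numeric levels, the threshold and the `count_relevant` callback are all hypothetical.

```python
def choose_strictness(start_level, min_level, count_relevant, needed):
    """Lower the strictness level step by step until enough material is found.

    count_relevant(level) is a hypothetical callback returning how many
    relevant items the database yields at the given strictness level.
    """
    level = start_level
    while level > min_level and count_relevant(level) < needed:
        level -= 1  # relax the requirements by one step
    return level
```

If no level yields enough material, the sketch stops at the lowest level instead of relaxing further.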
Indicate a suitable data processing time (hours and minutes). CrossReader™
uses a built-in strategy for spending the allotted time. In particular,
forming the initial part of the output material takes relatively more
time than forming the subsequent parts. (Note: this is only the time for
working with your query; it does not include the time needed for data
preprocessing when the database is analyzed for the first time.)
You may change the Language of data processing. The selection automatically
activates the corresponding linguistic facilities, including the Stop Words
List and parsing. The default is English.
15. Start, Pause, Continue, Stop
The Start button initiates data processing. Press this button only after all other options have been properly selected.
The Pause option suspends data processing. This lets you
edit the current query, created by the system and displayed in the Current
Query window. To resume processing, press Continue.
Pressing the Stop button ends data processing.
16. Automatic Renewal of the Query
CrossReader™ will continually renew
your initial query, producing query lines that describe the theme more
precisely, as it is presented in the database materials. These renewed
queries will replace one another in sequence in the Current Query window.
17. Information Displayed in Course of Processing
The Results Preview frame displays messages about the formation
of every new information Cluster (part of the output material).
At the same time, the Initial Query window changes to Current
Query and displays the new, automatically created queries.
When you select any cluster within the Result's Preview Contents
frame, you can view the Key Words corresponding to this Cluster
of text fragments. These are the words that helped to select material for
the given cluster.
At the same time, the Suggested Words frame shows
additionally revealed words that are characteristic of this part of the
content. These words will also be included in the newly formed query.
The Processing Messaging fields at the bottom of the program window display information about the course of processing, including error messages.
In particular, the following are listed:
- the names of the screened files;
- error messages;
- the message "Step", followed by numbers indicating, in order:
the number of the query being processed, the total processing time
(not counting the database preprocessing period), and the remaining
processing time (with respect to the time limit preset by the user);
- the message "Read", followed by numbers indicating, in order: the total
number of analyzed files and the number of screened paragraphs;
- the message "Found", followed by numbers indicating, in order:
the total number of selected paragraphs (included in the output material)
and the number of words suggested for inclusion in a new query.
Pressing the Show Full Results button opens a file in HTML format that displays a List of Clusters (thematic parts) of the output material. Each Cluster is accompanied by two groups of characteristic words: the Key Words, which were used to select material for the given cluster, and the additionally revealed, or Suggested, words. The number of paragraphs within each Cluster is also indicated.
Each entry in the List of Clusters has a hypertext link to the corresponding Cluster of text fragments. Clicking the View Full Document button, which precedes each text paragraph, takes you to the corresponding place in the whole document. This unformatted text also contains a link to the initial (original) document.
The result files are kept within the directory of the given project, in a
folder that has the same name as the query.
Note that if you repeat the processing procedure using the
same name for the Query, the results will be overwritten.
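Before reusing a query name, it can be worth checking whether earlier results already exist. The following Python sketch shows one way to do that; it is not part of CrossReader, the function names are hypothetical, and it assumes only what is stated above: that results live in a folder named after the query inside the project directory.

```python
from pathlib import Path


def results_folder(project_dir, query_name):
    """Return the folder where results for query_name are expected to be kept."""
    return Path(project_dir) / query_name


def would_overwrite(project_dir, query_name):
    """Return True if a results folder for this query name already exists."""
    return results_folder(project_dir, query_name).is_dir()
```

If `would_overwrite` returns True, copy the existing folder elsewhere or choose a different query name before starting the processing again.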