Using the CrossReader™

MANUAL



1. General
2. Getting Started
3. Addressing the Database
4. Database Preprocessing
5. Main Settings
6. Initial QUERY
7. Using Context
8. Thematic Vocabulary
9. Stop Words List
10. Processing Mode
11. Profile
12. AUTO
13. Time
14. Language
15. Start, Pause, Continue, Stop
16. Automatic Renewal of the Query
17. Information Displayed in Course of Processing
18. Viewing the Full Results


1. General

Please do not delete, rename, or modify the files located in the CR folder.
 

2. Getting Started

Run CR.exe from the CR directory.
Each time the program starts, it creates a Default project containing a query named Untitled.
Each thematic project uses a separate text database, to which an unlimited number of queries can be addressed. Under each query name, the program saves the query lines, the processing mode, the language of data processing, and the results.

The Project's Manager button lets you switch easily between existing projects and their queries, create new ones, and rename or delete queries.
 

3. Addressing the Database

Choose the directory containing the data files to be processed.
By default, this is the $DATBASE folder in the directory of the current project. You can copy the files you need into this folder, or browse to select any other folder on your hard disk.
 

4. Database Preprocessing

The program will process files in .html and .txt formats.
Note: To work with a large database of MS Word files, use the conversion script included in the CR folder. This macro converts Word documents into the .txt format.
To do this, copy the CONVERT.DOT file into the MS Word templates directory. Open a new Word file and choose the Convert template. The macro will then run automatically.
Be sure to enter the path to your files correctly, for example: C:\My Documents\*.txt.

Before it starts analysing texts and compiling the output material, the program preprocesses the database.
The purpose of this step is to apply a unified standard for delimiting paragraphs (information units), independent of the original text formatting.
The files in your original database folder are not modified.
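
As an illustration only, the idea of a unified paragraph standard can be sketched in Python. The actual rules used by the program are not described in this manual. The sketch assumes that a paragraph is a block of text separated from its neighbours by at least one blank line, and the function name split_paragraphs is hypothetical.

```python
import re


def split_paragraphs(text):
    """Split raw text into paragraphs (assumed rule: blank lines separate them)."""
    # Normalize Windows and old Mac line endings to "\n".
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # A line that is empty or holds only spaces/tabs ends a paragraph.
    blocks = re.split(r"\n[ \t]*\n+", normalized)
    # Collapse internal whitespace so every paragraph becomes one clean line.
    return [" ".join(block.split()) for block in blocks if block.strip()]


if __name__ == "__main__":
    sample = "First line\r\nstill first.\r\n\r\n  Second   paragraph.\n \n\nThird."
    for number, paragraph in enumerate(split_paragraphs(sample), start=1):
        print(number, paragraph)
```

The input text is only read, never written back, which mirrors the statement that the original files stay untouched.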

Preprocessing can take a considerable amount of time. When new files are added to the database directory, only those files are preprocessed. If you need to preprocess the whole database again, choose Clear Preprocessed Data under Edit in the pop-up menu. The next time you click the Start button, the program will rebuild the preprocessed database.

A forthcoming version of the program will let the user edit the database of preprocessed files by marking text fragments (paragraphs) that should be removed during preprocessing of the original files. For example, it can be reasonable to delete low-information fragments that recur across the analyzed texts.
 

5. Main Settings

Before pressing the Start button, make sure the main parameters are set properly.
These are Language, Current Stop Words List, Using Context, and Processing Mode.
See the corresponding sections below.
 

6. Initial QUERY

The Initial Query should contain a set of words describing your theme of interest.
These words do not have to be present in the text fragments of the output material.
While processing the database, CrossReader™ will find words that describe the theme in more detail and add them to your initial query.

Type the query words in a column.
When words carry the intended meaning only when used together (e.g. "US" and "economy"), put them on one line, separated by spaces.
The required number of query lines depends on the processing mode (usually at least 3).
When a combination of words forms a set expression, use "_" to join them.
If you consider words to be synonyms, or equally appropriate for describing your theme, separate them with "&" on one line.
Each query line can be preceded by a marker indicating that its presence is obligatory:
"#" - obligatory in every paragraph selected;
"%" - obligatory for the starting group of paragraphs (usually 3-7 text fragments, depending on the processing mode);
"+" - obligatory in the first paragraph selected.
After the program starts, this window will display the new queries that the system creates automatically in addition to your initial query.
A line can contain up to 5 words, and a query can contain up to 30 lines.
Important: the system treats the order of the lines as reflecting their relative importance in the user's query.
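
The query notation above can be illustrated with a small parser. This is a sketch, not the program's own implementation. The names QueryLine and parse_query are hypothetical, and two points that the manual leaves open are settled here by assumption: spaces around "&" are ignored, and for the 5-word limit each synonym counts as one word while a set expression joined by "_" also counts as one word.

```python
import re
from dataclasses import dataclass
from typing import List

# Markers that may precede a query line, as listed in this section.
PREFIXES = {"#": "every_paragraph", "%": "starting_group", "+": "first_paragraph"}
MAX_WORDS_PER_LINE = 5
MAX_LINES = 30


@dataclass
class QueryLine:
    rank: int                # 1 = first line, treated as the most important
    requirement: str         # "optional" or one of the PREFIXES values
    terms: List[List[str]]   # each term is a list of interchangeable synonyms


def parse_query(text):
    """Parse a multi-line query into QueryLine records (illustrative only)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) > MAX_LINES:
        raise ValueError(f"query has {len(lines)} lines; the limit is {MAX_LINES}")
    parsed = []
    for rank, line in enumerate(lines, start=1):
        requirement = "optional"
        if line[0] in PREFIXES:
            requirement = PREFIXES[line[0]]
            line = line[1:].strip()
        # Assumption: "a & b" and "a&b" mean the same thing.
        line = re.sub(r"\s*&\s*", "&", line)
        terms = []
        for token in line.split():
            # "_" joins the words of a set expression; "&" separates synonyms.
            synonyms = [s.replace("_", " ") for s in token.split("&") if s]
            if synonyms:
                terms.append(synonyms)
        if not terms:
            raise ValueError(f"line {rank} contains no words")
        if sum(len(s) for s in terms) > MAX_WORDS_PER_LINE:
            raise ValueError(f"line {rank} exceeds {MAX_WORDS_PER_LINE} words")
        parsed.append(QueryLine(rank, requirement, terms))
    return parsed


if __name__ == "__main__":
    example = "#US economy\n%inflation & prices\n+Federal_Reserve\ngrowth"
    for query_line in parse_query(example):
        print(query_line)
```

In the example, "US economy" must appear in every selected paragraph, "inflation" or "prices" in the starting group, the set expression "Federal Reserve" in the first paragraph, and "growth" is an ordinary line of the lowest rank.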
 

7. Using Context

This is an optional list of words (terms) that occur at high frequency within a given thematic field. You can use it to increase the relevance and informativeness of the output material. For example, if the query theme relates to information technology, the context list should probably include such terms as "computers", "software", "www", "Internet", etc.

Pressing the Edit Context button opens a text window in which you can enter the context words. Type the terms in a column.

Choosing the Use Context option can increase the relevance of the output material. The system will treat the presence of context words as an additional criterion for including a text fragment in the output material. This is particularly helpful when a word has several different meanings.
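
The effect of the Use Context option on an ambiguous word can be pictured with the following sketch. It assumes a much simpler rule than the program is likely to apply: a paragraph is kept if it contains at least one query word and, when use_context is on, at least one context word. All function and parameter names are hypothetical.

```python
import re


def words_of(text):
    """Return the set of lowercase alphanumeric words found in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_paragraphs(paragraphs, query_words, context_words, use_context=True):
    """Keep paragraphs matching the query and, optionally, the context list."""
    query = {w.lower() for w in query_words}
    context = {w.lower() for w in context_words}
    selected = []
    for paragraph in paragraphs:
        found = words_of(paragraph)
        if not found & query:
            continue  # no query word at all
        if use_context and not found & context:
            continue  # query word present, but outside the thematic field
        selected.append(paragraph)
    return selected


if __name__ == "__main__":
    texts = [
        "The new Java release improves software performance.",
        "Java is an island in Indonesia.",
    ]
    print(select_paragraphs(texts, ["java"], ["software", "computers"]))
```

With the context list in use, only the paragraph about the programming language is kept; without it, both paragraphs match the query word.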

The Include in Query option indicates whether the context terms should be included in newly created queries. Common terms in queries generally increase the amount of output material but decrease its informativeness. Hence, in most cases choosing this option is not advantageous.
 

8. Thematic Vocabulary

This is a list of domain-specific concepts (a thesaurus) that can be used for indexing the processed texts, eliminating the need for the stop words list. This option is not available in the current version.
 

9. Stop Words List

A list of common, auxiliary, and syntactic words of a given language, which the system should not consider meaningful.

Edit Basic opens an editable list of stop words that are ignored in every project.

Edit Current opens an editable list of stop words that are ignored in the current project because they are not informative for the given theme.
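
A minimal sketch of how the two lists might combine, assuming that a word is ignored when it appears in either the basic or the current list and that matching is case-insensitive. The function name and the sample lists are hypothetical.

```python
import re


def meaningful_words(text, basic_stop_words, current_stop_words):
    """Return the words of a text that are in neither stop words list."""
    # The basic list applies to every project, the current list to one project.
    ignored = {w.lower() for w in basic_stop_words}
    ignored |= {w.lower() for w in current_stop_words}
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in ignored]


if __name__ == "__main__":
    basic = ["the", "that", "are"]   # general words of the language
    current = ["said"]               # uninformative for this particular theme
    print(meaningful_words("The market said that the prices are rising.", basic, current))
```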
 

10. Processing Mode

This setting tunes the system to the amount and quality of information in the database.
CrossReader™ processes information according to a set of rules that define certain features of the output material: relevance to the theme, the degree of shift toward new aspects of the theme, and the degree of logical integrity. The user sets how strictly these requirements are applied on the Search Mode scale.

Shifting to the right on the Search Mode scale makes the system apply stricter requirements for including and arranging information items in the output material. This increases relevance but reduces the amount of output information. This mode also requires more processing time. It is mainly recommended when you expect the database to contain a lot of relevant information.

Shifting to the left relaxes the requirements for data processing. This is worth choosing when the database contains little information relevant to your query.
Choosing positions further to the left on the scale will also accumulate information items that are only indirectly related to the initial theme. This mode can be especially useful for analytical purposes.
 

11. Profile

This is a set of data processing parameters optimized for individual needs; it can be selected from the available choices. For example, you may prefer a data processing mode that produces output material such as a "review" or "report", "an informational context of a given event", or "revealing of hidden connections between subjects mentioned in the query".

The "Standard" profile available in the current settings of the system is optimized for creating review-type material from political, economic, or business information.

Upon selecting the "Additional" profile, the user is presented with a Processing Mode scale optimized for working with a "monothematic" database containing a lot of information on the subject described in the query.

Please note that, as a registered user, you can apply for a modified profile that is optimally suited to your tasks.
 

12. AUTO

Automatic relaxation of the requirements for data processing. Use it when you expect data processing to take too long. In this mode the system tests the database content: if there is insufficient information relevant to the query, the program gradually relaxes the requirements for selecting and arranging information items.
 

13. Time

Specify the time available for data processing (hours and minutes). CrossReader™ uses a built-in strategy for allocating the reserved time. In particular, forming the initial part of the output material requires relatively more time than the subsequent parts. (Note: this is the time for working with your query only; it does not include the time needed for preprocessing when the database is analyzed for the first time.)
 

14. Language

You can change the Language of data processing. The selection automatically activates the corresponding linguistic facilities, including the Stop Words List and parsing. The default is English.
 

15. Start, Pause, Continue, Stop

The Start button initiates data processing. Press this button only after all other options have been set properly.

You can Pause data processing. This lets you edit the current query, created by the system and displayed in the Current Query window. To resume processing, press Continue.
Pressing the Stop button ends data processing.
 

16. Automatic Renewal of the Query

CrossReader™ will continually renew your initial query, producing query lines that describe the theme more precisely, as it is represented in the database materials. These renewed queries will replace one another in sequence in the Current Query window.
 

17. Information Displayed in Course of Processing

The Results Preview frame displays a message each time a new information Cluster (a part of the output material) is formed. At the same time, the Initial Query window changes to Current Query and displays the new, automatically created queries.
When you select a cluster in the Result's Preview Contents frame, you can view the Key Words corresponding to that Cluster of text fragments. These are the words that helped to select material for the cluster.
At the same time, the Suggested Words frame shows additional words discovered by the system that are characteristic of this part of the content. These words will also be included in the newly formed query.

The Processing Messaging fields at the bottom of the program window display information about the course of processing, including error messages.

In particular, the following are listed:
 - the names of the screened files;
 - error messages;
 - the message "Step", followed by numbers indicating, in order: the number of the query being processed, the total processing time so far (not counting the database preprocessing period), and the remaining processing time (relative to the time limit set by the user);
 - the message "Read", followed by numbers indicating, in order: the total number of analyzed files and the number of screened paragraphs;
 - the message "Found", followed by numbers indicating, in order: the total number of selected paragraphs (included in the output material) and the number of words suggested for inclusion in the new query.
 

18. Viewing the Full Results

Pressing the Show Full Results button opens a file in HTML format that displays a List of Clusters (thematic parts) of the output material. Each Cluster is accompanied by two groups of characteristic words: the Key Words, which were used to select material for the cluster, and the additionally discovered, or Suggested, words. The number of paragraphs in each Cluster is also indicated.

Each entry in the List of Clusters has a hypertext link to the corresponding Cluster of text fragments. Clicking the View Full Document button, which precedes each text paragraph, takes you to the corresponding place in the whole document. This unformatted text also contains a link to the initial (original) document.

The result files are kept in the directory of the given project, in a folder with the same name as the query.
Note that if you repeat the processing while using the same name for the query, the results will be overwritten.

