ARCHIE(1L)          MISC. REFERENCE MANUAL PAGES          ARCHIE(1L)

NAME
     archie - an Internet archive server listing service

SYNOPSIS
     archie

DESCRIPTION
     The archie system is a program which can query a database
     maintained by the Computer Science Department of McGill
     University. The database contains a list of software which is
     available by means of anonymous ftp(1) to hosts connected to
     the Internet.

     The system can be accessed interactively or via electronic
     mail (email). To use the interactive system:

     1)   Connect to host quiche.cs.mcgill.ca (132.206.2.3 or
          132.206.51.1) with telnet(1).

     2)   Log in as user archie (no capitals, no password
          required). The system prints a banner message and
          status report.

     3)   Type ``help'' for further information.

     To use the email interface, send requests to

          archie@cs.mcgill.ca

     Send the word ``help'' in a message for available commands
     and features. Please note that this is an automated
     interface: no human sees it. See the section "THE EMAIL
     INTERFACE" below.

     Comments and suggestions should be sent to

          archie-l@cs.mcgill.ca

     Administrative requests, such as adding a site to the
     database or modifying the Software Description Database,
     should be sent to

          archie-admin@cs.mcgill.ca

THE INTERACTIVE INTERFACE
  Variables
     archie has a number of variables which modify its behavior.
     The values of these variables may be changed using the set
     command. archie distinguishes between three types of
     variable:

     boolean   which may be either set or unset.

     numeric   representing an integer within a pre-determined
               range.

     string    whose value is a string of characters (which may or
               may not be restricted).

     The following variables are currently recognized:

     autologout
          By default, archie will exit after one hour of idle
          time. This value can be changed through this variable,
          which gives, in minutes, the length of idle time before
          you are automatically logged out. The minimum and
          maximum values are 1 and 300, representing one minute
          through five hours.
          Example:
               set autologout 45
          will cause you to be automatically logged out after 45
          minutes of idle time.

     mailto
          A string variable whose value is a mail address, or a
          comma-separated list of addresses. Note that there must
          not be any spaces within the list of addresses. If this
          is set and the mail command is issued with no
          arguments, then the output of the last command is
          mailed to that address.

          Example:
               set mailto user@frobozz.com
          Example:
               set mailto user1@hello.edu,user2@goodbye.com

          All the various Internet addressing styles are
          understood. BITNET sites should use the convention
               user@sitename.bitnet
          UUCP addresses can be specified as
               user@sitename.uucp

     maxhits
          A numeric variable whose value is the maximum number of
          matches you want the prog command to generate. If
          archie seems to be slow, or you don't want a lot of
          output, this can be set to a small value. ``maxhits''
          must be within the range 0 to 1000. The default value
          is 1000.

          Example:
               set maxhits 100
          prog will now stop after 100 matches have been found.

     pager
          A boolean variable which, when set, tells archie to
          filter all output through the pager less(1L). When
          using the pager you may also want to set the term
          variable to your terminal type (see the term variable).

          Example:
               set pager

     search
          This variable determines the kind of search performed
          on the database by the prog command, providing
          flexibility in search times and types. search is a
          string variable whose value is one of the following:

          sub
               Substring (case insensitive). A simple, everyday
               substring search. A match occurs if the file (or
               directory) name in the database contains the
               user-given substring.

               Example: The pattern ``is'' will match
               ``islington'' and ``this'' and ``poison''.

          subcase
               Substring (case sensitive). As above, but the case
               of the strings involved becomes significant.

               Example: ``TeX'' will match ``LaTeX'' but not
               ``Latex'' or ``TExTroff''.

          exact
               Exact match. The fastest search method of all.
               The restriction is that the user string (the
               argument to the prog command) has to exactly match
               (including case) the string in the database. This
               is provided for those of you who know just what
               you are looking for. For example, if you wanted to
               know where all the ``xlock.tar.Z'' files were,
               this is the kind of search to use.

          regex
               This is the default search method. Searches the
               database with the user (search) string, which is
               given in the form of an ed(1) regular expression.

               NOTE: Unless specifically anchored to the
               beginning (with ^) or end (with $) of a line,
               ed(1) regular expressions (effectively) have
               ``.*'' prepended and appended to them. For
               example, it is not necessary to say
                    prog .*xnlock.*
               since
                    prog xnlock
               will suffice. In that case the regex match becomes
               a simple substring match.

     sortby
          This variable describes how the output from the prog
          command is to be ordered. It can have one of 5 values
          (and their associated reverse orders). For each method,
          the ``natural'' sort order (or at least, what we
          consider to be the natural order) is the default.

          hostname
               Output is sorted on the archive hostname in
               lexical order. Reverse order: rhostname

          time
               Output is sorted with the most recent modification
               times of the found file/directory names coming
               first (youngest -> oldest). Reverse order: rtime

          size
               Output is sorted by the size of the found
               files/directories, largest first. Reverse order:
               rsize

          filename
               Sorted in file/directory name lexical order.
               Reverse order: rfilename

          none
               This is the DEFAULT order. Unsorted. There is no
               reverse order, although rnone is accepted for
               symmetry.

          Typing the keyboard interrupt character (Ctl-C for most
          people on UNIX) during a search will cause the search
          to be aborted. The results found up to that time will
          be sorted (as determined by the value of the sortby
          variable) and output. The output phase may itself be
          aborted by typing the abort character a second time.
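The four values of the search variable described above can be
sketched in a few lines of Python. This is only an illustration,
not archie's implementation: the function name matches() and the
sample list of names are hypothetical, and Python's re module is
used as a stand-in for ed(1) regular expressions, with which it
agrees only for simple patterns such as the ones shown.

```python
import re


def matches(mode: str, pattern: str, name: str) -> bool:
    """Return True if `name` matches `pattern` under the given search mode.

    Hypothetical helper: it mirrors the documented meaning of the
    `search` variable, not archie's actual code.
    """
    if mode == "sub":          # substring, case insensitive
        return pattern.lower() in name.lower()
    if mode == "subcase":      # substring, case sensitive
        return pattern in name
    if mode == "exact":        # whole name must be identical, including case
        return pattern == name
    if mode == "regex":        # unanchored search, so ".*" is implied at both ends
        return re.search(pattern, name) is not None
    raise ValueError(f"unknown search mode: {mode}")


# Sample names taken from the examples in the text above.
names = ["islington", "this", "poison", "LaTeX", "Latex",
         "TExTroff", "xlock.tar.Z", "xnlock.shar"]

for mode, pattern in [("sub", "is"), ("subcase", "TeX"),
                      ("exact", "xlock.tar.Z"), ("regex", "xnlock"),
                      ("regex", ".*xnlock.*")]:
    hits = [n for n in names if matches(mode, pattern, n)]
    print(f"{mode:8s} {pattern:12s} -> {hits}")
```

The last two rows give the same result, which is the point of the
NOTE under regex: an unanchored pattern already behaves like a
substring match.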
     status
          This boolean variable determines if the status line
          will be displayed while the prog command is searching
          through the database. If set (which is the default),
          then the number of matches and the percentage of the
          database searched are displayed. Otherwise no output is
          given until the search is complete.

     term
          This variable tells archie what type of terminal you
          are using, and optionally its size in rows and columns.
          This information is used by the pager. The usage is:

               set term <terminal-type> [<#rows> [<#columns>]]

          That is, the terminal type is required, but the number
          of rows and columns is optional. You may specify a
          value for rows only, but if you want to change the
          number of columns you must give a value for both rows
          and columns. The default values for rows and columns
          are 24 and 80.

          Examples:
               set term vt100
               set term xterm 60
               set term xterm 24 100

  Regular Expressions
     archie uses ed(1) regular expressions in a number of
     commands. A regular expression, on the one hand, is a string
     like any other: a sequence of characters. On the other hand,
     special characters within the string have certain functions
     which make regular expressions useful when trying to match
     portions of other strings. In the following discussion and
     examples, a string containing a regular expression will be
     called the ``pattern'', and the string against which it is
     to be matched is called the ``reference string''.

     Regular expressions allow one to search for ``all strings
     ending with the letters ize'' or ``all strings beginning
     with a number between 1 and 3 and ending in a comma''. In
     order to accomplish this, regular expressions co-opt the use
     of some characters to have special meaning. They also
     provide for these characters to lose their special meaning
     if the user so desires.

     The rules for regular expressions are:

     c    Any character c matches itself unless it has been
          assigned other special meaning as listed below.
          Most special characters can be escaped (made to lose
          their special meaning) by placing the character '\' in
          front of them. This doesn't apply to '{', which is
          non-special until it is escaped. Thus, although '*'
          normally has special meaning, the string '\*' matches
          a literal asterisk.

          Example: The pattern
               acdef
          matches
               s83acdeffff   or   acdefsecs
          but not
               accdef   or   aacde1f
          That is, it will match any string that contains
          ``acdef'' anywhere in the reference string.

          Example: Normally the characters '*' and '$' are
          special, but the pattern
               a\*bse\$
          acts as above. That is, any reference string containing
          ``a*bse$'' as a substring will be flagged as a match.

     .    A period matches any character except the newline
          character. This is known as the wildcard character.

          Example: The pattern
               ....
          will match any 4 characters in the reference string,
          except a newline character.

     ^    If '^' appears at the beginning of the pattern then it
          is said to ``anchor'' the match to the beginning of the
          line. That is, the reference string must start with the
          pattern following the '^'. If this character appears
          anywhere other than at the beginning of the pattern,
          then it is no longer considered special, and matches
          itself as any non-special character would. Similarly,
          if it starts a pattern but is escaped, it matches
          itself.

          Example: The pattern
               ^efghi
          will match
               efghi   or   efghijlk
          but not
               abcefghi
          That is, the pattern will match only those reference
          strings starting with ``efghi''. Just containing the
          substring is not sufficient.

     $    Occurring at the end of the pattern, this character
          ``anchors'' the pattern to the end of the line
          (reference string). A '$' occurring anywhere else in
          the pattern is regarded as non-special. Similarly, if
          it is at the end of the pattern but is escaped, it is
          non-special.

          Example: The pattern
               efghi$
          will match
               efghi   or   abcdefghi
          but not
               efghijkl
          That is, the pattern will match only those reference
          strings ending with ``efghi''. Just containing the
          substring is not sufficient.
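The literal, escape, wildcard and anchor rules above can be checked
with a short Python sketch. It assumes Python's re module as a
stand-in for ed(1) regular expressions; the two agree on the
patterns used here, but not in general (for instance, Python keeps
'^' special in the middle of a pattern). The helper name
archie_match() and the CASES table are hypothetical, and the
reference strings come from the examples in the text.

```python
import re


def archie_match(pattern: str, reference: str) -> bool:
    """Hypothetical helper: True if `pattern` matches anywhere in `reference`.

    re.search scans the whole reference string, so an unanchored
    pattern behaves like a substring match.
    """
    return re.search(pattern, reference) is not None


# (pattern, reference string, expected result) taken from the text above.
CASES = [
    ("acdef",       "s83acdeffff", True),    # plain characters match themselves
    ("acdef",       "acdefsecs",   True),
    ("acdef",       "accdef",      False),
    ("acdef",       "aacde1f",     False),
    (r"a\*bse\$",   "xa*bse$y",    True),    # escaped '*' and '$' are literal
    ("....",        "abcd",        True),    # '.' matches any single character
    ("....",        "abc",         False),   # fewer than 4 characters available
    ("^efghi",      "efghijlk",    True),    # anchored to the beginning
    ("^efghi",      "abcefghi",    False),
    ("efghi$",      "abcdefghi",   True),    # anchored to the end
    ("efghi$",      "efghijkl",    False),
]

for pattern, reference, expected in CASES:
    result = archie_match(pattern, reference)
    flag = "ok" if result == expected else "MISMATCH"
    print(f"{pattern:10s} {reference:12s} {result!s:5s} {flag}")
```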
     \<   This sequence in the pattern causes the one-character
          regular expression following it to match only at the
          beginning of a ``word'': at the beginning of a line, or
          just before a letter, digit or underline character and
          just after a character which is not one of these.

     \>   Constrains the one-character regular expression
          following it to be at the end of a ``word'' as defined
          above.

     [string]
          One or more characters within square brackets. This
          pattern matches any single character within the
          brackets. The caret, '^', has a special meaning if it
          is the first character in the series: the pattern will
          match any character other than one in the list.

          Example: The pattern
               [^abc]
          will match any character except 'a', 'b' or 'c'.

          To match a right bracket, ']', in the list it must be
          put first:
               []ab01]
          A caret, '^', can appear anywhere in the list but
          first. In
               [ab^01]
          the caret loses its special meaning.

          The '-' character is special within square brackets. It
          is interpreted as a range of characters (in the ASCII
          character set) and will match any single character
          within that range. '[a-z]' matches any lower case
          letter. The '-' can be made non-special by placing it
          first or last within the square brackets.

          The characters '$', '*' and '.' are not special within
          square brackets.

          Example: The pattern
               [ab01]
          matches a single occurrence of a character from the set
          'a', 'b', '0', '1'.

          Example: The pattern
               [^ab01]
          will match any single character other than 'a', 'b',
          '0', '1'.

          Example: The pattern
               [a0-9b]
          matches one of 'a', 'b' or a digit between 0 and 9
          inclusive.

          Example: The pattern
               [^a0-9b.$]
          means any single character that is not 'a', 'b', '.',
          '$' or a digit between 0 and 9 inclusive.

     *    An asterisk following a regular expression in the
          pattern has the effect of matching zero or more
          occurrences of that expression.

          Example: The pattern
               a*
          means zero or more occurrences of the character 'a'.
          Example: The pattern
               [A-Z]*
          means zero or more occurrences of upper case letters.

     \{m\}
     \{m,\}
     \{m,n\}
          A one-character regular expression followed by one of
          these three constructions causes a range of occurrences
          of that regular expression to be matched. If it is
          followed by \{m\}, where m is a non-negative integer
          between 0 and 255 (inclusive), then exactly m
          occurrences of that regular expression are matched. If
          followed by \{m,\}, then at least m occurrences are
          matched. Finally, if it is followed by \{m,n\} (where n
          is a non-negative integer between 0 and 255 and where
          n > m), then between m and n occurrences of the
          expression are matched.

          Example: The pattern
               ab\{3\}
          would match any substring in the reference string of an
          'a' followed by exactly 3 'b's.

          Example: The pattern
               ab\{3,\}
          would match any substring in the reference string of an
          'a' followed by at least 3 'b's.

          Example: The pattern
               ab\{3,5\}
          would match any substring in the reference string of an
          'a' followed by at least 3 but at most 5 'b's.

  Common Problems with Regular Expressions
     (1)  When matching a substring it is not necessary to use
          the wildcard character to match the part of the
          reference string preceding and following the substring.

          Example: The pattern
               abcd
          will match any reference string containing this
          pattern. It is not necessary to use
               .*abcd.*
          as the pattern.

     (2)  In order to constrain a pattern to the entire reference
          string, use the construction:
               ^pattern$

     (3)  The easiest way to obtain case insensitivity in a
          regular expression is to use the '[]' operator. For
          example, a pattern to match the word ``hello''
          regardless of the case of the letters would be:
               [Hh][Ee][Ll][Ll][Oo]

  Commands
     Arguments to commands shown here in square brackets '[]' are
     optional. All others are mandatory.

     help
          List the valid archie commands.

     list [pattern]
          This command provides a list of the sites currently
          stored in the database and the time at which they were
          last updated.
          There is an optional regular expression argument to
          limit the list to specific sites. Note that the
          numerical (IP) address associated with a site name is
          valid at the listed time, but since addresses do
          occasionally change, a discrepancy may occur until that
          site is updated in our database. Furthermore, the
          listed IP address is the primary one, as listed in the
          DNS database: secondary addresses are not stored.

          Example:
               list
          will list all sites in the database, while
               list \.de$
          lists all German sites.

     mail [address1,[address2...]]
          With an argument (or arguments) the output of the last
          command is mailed to the specified address or
          comma-separated list of addresses. No spaces may appear
          anywhere in the address list.

          Example:
               mail user1@hello.edu,user2@goodbye.com

          Without an argument the output of the last command is
          sent to the address specified in the mailto variable.

          Example:
               mail

          All the various Internet addressing styles are
          understood. BITNET sites should use the convention
               user@sitename.bitnet
          UUCP addresses can be specified as
               user@sitename.uucp

     prog pattern
          Find all occurrences of programs with names matching
          pattern. How pattern is interpreted depends on the
          value of the search variable. The output lists the
          names of hosts with matching entries, the size of the
          matching program, its last modification date and its
          path. The results are sorted according to the value of
          the sortby variable, and are limited in number by the
          maxhits variable.

     set variable-name [value]
          This command allows you to set one of archie's
          variables. Their values affect how archie interacts
          with the user.

          boolean variables are either set or unset.
               Example: set pager

          numeric variables take a number within a certain range.
               Example: set maxhits 500

          string variables take a (possibly restricted) string
          value.
               Example: set sortby time

          See the entries on unset and show.

     show [variable-name]
          This command is used to display the value of a
          particular variable, or all variables.
          With an argument it will display the value of that
          variable; without an argument it will display the
          values of all variables.

          Example:
               show maxhits

     site sitename
          This command allows you to get a full listing of an
          ftp(1) site in the archie database. The output format
          is similar to that of a UNIX ls(1) long recursive (-lR)
          listing.

          Example:
               site col.hp.com

     unset variable
          This causes the specified variable to have no value.
          This means that it will not be used by archie until it
          has been given a value with the set command. Note: this
          may cause ``counter-intuitive'' behaviour in some cases
          (e.g. in the case of maxhits). Although one might
          expect prog to print matches without regard for any
          limit, this is not the case. If the value of maxhits is
          not available, prog will merely fall back to some
          internal default.

     whatis substring
          This command searches the archie Software Description
          Database for the given substring, with case being
          ignored. This database consists of names and short
          descriptions of many of the software packages,
          documents (like RFCs and educational material) and data
          files that are stored on the Internet.

          Example:
               whatis uucp
          in part gives as a result:
               findpath.sh      UUCP Pathfinder
               logfile-stats    UUCP LOGFILE analyzer
               mapstats         UUCP map statistics program

          We welcome and encourage additions and corrections to
          this database and depend on the archie user community
          to keep it up to date. To make your contribution to
          this database, mail to
               archie-admin@cs.mcgill.ca
          For new additions, please keep the description to 25
          words or less.

THE EMAIL INTERFACE
     The archie email interface currently accepts a limited
     subset of the interactive interface commands, plus a few of
     its own. Variables are not currently supported in the email
     interface. Requests to this server should be addressed to

          archie@cs.mcgill.ca

     Note that the ``Subject:'' line in incoming mail is
     processed as if it were part of the main message body. No
     special keywords are required.
     Note that the help command is exclusive: all other commands
     in the same message are ignored.

     The server recognizes the following commands. If an empty
     message, or a message not containing any valid requests, is
     received, it will be treated as a help request.

     path path
          This lets the requestor override the address that would
          normally be extracted from the header. If you do not
          hear from the archive server within a couple of hours,
          you might consider adding a path command to your
          request. The path describes how to mail a message from
          cs.mcgill.ca to your address. cs.mcgill.ca is fully
          connected to the Internet. BITNET users can use the
          convention
               user@site.bitnet
          UUCP users can use the convention
               user@site.uucp

     help
          Will send you a message describing how to use the email
          interface (basically this section).

     prog <regexp> [<regexp> ...]
          A search of the archie database is performed with each
          <regexp> (a regular expression as defined by ed(1)) in
          turn, and any matches found are returned to the
          requestor. Note that multiple <regexp>s may be placed
          on one line, in which case the results will be mailed
          back to you in one message. If you have multiple prog
          lines, then multiple messages will be returned, one for
          each line [This doesn't work as expected at the
          moment... stay tuned]. Any regular expression
          containing spaces must be quoted with single (') or
          double (") quotes. ALL OTHER ed(1) rules must be
          followed.

          NOTE: The searches are CASE SENSITIVE. The ability to
          change this will hopefully be added soon. The prog
          command is currently executed as if the search variable
          were set to regex.

     site <sitename> | <IP-address>
          A listing of the given site will be returned. The fully
          qualified domain name or IP address may be used.

     compress
          ALL of your files in the current mail message will be
          run through compress(1) and uuencode(1). When you
          receive the reply, remove everything before the
          ``begin'' line and run it through uudecode(1). This
          will produce a .Z file. You can then run uncompress(1)
          on this file and get the results of your request.
     quit
          Nothing past this point is interpreted. This is
          provided so that the occasional lost soul whose
          signature contains a line that looks like a command can
          still use the server without getting a bogus response.

THE ARCHIE DATABASE
     The archie database subsystem maintains a list of about 600
     Internet ftp(1) archive sites. Each night, the database
     subsystem executes an anonymous ftp(1) to a subset of these
     sites and fetches a recursive directory listing (or a file
     containing the recursive directory listing, if one exists).
     Currently, each site gets updated approximately once a
     month. The directory listings are stored on
     quiche.cs.mcgill.ca (132.206.2.3), where they are available
     to the Internet community via anonymous ftp(1). They appear
     in the directory ~ftp/archie/listings in compressed form.

BUGS
     1)   Only UNIX sites are included in the database.

     2)   The user cannot limit searches to specific sites.

     3)   There is no graphical user interface.

     4)   There is no way to abort the help facility completely.

     It is hoped that all these will change in coming versions.

LONG TERM PLANS
     The archie system is regarded as being ``in development''
     and is not being released to outside sites at present. The
     current database requires about 70 MB of disk storage, and
     the updates and searches put a noticeable load on the Sun
     4/280 on which it is operating. Eventually, we hope to
     distribute archie to several sites around the world. We
     welcome comments and suggestions; please send them to
     archie-l@cs.mcgill.ca.

SEE ALSO
     ftp(1), telnet(1)

AUTHORS
     Alan Emtage (bajan@cs.mcgill.ca), McGill University.
     Bill Heelan (wheelan@cs.mcgill.ca), McGill University.

     Manual page by R. P. C. Rodgers, UCSF School of Pharmacy,
     San Francisco, California 94143
     (rodgers@maxwell.mmwb.ucsf.edu) and Alan Emtage.