INTRODUCTION TO DATABASE CONCEPTS You've probably heard the term "database" before. It is a common word in the software business. Although it may sound a bit ominous to the new computer user, a "database" is simply a group of information that is stored in a pre-described format. A database file is, basically, a computerized card file. With Pirouette, we have attempted to exploit this similarity. When you view the data in your database, it will appear within "windows" which resemble index cards. The program itself performs two primary tasks: (1) it allows you to easily update and maintain the information in your database; and (2) it allows you to retrieve that information in a variety of forms (e.g., reports, labels, etc.). All databases have certain things in common, and a particular terminology is used by all database management programs. This basic terminology will be described in the following section. If you have never worked with a database management system before, please read this section carefully. DATABASE TERMINOLOGY The best way to explore database terminology is to look at some sample databases. There are three common databases that most people use every day: a checkbook, an address book, and an appointment calendar. Files for these three databases have been provided with the program, and will be used as examples throughout the manual. For now, we'll use the personal checkbook. Below are a few typical entries: CHECK # DATE PAID TO AMOUNT CLEARED 122 10/22/88 PG&E 34.97 Y 123 10/22/88 Pacific Bell 44.22 Y 124 10/25/88 Safeway 21.19 Y 125 10/26/88 AAA Finance Co. 204.75 N . . . . . . . . . . . . . . . When you stub a check you've written, you keep track of four items of information: the check number, the date the check was written, to whom it was made payable, and the amount. Additionally, when you balance your bank statement each month, you mark off the checks which have cleared -- in essence a fifth item of information. -7- In database terminology, each line (i.e., each transaction) in the checkbook register is referred to as a RECORD. Each record is composed of five FIELDS; that is, five items of information. Each of these four fields has a distinct DATA TYPE associated with it. The check date will always be a Date. The payee will always consist of alphanumeric, or Character, information. The amount will always be Numeric, in that it contains numbers which can be added or subtracted. The "cleared" field is what is known as a Logical field, in that it can contain only two values: Yes and No. The first field, the check number, will be dealt with in a moment. In summary then, every database has certain characteristics: (1) it is organized into records; (2) every record has the same number of fields as every other record; and (3) the type of data stored in each field is the same in every record in the database. TYPES OF DATA Pirouette recognizes four types of data: Character. Character fields are used to record descriptive information, such as names and addresses. The payee field in our checkbook contains character information. Numeric. Numeric data fields contain information which will be manipulated arithmetically (i.e., added, substracted, etc.) Date. Date information; specifically the month, date and year in the format MM/DD/YY. Logical. As mentioned above, Logical fields are one which contain the values True and False, or Yes and No. In an actual checkbook, you generally check off a box to indicate that the check has cleared, rather than using a Y or N, but the underlying concept is exactly the same. Up till now we have avoided dealing with the Check Number field. This is because we must now consider an issue in database design. At first glance, the check number would appear to be a numeric field. Many people, however, use alphabetical codes in this check register column to keep track of things like bank service charges and automatic teller withdrawals. Moreover, while the check number is apparently numeric in nature, these numbers are never used for mathematical computations. In setting up this database, therefore, it would probably be BETTER to designate the check number as a Character field. -8- In general, fields should only be declared as Numeric when they will actually be used for arithmetic computations. RELATED DATABASES A conventional database, or "flat file" has an extremely simple structure. One could characterize a single database as two-dimensional: it has width (the number of fields) and length (the number of records), but nothing else. This two-dimensional structure, however, isn't adequate (or at lease efficient) for describing many real-life situations. One of these situations is illustrated in CHECKREG, the Check Register database we've included. CHECKREG is designed to keep track of your checks and other associated activities, like a computerized check register. It tracks the Check number (or other code), date, payee, etc. This database by itself is fine for a personal checking account, but if you're running a business, things get a bit more complicated (as usual, because of the government). Under certain circumstances, the federal goverment requires businesses to report money paid to other businesses or individuals. The Form 1099 which the government uses has a space for the payee's Federal Tax ID (or Social Security) number, so we must have a field for TAX_ID as well. If we place this TAX_ID field in the CHECKREG database, we have a small problem. If multiple checks are written to the same payee (as would happen in most business situations), we will wind up entering the same ID number over and over again on each check. This not only creates more work for the data entry operator, but our database becomes significantly larger than it needs to. A BETTER solution is to set up a second database to hold the TAX_ID (which is exactly what we've done in our sample). This information can then be "looked up" whenever we need to. The resulting two databases might look something like this: DATABASE 1: Payees PAYEE_NAME TAX_ID Ace Typesetting 667-22-8738 Law Office of D.B. Cooper 547-18-3299 . . . . . . -9- DATABASE 2: Checkreg PAY DATE AMOUNT PAYEE_NAME 04/01/87 100.00 Law Office of D.B. Cooper 07/25/89 335.75 Ace Typesetting 07/27/89 250.00 Law Office of D.B. Cooper . . . . . . . . . . . . This kind of set-up is referred to as a "relational" database structure, because we have two databases which are related to each other. By definition, related databases are tied together with a common field -- in this case, the PAYEE_NAME. Another side of database relationships is the issue of validation. If you were using the above structure, you would probably not want to enter a record into the CHECKREG database for someone who did not appear in the PAYEES database. For this reason, Pirouette provides the ability to restrict input for a field to only those entries found in a corresponding field in another database. We've restricted the PAYEE_NAME field in CHECKREG to allow only entries from the PAYEE_NAME field in PAYEES. (Incidentally, you don't necessarily have to have an associated TAX_ID for every payee; the TAX_ID could be left blank where it wasn't needed. And later on we'll describe simple technique for adding to the payee list as you're entering checks.) This "restrict to list" feature is not just a powerful tool for preventing data entry error. As we will show a little later, it is also a device to greatly speed up data entry. -10-