CGI scripts

© Mike Smith
M.A.Smith@brighton.ac.uk University of Brighton UK.

Contents

* CGI scripts
* Example of calling a CGI script file
* Decoding data sent to a CGI script
* Script to record users of web page
* Post vs. get
* Check list
   


Warning: if you are not using a browser that supports tables,
such as Netscape 1.1 or later, then this page
will probably be very difficult to read.

CGI scripts

A CGI script file is a program written in a programming language.

Examples of languages used on this page include the Unix shell (sh), C++ and Ada 95.

The CGI script is executed when an anchor tag <A ... > or an image tag <IMG ...> refers to the CGI script file rather than to a normal file. Whether a file is treated as a CGI script or as an ordinary HTML file is determined by its physical placement on the server. Usually this placement is in the web server's cgi-bin directory; the exact location of this directory on the server machine is chosen by the web administrator. The administrator controls the placement of the cgi-bin directory to prevent the security problems that could occur if arbitrary programs were allowed to be executed by anybody accessing the machine.

Call of a CGI script file

An anchor tag to execute the CGI script dynamic_page on the server www.mc.com is:

<A HREF="http://www.mc.com/cgi-bin/dynamic_page">Dynamic page</A>

When the web server processes a request to fetch a file, if the requested file is in the server's nominated cgi-bin directory and is marked as executable, the script is run on the server. If the file is not executable, an error is reported.

The script eventually returns an HTML page or an image to be displayed as the result of its execution. When a CGI script executes, it may access environment variables to discover additional information about the request that it is to process. The first line of the returned data must be one of the following, and a blank line must separate it from the data itself:

Type of returned data    Text
An HTML page             Content-type: text/html
A gif image              Content-type: image/gif

A simple CGI script for a Unix-based system, which returns a list of the users currently logged onto that system, is:


#!/bin/sh
echo Content-type: text/html
echo 
echo 
echo "<HTML>"
echo "<HEAD>"
echo "</HEAD>"
echo "<BODY>"
echo "<H2>Users logged on the server are:</H2>"
echo "<PRE>"
who
echo "</PRE>"
echo "</BODY>"
echo "</HTML>"
Remember:
  • The double quotes around any text that contains a < or > character.

On a Unix system:

  • The first line is #!/bin/sh
  • The file is set executable.

Note:
The shell command echo writes the rest of the line to the standard output.
The shell command who lists the users who are currently logged onto the system.
Allowing users to create their own CGI scripts can lead to security problems on the server.

The major environment variables that can be accessed by the CGI script when it executes are:

Environment variable    Contains
QUERY_STRING            Data sent to the CGI script, by its caller. This may be the output from a form, or other dynamically or statically generated data.
REMOTE_ADDR             The Internet address of the host machine making the request.

The C++ program mas_env.cpp, when run, prints many of the environment variables available to a CGI script.

CGI scripts can be written in any language. For example, a CGI script to return the contents of the environment variable QUERY_STRING can be written in Ada 95.

Note:
I used the gcc compiler version 2.7.0 to compile this source code. In particular this compiler recognises the new data type bool.




Decoding data sent to a CGI script

When a form is used, the information collected in the form is sent to the CGI script for processing. This information is placed in the environment variable QUERY_STRING.

To pass information explicitly to the environment variable QUERY_STRING a modified form of an anchor tag is used. In this modified anchor tag, the data to be sent to the environment variable QUERY_STRING is appended after the URL which denotes the CGI script. The character ? is used to separate the URL denoting the CGI script and the data that is to be sent to the script. For example:

<A HREF="/cgi-bin/script?name=Your+name&action=find"> Link </A>

The data "name=Your+name&action=find" is placed in the environment variable QUERY_STRING and the CGI script named script is executed.
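As a rough illustration of how such a string breaks down into keyword and value fields, the sketch below splits it at each & and then at the first = of every field. It is not the author's code; the function name split_query is hypothetical, and the values are returned still encoded.

```cpp
#include <string>
#include <utility>
#include <vector>

// Split "k1=v1&k2=v2" into (keyword, value) pairs, keeping their order.
// Values are returned still encoded; no '+' or %xx decoding is done here.
std::vector<std::pair<std::string, std::string>>
split_query(const std::string& query)
{
  std::vector<std::pair<std::string, std::string>> pairs;
  std::string::size_type start = 0;
  while (start <= query.size())
  {
    std::string::size_type end = query.find('&', start);
    if (end == std::string::npos) end = query.size();
    const std::string field = query.substr(start, end - start);
    if (!field.empty())
    {
      const std::string::size_type eq = field.find('=');
      if (eq == std::string::npos)
        pairs.emplace_back(field, "");            // keyword with no value
      else
        pairs.emplace_back(field.substr(0, eq), field.substr(eq + 1));
    }
    start = end + 1;                              // skip past the '&'
  }
  return pairs;
}
```

Keeping the pairs in a vector rather than a map preserves repeated keywords, which matters when a form sends the same name more than once.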

A C++ class, composed of the specification parse.h and the implementation parse.cpp, is used to extract the individual components of QUERY_STRING. The header file t99_type.h contains definitions for C++ features not implemented in some compilers. The members of this class are:

Method        Responsibility
Parse         Set the string that will be parsed.
set           Set a different string to be parsed.
get_item      Return the string associated with the keyword passed as a parameter. If no data return NULL.
get_item_n    Return the string associated with the keyword passed as a parameter. If no data then return the null string.

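When using the member functions get_item and get_item_n, the optional second parameter specifies which occurrence of the string associated with a keyword to return. This allows the recovery of information attached to identical keywords. In addition, the returned string will have had substitutions made on it: in the example that follows, the %xx escape sequences are decoded (%2B becomes + and %25 becomes %) and, where the third parameter is true, ~username is expanded to that user's home directory.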

Note:
Defining NO_MAP causes the code for ~username processing to be omitted. This is so that the code can be compiled for machines that do not support the system function map_uname defined in the header file pwd.h.

For example, if the QUERY_STRING contained:

tag=one&name=mike&action=%2B10%25&tag=two&log=~mas/log&tag=three

Then the following program when compiled and run:

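// bool, true and false are built into standard C++, so no enum is needed.

#include <iostream>
#include <cstdlib>

#include "parse.h"
#include "parse.cpp"   // implementation included directly, as in the original

int main()
{
  char *query_str = std::getenv("QUERY_STRING");
  if ( query_str == NULL )               // not started by a web server
  {
    std::cout << "QUERY_STRING is not set\n";
    return 1;
  }

  Parse list( query_str );

  std::cout << "name  = " << list.get_item_n( "name" ) << "\n";
  std::cout << "action= " << list.get_item_n( "action" ) << "\n";
  // Third parameter true: expand ~username in the returned string
  std::cout << "log   = " << list.get_item_n( "log", 1, true ) << "\n";
  // Second parameter: which occurrence of the keyword to return
  for ( int i=1; i<=4; i++ )
  {
    std::cout << "tag  (" << i << ") = ";
    std::cout << list.get_item_n( "tag" , i ) << "\n";
  }
  return 0;
}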

would produce the following output:

name  = mike
action= +10%
log   = /usr/staff/mas/log
tag  (1) = one
tag  (2) = two
tag  (3) = three
tag  (4) =
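The %xx decoding seen in the action line of the output can be sketched as follows. This is an independent illustration, not the code of the Parse class; the names url_decode and hex_value are hypothetical. The sketch also turns + into a space, which is the usual convention for form data (as in Your+name) but is not shown in the output above, and it leaves ~username expansion out altogether.

```cpp
#include <cctype>
#include <string>

// Convert one hexadecimal digit to its value; the caller checks isxdigit.
static int hex_value(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
}

// Decode a URL-encoded field: '+' becomes a space and %xx becomes the
// character with hexadecimal code xx. Malformed escapes are copied as-is.
std::string url_decode(const std::string& in)
{
  std::string out;
  for (std::string::size_type i = 0; i < in.size(); ++i)
  {
    if (in[i] == '+')
      out += ' ';
    else if (in[i] == '%' && i + 2 < in.size()
             && std::isxdigit(static_cast<unsigned char>(in[i + 1]))
             && std::isxdigit(static_cast<unsigned char>(in[i + 2])))
    {
      out += static_cast<char>(hex_value(in[i + 1]) * 16
                               + hex_value(in[i + 2]));
      i += 2;                       // skip the two hexadecimal digits
    }
    else
      out += in[i];
  }
  return out;
}
```

With this sketch, url_decode("%2B10%25") yields +10%, matching the action line above.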




Script to record users of web page

By using a URL denoting a CGI script in an <IMG> tag, additional processing can be performed before the image is delivered. Here, this additional processing records details about the current viewer of the web page. Additional information is sent to the CGI script to specify the exact details of the action to take. For example:

HTML markup required:

<IMG SRC="/cgi-bin/mas_rec?page=HTML&file=log&img=dot.gif"
     ALT="Record not made">

The CGI script mas_rec, written in C++, is sent the following information:

Parameter name    Specifies
file              The name of the file to which the usage information will be appended.
page              A name for the page that will be recorded in the log.
img               The image that will be loaded.

Of course for this to work, the viewer of the page must be viewing and hence loading images. Several reasons why images may not be loaded include:




Post vs. Get

So far the method used to send information to the CGI script has been GET. When the method GET is used the data sent is placed in the environment variable QUERY_STRING for the CGI script to process.

An alternative method is to use POST. When the method POST is used the data is sent by a separate stream and becomes the standard input to the CGI script. The method used is specified on the <FORM ..> tag using the attribute METHOD="get" or METHOD="post". The default method is GET.

For example:

HTML markup required:


 <FORM METHOD="get"
       ACTION="http://host/cgi-bin/mas_form">
 <INPUT TYPE="text" NAME="name"
        SIZE=20 VALUE="Try it (get)">
 </FORM>
 


 <FORM METHOD="post"
       ACTION="http://host/cgi-bin/mas_form">
 <INPUT TYPE="text" NAME="name"
        SIZE=20 VALUE="Try it (post)">
 </FORM>
 

When the POST method is used, the following environment variables are set:

Environment variable    Contains
CONTENT_LENGTH          The length of the data sent via the standard input to the CGI program.
CONTENT_TYPE            The MIME type of the data.


A simple script that records the data sent by a user in a log file is:


#!/bin/sh
echo Content-type: text/html
echo 
echo 
echo "<HTML>"
echo "<HEAD>"
echo "</HEAD>"
echo "<BODY>"
echo "<H2>Data recorded</H2>"
echo Use the back arrow on the browser
echo to return to the original web page
echo "</BODY>"
echo "</HTML>"
cat  >> /home/snowwhite/staff/mas/log
echo  >> /home/snowwhite/staff/mas/log
Remember:
  • To use a full path name for the location of the file in which the information is recorded.

On a Unix system:

  • The first line is #!/bin/sh
  • The file is set executable and setuid to you

An example of its use is shown below:

HTML markup required:

 <FORM METHOD="post"
       ACTION="http://machine/cgi-bin/mas_cgi1">
 <INPUT TYPE="text" NAME="name"
        SIZE=20 VALUE="fill in">
 </FORM>




Check list

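It is important to make sure that the CGI script is:

  • Placed in the web server's nominated cgi-bin directory.
  • Marked as being executable.
  • Started, when it is a shell script on a Unix system, with the line #!/bin/sh
  • Written so that the first line of the returned data is a Content-type line.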




The material in these WWW page(s) is copyright © M.A.Smith August 1995
Last modified 22 February 1996