Practical mod_perl

Peter Kitson

ISBN : 0596002270

Order a printed copy of this book from Amazon.

Cover Design - Practical mod_perl

For your free electronic copy of this book please verify the numbers below. 

(We need to do this to make sure you're a person and not a malicious script)



Sample Chapter From Practical mod_perl
     Copyright © Stas Bekman, Eric Cholet

Chapter 1. Introducing CGI and mod_perl

This chapter provides the foundations on which the rest of the book builds. In this chapter, we give you:

  • A history of CGI and the HTTP protocol.

  • An explanation of the Apache 1.3 Unix model, which is crucial to understanding how mod_perl 1.0 works.

  • An overall picture of mod_perl 1.0 and its development.

  • An overview of the difference between the Apache C API, the Apache Perl API (i.e., the mod_perl API), and CGI compatibility. We will also introduce the Apache::Registry and Apache::PerlRun modules.

  • An introduction to the mod_perl API and handlers.

1.1. A Brief History of CGI

When the World Wide Web was born, there was only one web server and one web client. The httpd web server was developed by the Centre d\'Etudes et de Recherche Nucléaires (CERN) in Geneva, Switzerland. httpd has since become the generic name of the binary executable of many web servers. When CERN stopped funding the development of httpd, it was taken over by the Software Development Group of the National Center for Supercomputing Applications (NCSA). The NCSA also produced Mosaic, the first web browser, whose developers later went on to write the Netscape client.

Mosaic could fetch and view static documents[2] and images served by the httpd server. This provided a far better means of disseminating information to large numbers of people than sending each person an email. However, the glut of online resources soon made search engines necessary, which meant that users needed to be able to submit data (such as a search string) and servers needed to process that data and return appropriate content.

[2]A static document is one that exists in a constant state, such as a text file that doesn\'t change.

Search engines were first implemented by extending the web server, modifying its source code directly. Rewriting the source was not very practical, however, so the NCSA developed the Common Gateway Interface (CGI) specification. CGI became a standard for interfacing external applications with web servers and other information servers and generating dynamic information.

A CGI program can be written in virtually any language that can read from STDIN and write to STDOUT, regardless of whether it is interpreted (e.g., the Unix shell), compiled (e.g., C or C++), or a combination of both (e.g., Perl). The first CGI programs were written in C and needed to be compiled into binary executables. For this reason, the directory from which the compiled CGI programs were executed was named cgi-bin, and the source files directory was named cgi-src. Nowadays most servers come with a preconfigured directory for CGI programs called, as you have probably guessed, cgi-bin.

1.1.1. The HTTP Protocol

Interaction between the browser and the server is governed by the HyperText Transfer Protocol (HTTP), now an official Internet standard maintained by the World Wide Web Consortium (W3C). HTTP uses a simple request/response model: the client establishes a TCP[3] connection to the server and sends a request, the server sends a response, and the connection is closed. Requests and responses take the form of messages. A message is a simple sequence of text lines.

[3]TCP/IP is a low-level Internet protocol for transmitting bits of data, regardless of its use.

HTTP messages have two parts. First come the headers, which hold descriptive information about the request or response. The various types of headers and their possible content are fully specified by the HTTP protocol. Headers are followed by a blank line, then by the message body. The body is the actual content of the message, such as an HTML page or a GIF image. The HTTP protocol does not define the content of the body; rather, specific headers are used to describe the content type and its encoding. This enables new content types to be incorporated into the Web without any fanfare.

HTTP is a stateless protocol. This means that requests are not related to each other. This makes life simple for CGI programs: they need worry about only the current request.

1.1.2. The Common Gateway Interface Specification

If you are new to the CGI world, there\'s no need to worry—basic CGI programming is very easy. Ninety percent of CGI-specific code is concerned with reading data submitted by a user through an HTML form, processing it, and returning some response, usually as an HTML document.

In this section, we will show you how easy basic CGI programming is, rather than trying to teach you the entire CGI specification. There are many books and online tutorials that cover CGI in great detail (see Our aim is to demonstrate that if you know Perl, you can start writing CGI scripts almost immediately. You need to learn only two things: how to accept data and how to generate output.

The HTTP protocol makes clients and servers understand each other by transferring all the information between them using headers, where each header is a key-value pair. When you submit a form, the CGI program looks for the headers that contain the input information, processes the received data (e.g., queries a database for the keywords supplied through the form), and—when it is ready to return a response to the client—sends a special header that tells the client what kind of information it should expect, followed by the information itself. The server can send additional headers, but these are optional. Figure 1-1 depicts a typical request-response cycle.