Web Client Programming with Perl - Automating Tasks on the Web
ISBN : 156592214X
Cover Design - Web Client Programming with Perl - Automating Tasks on the Web
For your free electronic copy of this book please verify the numbers below.
(We need to do this to make sure you're a person and not a malicious script)
Sample Chapter From Web Client Programming with Perl - Automating Tasks on the Web
Copyright © Clinton Wong
Why Write Your Own Clients?
With the proliferation of available web browsers, you might wonder why you would want to write your own client program. The answer is that by writing your own client programs, you can leap beyond the preprogrammed functionality of a browser. For example, the following scenarios are all possible:
And so on. Think about resources that you regularly visit on the Web. Maybe every morning you check the David Letterman top ten list from last night, and before you leave the office you check the weather report. Can you automate those visits? Think about that time you wanted to print an entire document that had been split up into individual files, and had to select Chapter 1, print, return to the contents page, select Chapter 2, etc. Is there a way to print the entire thing in one swoop?
Browsers are for browsing. They are wonderful tools for discovery, for traveling to far-off virtual lands. But once you know what you want, a more specialized client might be more effective for your needs.
The Web and HTTP
If you don\'t know what the Web is, you probably picked up the wrong book. But here\'s some history and background, just to make sure we\'re all coming from the same place.
The World Wide Web was developed in 1990 by Tim Berners-Lee at the Conseil Europeen pour la Recherche Nucleaire (CERN). The inspiration behind it was simply to find a way to share results of experiments in high-energy particle physics. The central technology behind the Web was the ability to link from a document on one server to a document on another, keeping the actual location and access method of the documents invisible to the user. Certainly not the sort of thing that you\'d expect to start a media circus.
So what did start the media circus? In 1993 a graphical interface to the Web, named Mosaic, was developed at the University of Illinois at Urbana-Champaign. At first, Mosaic ran only on UNIX systems running the X Window System, a platform that was popular with academics but unknown to practically anyone else. Yet anyone who saw Mosaic in action knew immediately that this was big news. Soon afterwards, Mac and PC versions came out, and the Web started to become immensely popular. Suddenly the buzzwords started proliferating: Information Superhighway, Internet, the Web, Mosaic, etc. (For a while all these words were used interchangeably, much to the chagrin of anyone who had been using the Internet for years.)
In 1994, a new interface to the Web called Netscape Navigator came on the (free) market, and quickly became the darling of the Net. Meanwhile, everyone and their Big Blue Brother started developing their own web sites, with no one quite sure what the Web was best used for, but convinced that they couldn\'t be left behind.
Most of the confusion has died down now, but not the excitement. The Web seems to have permanently captured the imagination of the world. It brings up visions of vast archives that can now be made globally available from every desktop, images and multimedia that can be distributed to every home, and... money, money, money. But the soul of the Web is pure and unchanged. When you get down to it, it\'s just about sending data from one machine to another--and that\'s what HTTP is for.
Browsers and URLs
The most common interface to the World Wide Web is a browser, such as Mosaic, Netscape Navigator, or Internet Explorer. With a browser, you can download web documents and view them formatted on your screen.
When you request a document with your browser, you supply a web address, known as a Universal Resource Locator or URL. The URL identifies the machine that contains the document you want, and the pathname to that document on the server. The browser contacts the remote machine and requests the document you specified. After receiving the document, it formats it as needed and displays it on your browser.
For example, you might click on a hyperlink corresponding to the URL http://www.oreilly.com/index.html. Your browser contacts the machine called www.oreilly.com and requests the document called index.html. When the document arrives, the browser formats it and displays it on the screen. If the document requires other documents to be retrieved (for example, if it includes a graphic image on the page), the browser downloads them as well. But as far as you\'re concerned, you just clicked on a word and a new page appeared.