CHAPTER I -- PYTHON BASICS
This chapter discusses Python capabilities that are likely to be used in text processing applications. For an introduction to Python syntax and semantics per se, readers might want to skip ahead to Appendix A (A Selective and Impressionistic Short Review of Python); Guido van Rossum's _Python Tutorial_ at <http://python.org/doc/current/tut/tut.html> is also quite excellent. The focus here occupies a somewhat higher level: not the Python language narrowly, but also not yet specific to text processing.
In Section 1.1, I look at some programming techniques that flow out of the Python language itself, but that are usually not obvious to Python beginners--and are sometimes not obvious even to intermediate Python programmers. The programming techniques that are discussed are ones that tend to be applicable to text processing contexts--other programming tasks are likely to have their own tricks and idioms that are not explicitly documented in this book.
In Section 1.2, I document modules in the Python standard library that you will probably use in your text processing application, or at the very least want to keep in the back of your mind. A number of other Python standard library modules are far enough afield of text processing that you are unlikely to use them in this type of application. Such remaining modules are documented very briefly with one- or two-line descriptions. More details on each module can be found in Python's standard documentation.
SECTION 1 -- Techniques and Patterns
TOPIC -- Utilizing Higher-Order Functions in Text Processing
This first topic merits a warning. It jumps feet-first into higher-order functions (HOFs) at a fairly sophisticated level and may be unfamiliar even to experienced Python programmers. Do not be too frightened by this first topic--you can understand the rest of the book without it. If the functional programming (FP) concepts in this topic seem unfamiliar to you, I recommend you jump ahead to Appendix A, especially its final section on FP concepts.
In text processing, one frequently acts upon a series of chunks of text that are, in a sense, homogeneous. Most often, these chunks are lines, delimited by newline characters--but sometimes other sorts of fields and blocks are relevant. Moreover, Python has standard functions and syntax for reading in lines from a file (sensitive to platform differences). Obviously, these chunks are not entirely homogeneous--they can contain varying data. But at the level we worry about during processing, each chunk contains a natural parcel of instruction or information.
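As a rough illustration of the idea, and not a definitive implementation, the sketch below treats each line as one chunk and passes functions as arguments to decide which chunks to keep and how to transform them. The names `is_comment` and `process_lines` are hypothetical helpers invented for this example, and `io.StringIO` merely stands in for an opened text file; the sample data is made up.

```python
import io

def is_comment(line):
    """Return True if the first non-blank character of the line is '#'."""
    return line.lstrip().startswith("#")

def process_lines(lines, keep, transform):
    """Higher-order helper (hypothetical): apply `transform` to each
    line for which the predicate `keep` returns a true value."""
    return [transform(line) for line in lines if keep(line)]

# io.StringIO is a stand-in for a real file; iterating over it
# yields one newline-terminated line at a time, as a file would.
sample = io.StringIO("# header\nalpha = 1\n  # note\nbeta = 2\n")

# Keep non-comment lines and strip trailing whitespace from each.
result = process_lines(sample, lambda ln: not is_comment(ln), str.rstrip)
print(result)  # ['alpha = 1', 'beta = 2']
```

Because `keep` and `transform` are ordinary function arguments, the same loop can serve other chunk types (fields, blocks) simply by passing in different functions.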