Text Processing in Python

Peter Kitson

ISBN: 0321112547

Sample Chapter From Text Processing in Python
     Copyright © David Mertz



CHAPTER I -- PYTHON BASICS


This chapter discusses Python capabilities that are likely to
be used in text processing applications. For an introduction
to Python syntax and semantics per se, readers might want to
skip ahead to Appendix A (A Selective and Impressionistic
Short Review of Python); Guido van Rossum's _Python Tutorial_
at <http://python.org/doc/current/tut/tut.html> is also
excellent. The focus here is at a somewhat higher level:
not the Python language narrowly, but also not yet specific to
text processing.

In Section 1.1, I look at some programming techniques that flow
out of the Python language itself, but that are usually not
obvious to Python beginners--and are sometimes not obvious even
to intermediate Python programmers. The techniques discussed
are those that tend to apply in text processing contexts; other
programming tasks are likely to have their own tricks and idioms,
which this book does not document explicitly.

In Section 1.2, I document modules in the Python standard library
that you will probably use in your text processing application,
or at the very least want to keep in the back of your mind. A
number of other Python standard library modules are far enough
afield of text processing that you are unlikely to use them in
this type of application. These remaining modules are documented
very briefly, with one- or two-line descriptions. More details on
each module can be found in Python's standard documentation.


SECTION 1 -- Techniques and Patterns


TOPIC -- Utilizing Higher-Order Functions in Text Processing

 
This first topic merits a warning. It jumps feet-first into
higher-order functions (HOFs) at a fairly sophisticated level
and may be unfamiliar even to experienced Python programmers. Do
not be too frightened by this first topic--you can understand the
rest of the book without it. If the functional programming (FP)
concepts in this topic seem unfamiliar to you, I recommend you
jump ahead to Appendix A, especially its final section on FP
concepts.

In text processing, one frequently acts upon a series of chunks
of text that are, in a sense, homogeneous. Most often, these
chunks are lines, delimited by newline characters--but
sometimes other sorts of fields and blocks are relevant.
Moreover, Python has standard functions and syntax for reading
in lines from a file that account for platform differences in
line-ending conventions.
Obviously, these chunks are not entirely homogeneous--they can
contain varying data. But at the level we worry about during
processing, each chunk contains a natural parcel of instruction
or information.
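The idea of applying one operation uniformly to homogeneous chunks can be sketched with a small higher-order function. This is a minimal illustration, not code from this book: the names `lines`, `paragraphs`, and `process_chunks` are hypothetical, and treating blank lines as block separators is an assumption chosen only to show a chunking rule other than "one line per chunk."

```python
def lines(text):
    # str.splitlines() treats "\n", "\r\n" and "\r" all as line breaks,
    # so the chunks do not depend on which platform produced the text.
    return text.splitlines()


def paragraphs(text):
    # Hypothetical alternative chunker: blocks separated by blank lines.
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks


def process_chunks(func, text, chunker=lines):
    # Higher-order function: the same func is applied to every chunk,
    # whatever the chunker considers a natural parcel of text.
    return list(map(func, chunker(text)))


if __name__ == "__main__":
    sample = "alpha beta\r\ngamma\r\n\r\ndelta epsilon zeta\n"
    # One chunk per line, each upper-cased.
    print(process_chunks(str.upper, sample))
    # One chunk per blank-line-delimited block, each reduced to a word count.
    print(process_chunks(lambda p: len(p.split()), sample, paragraphs))
```

Passing the chunker as a parameter keeps the per-chunk operation independent of how the text is divided, so the same `func` can be reused whether the natural parcel is a line or a larger block.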