Survey of the State of the Art in Human Language Technology
That is the title of this
on-line reference work from 1996. This is a large work, 500+ pages, in
13 chapters, plus a glossary and indexes.
It has more of an emphasis on spoken language than some, but
it is still quite comprehensive.
Chapter 3: Language Analysis and Understanding
is essentially a mini-book
on computational linguistics, with contributions by a number of experts,
followed by an extensive bibliography. (PDF chapters cached)
Computational Linguistics (NLP)
Allen, J.,
Natural Language Understanding,
Addison Wesley, 1995. ISBN 0-8053-0334-0.
This has been the standard book on computational linguistics for quite some time,
for use at the advanced undergraduate or graduate level. It is fairly self-contained
and can be read with no prior background in linguistics.
Jurafsky, D., and J.H. Martin,
Speech and Language Processing,
Prentice-Hall, Upper Saddle River, NJ, 2000. ISBN 0-13-095069-6.
This is an ambitious new book that covers a wide variety of topics.
It may offer serious competition to Allen's book, but it's too early to tell.
Dale, R., H. Moisl, and H. Somers (eds.),
Handbook of Natural Language Processing,
Marcel Dekker, Inc., New York, 2000. 968 Pages, ISBN: 0-8247-9000-6.
An up-to-date compendium of chapters on a variety
of topics by many specialists. Hardcover.
Quite expensive, listing for $195.00, but worth it for a library or serious collection.
Ruslan Mitkov (ed.)
The Oxford Handbook of Computational Linguistics
Oxford University Press, 2003, 804 pages, ISBN: 0-19-823882-7.
150 USD (hardback). Some sample pages and the
indexes can be found on Amazon.
Christiane Fellbaum (ed.)
WordNet: An Electronic Lexical Database
Here is the
Princeton site for the WordNet project.
"WordNet, an electronic lexical database, is considered to be the most
important resource available to researchers in computational linguistics, text analysis, and many related areas. Its design is
inspired by current psycholinguistic and computational theories of
human lexical memory. English nouns, verbs, adjectives, and adverbs
are organized into synonym sets, each representing one underlying
lexicalized concept. Different relations link the synonym sets."
You can also purchase the
WordNet 1.6 CD-ROM which has all the data on it
along with search tools.
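The organization described in the quotation (words grouped into synonym sets, with relations linking the sets) can be pictured with a small sketch. This is only an illustration in Python: the `Synset` class, the synset names, the glosses, and the three toy entries are all made up for the example and are not WordNet's actual data, file format, or search tools.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One lexicalized concept: a set of synonymous words plus a gloss."""
    name: str
    pos: str            # part of speech: "n", "v", "a" (adjective), "r" (adverb)
    lemmas: frozenset   # the words that share this meaning
    gloss: str
    # relation name -> names of the related synsets (hypothetical layout)
    relations: dict = field(default_factory=dict)


# Toy data, invented for this sketch; the real database is far larger.
SYNSETS = {
    "car.n": Synset(
        "car.n", "n", frozenset({"car", "auto", "automobile"}),
        "a four-wheeled motor vehicle",
        {"hypernym": ["motor_vehicle.n"]},
    ),
    "motor_vehicle.n": Synset(
        "motor_vehicle.n", "n", frozenset({"motor vehicle"}),
        "a self-propelled wheeled vehicle",
        {"hypernym": ["vehicle.n"]},
    ),
    "vehicle.n": Synset(
        "vehicle.n", "n", frozenset({"vehicle"}),
        "a conveyance that transports people or objects",
    ),
}


def synsets_for(word):
    """Return every synset whose synonym set contains the word."""
    return [s for s in SYNSETS.values() if word in s.lemmas]


def hypernym_chain(name):
    """Follow the first 'hypernym' link upward until none remains."""
    chain = []
    current = SYNSETS[name]
    while current.relations.get("hypernym"):
        parent = current.relations["hypernym"][0]
        chain.append(parent)
        current = SYNSETS[parent]
    return chain
```

Looking up `synsets_for("auto")` finds the same synset as `synsets_for("car")`, which is the point of grouping by concept rather than by spelling, and `hypernym_chain("car.n")` walks one kind of relation between synsets.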
Pattern Matching (Perl, regular expressions)
Learning Perl, 2nd Edition
By Randal L. Schwartz & Tom Christiansen
July 1997, ISBN 1-56592-284-0.
This is certainly the most popular book on Perl.
Mastering Regular Expressions, 2nd Edition
by Jeffrey E. F. Friedl
July 2002, ISBN: 0-596-00289-0, 484 pages. This focuses on regular expressions
for pattern matching, and now has more extensive discussions of Java, since
regular expressions are built into Java starting with Java 1.4.
Here's a list from Amazon of
ten good Perl books by a devotee and practitioner.
Information Extraction and Retrieval
To quote from the book's description, "Munging can mean manipulating raw data to achieve a final form.
It can mean parsing or filtering data, or the many steps required for
data recognition." So this is a book on these techniques:
Data Munging with Perl
by David Cross. January 2001, Softbound, 304 pages. ISBN 1930110006
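As a rough picture of what "parsing or filtering data" to reach a final form looks like, here is a minimal sketch. The book itself works in Perl; this example uses Python, and the input layout (date, user name, count), the `RECORD` pattern, and the `munge` function are assumptions invented for illustration rather than anything taken from the book.

```python
import re

# Hypothetical raw input: whitespace-separated fields, a comment line,
# and one malformed row that should be filtered out.
RAW = """\
2001-01-15  alice   42
# comment line
2001-01-16  bob     n/a
2001-01-17  carol   17
"""

# One record: ISO-style date, a user name, and an integer count.
RECORD = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(\w+)\s+(\d+)$")


def munge(text):
    """Parse raw lines into dictionaries, dropping comments and bad rows."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # filter: blank lines and comments
        match = RECORD.match(line)
        if match is None:
            continue  # filter: rows that do not fit the expected shape
        date, user, count = match.groups()
        records.append({"date": date, "user": user, "count": int(count)})
    return records
```

Calling `munge(RAW)` keeps the rows for alice and carol and silently drops the comment and the row whose count is not a number; a real script would probably log rejected rows instead of discarding them.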
Modern Information Retrieval by Ricardo Baeza-Yates and Berthier Ribeiro-Neto
Copyright 1999, 464 pp. ISBN 0-201-39829-X. There is both a
publisher's site and an
authors' site.
This is one of the few (the only?) good, up-to-date books on information retrieval.
Natural Language Engineering: This is a recent journal that focuses somewhat more on
the practical aspects of natural language analysis, though it is not really distinct in its coverage from the journal
Computational Linguistics.
Reference works
Added 11/30/2002: Rodney D. Huddleston, Geoffrey K. Pullum
The Cambridge Grammar of the English Language 1860 pp.,
Cambridge University Press; ISBN: 0521431468; (June 2002)
This is a brand-new HUGE reference work. Lists for $150 (US).
At Amazon
It has a useful "Further Reading" section at the end.
Otherwise, it's hard to explain an 1860-page book briefly!
I own a copy and have found it useful.
Here's a
review of the book. It is a critical and useful review, with
many well-taken comments.
Quirk, R., S. Greenbaum, G. Leech, and J. Svartvik,
A Comprehensive Grammar of the English Language, 1779 pp.,
Longman, Inc. (Addison-Wesley Longman these days).
New York, London, Boston, 1985. ISBN: 0582517346.
Quite expensive at $260, but I wouldn't be without it. I do own
a copy.
There's little about the language that is not covered here.
It does not touch on computational linguistics at all.
It is not theoretical. It simply catalogues an enormous number
of phenomena and words and word uses and constructs in English.
And it does it in an orderly way.
The index is about 100 pages long, which adds immeasurably to
the utility of the book.
Here's the
Amazon.com page for the book.
McCawley, James D.,
The Syntactic Phenomena of English,
The University of Chicago Press, Chicago, IL, 2nd ed. 1998.
810 pp. Cloth $125.00, ISBN 0-226-55627-1; paper $50.00, ISBN 0-226-55629-8.
This is a nice book that covers a lot of the structures of the language.
Though written by a renowned linguist, it is not couched in the jargon that
many linguistics texts use, so it is approachable and even useful, if you're
a bit brave.