Natural language processing of biology text
Founding editor: Bob Futrelle (2001)
Site updated 7/5/2010
News, 7/5/2010: Here is the
updated link
to the collection BioNLP Resources created by Alex Morgan.
Though it is six years old, it can be a useful starting point.
News, 10/31/06:
The Association for Computational Linguistics
has created a Wiki
so I've created a prominent link to it in the column to the left.
News, 7/20/05:
Additional search options have been added to the
search page,
namely: Google Scholar, Citeseer, and BLIMP.
A search on BLIMP for "2005" is impressive,
returning 67 hits, as of today.
News, 6/20/05:
You might be interested in a paper we prepared for use in our own research.
It is an Open Access Biology paper in which we've numbered every
sentence and every token within each sentence. This makes it possible
for people working at a distance to discuss various items and constructions of
interest to parsing, text mining, etc.
Access it here.
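As a rough illustration of the numbering idea, here is a minimal sketch that assigns a number to every sentence and to every token within each sentence. It is not the procedure used to prepare the paper above: the function names, the regular-expression sentence splitter, and the tokenizer are all assumptions made for this example, and real biology text (abbreviations, gene names, decimal numbers) would need a more careful splitter.

```python
import re

def number_sentences_and_tokens(text):
    """Return [(sentence_number, [(token_number, token), ...]), ...].

    Hypothetical helper for illustration only.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    numbered = []
    for s_num, sentence in enumerate(sentences, start=1):
        # Naive tokenization: runs of word characters, or single punctuation marks.
        tokens = re.findall(r"\w+|[^\w\s]", sentence)
        numbered.append((s_num, list(enumerate(tokens, start=1))))
    return numbered

def format_numbered(numbered):
    """Render each sentence as 'S<n>: token/<k> ...' so items can be cited by number."""
    lines = []
    for s_num, tokens in numbered:
        body = " ".join(f"{tok}/{t_num}" for t_num, tok in tokens)
        lines.append(f"S{s_num}: {body}")
    return "\n".join(lines)

if __name__ == "__main__":
    sample = "The p53 protein binds DNA. It activates transcription."
    print(format_numbered(number_sentences_and_tokens(sample)))
```

With numbering like this, two people can refer to "sentence 1, token 2" and be sure they mean the same item.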
News, 6/20/05:
Google Scholar has grown to the point that it is a useful tool
for finding BioNLP-related papers.
A mailing list note about Google Scholar
is here.
News, 4/13/05: The volume of material in
our field continues to increase. So rather than trying to cache a large
number of papers on the BioNLP site, the information on new papers,
conferences, etc., is being distributed primarily through the
BIONLP mailing list and is available in
the publicly readable archives.
Since you can
search the archives using Google,
that makes the information reasonably available.
News, 10/7/04: Two papers devoted to
biology text analysis and mining have just been published in the
IBM Systems Journal. Access them at the top of our
Articles page.
News, 9/18/04: Abstracts for six papers on Biomed text mining
from PAKDD 2004 are available. Follow the Articles link on the
left or go directly here.
News, 8/8/04:
I've added Google phrase search of PubMed to the search page.
It adds some capability missing in PubMed itself.
There is also a link there to some notes about it.
Follow the "Search" link at the top left of this page.
The searching question has generated a good bit of email to the list.
See the
August 2004 email list archives,
for example.
(8/9/04): There is an additional note in the mail archive that describes
how to search the hundreds of millions of words of full text articles (not abstracts)
in PubMed Central using Google phrase search.
See this mail item.
News, 7/16/04:
Alexander Morgan has produced a quite useful page of information
about and links to a variety of freely available BioNLP resources.
It is located at:
It is divided into
the following sections:
- Text Processing Tools
- Lexical Resources
- Corpora
- Annotation Tools
News, 3/19/04:
CALL FOR PAPERS: IEEE Transactions on Knowledge and Data Engineering (TKDE)
Special Issue on Mining Biological Data, including:
Literature Extraction, Text Mining, and Ontologies.
Submission due date 15 July 2004.
Details are in this PDF extracted from the latest issue of TKDE. This news item
was also sent to the BioNLP mailing list, as most of these items are.
So joining the mailing list will get such information to you in a
timely way.
News, 2/2/04:
A lengthy new review on biomedical text mining by Shatkay and Feldman
was just published. See the link on the
Articles page.
News, 11/30/03:
New OUP book on computational linguistics:
info, table of contents.
News, 11/17/03:
The much-heralded new Open Access journal, PLOS Biology, has a new issue out,
Vol 1, No 2, which has a feature article, "Tough Mining --
The challenges of searching the scientific literature" about NLP for biology.
It's a news feature rather than a technical article, but it's interesting.
Access it here.
News, 11/6/03:
I have created a search facility, using standard Google hacks, that
allows you to search the site or the mail archives or
the huge ACL Anthology (at
Use the Search link at the top left or
right here.
(You'll notice that a number of the
search results
start with "Return to BIONLP.ORG home page" -- that's something I need to fix.)
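For readers curious what such a Google hack can look like, here is a minimal sketch of a site-restricted phrase search. It assumes the facility relies on Google's `site:` operator combined with a quoted phrase; the function name is hypothetical, and this is not the code behind the search page itself.

```python
from urllib.parse import urlencode

def google_site_query(phrase, site):
    """Build a Google search URL for an exact phrase restricted to one site.

    Hypothetical helper for illustration; 'site' is a bare domain
    such as 'bionlp.org'.
    """
    # Double quotes request an exact phrase match; 'site:' limits results to one domain.
    query = f'"{phrase}" site:{site}'
    # urlencode escapes the quotes, spaces and colon for use in a URL.
    return "https://www.google.com/search?" + urlencode({"q": query})

if __name__ == "__main__":
    print(google_site_query("text mining", "bionlp.org"))
```

Swapping the `site` argument for the domain of the mail archives or another collection would restrict the same phrase search to that source.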
News, 8/19/03:
The sixteen papers from the 2003 ACL Workshop on
Natural Language Processing in Biomedicine are available online.
You can retrieve them at:
The papers
are in PDF and PS format and include a BibTeX entry.
A quick list of paper titles and authors is available in
the BioNLP mailing list archive posting here.
News, 7/20/03:
A 2002 review by Yandell and Majoros on Genomics and NLP (10 pages, PDF).
Here is a copy cached on the site.
See also the note about it on the articles page.
News, 7/6/03:
BioMed Central research article corpus available for data mining.
BioMed Central has published more than 2400 peer reviewed research articles,
all of which are covered by BioMed Central's open access license policy:
Unlike a traditional journal's license agreement,
BioMed Central's license allows completely free reuse and
redistribution of the content by anyone.
Note that these are full-text articles, not abstracts.
Further details are available here.
News, 6/19/03:
The deadline for the SIGIR'03 Workshop on Text Analysis for
Bioinformatics has been extended to June 27, 2003. We seek short
papers on preliminary and recent work. Authors will retain copyright
ownership and are free to submit their papers for publication
elsewhere after the workshop. The workshop will be held on August 1,
2003 in Toronto.
See for more information.
News, 6/4/03:
Five papers on NLP from the ISMB 2003 meeting, June 29 - July 3,
are now posted on the BioNLP site in PDF format
via this page.
News, 4/17/03:
CALL FOR PAPERS - Submit abstracts by May 15, 2003.
BioLINK has announced the meeting of the
Special Interest Group in Text Mining at this year's ISMB:
BioLINK Text Data Mining SIG: Biology Literature, Information and Knowledge
at ISMB 2003, Brisbane, Australia
Friday, June 27, 2003 9:00 - 17:30
For details see this
BioNLP list archive item.
News, 4/9/03:
BioLINK: Biological Literature, INformation and Knowledge
Mailing list and website
From their website, --
"The Special Interest Group on Text Mining (or BioLINK) was
created to address the need of communication and
interchange of ideas in the field of text mining and
information extraction applied to biology and biomedicine...."
Go there to see more details, list of organizers of the group,
online papers, etc. There is also a mailing list. Send inquiries
about the mailing list to
News, 4/8/03:
The SIGIR'03 Workshop on Text Analysis for Bioinformatics will
be held August 1st, in Toronto, Canada. Paper submission by June 16th.
Click for details in the BioNLP archive.
News, 3/13/03:
The IEEE Computer Society Bioinformatics Conference is looking for papers on NLP
in Biology. Paper deadline is coming up soon, April 1, but there is
a May 22 deadline for Poster Abstracts that will be published in the Proceedings.
The conference will be held at Stanford, August 11-14, 2003.
More information here: And here is a
two-page PDF version of the call for papers.
News, 12/24/2002:
TREC2003, the Text Retrieval Conference, has a Genomics track.
Click for details in the BioNLP archive.
News, 12/23/2002:
A very useful new review paper on biology text data mining has just been
published by Hirschman, et al. The citation, abstract and references
are available here.
Motivation for this site
The literature of the field of biology is the largest of all the sciences.
The volume of biology literature each year, measured in bytes, is about fifty
times the size of the entire human genome, junk and all. But locked in this literature
is an enormous amount of information that can tell us much about the structure
and function of genes, proteins, cells and organisms -- how they work as well
as how they can fail.
The newly emergent interest in natural language processing for biology has
been christened "Information Extraction". But work in this area has been going
on for many decades under different names and this site includes a good
deal of information about past and current work in NLP and in information extraction
for biology in particular. The other major descriptor of the general field
is "Computational Linguistics".
The goals for this site include providing material and links in the following areas:
Introductions to NLP, including texts, papers and FAQs
Biological corpora, e.g., Medline, electronic journals
NLP databases such as lexicons and grammars
NLP tools such as pattern matchers, taggers and parsers
Advanced topics such as statistical approaches and machine learning
Meeting and workshop information
NLP preprints and reprints
Biology-specific NLP such as word lists, statistics of bio text
Research groups in NLP and biological NLP in particular
Development of a mailing list and archive for people interested in BioNLP
Activities in this community could include:
Hosting workshops
Developing sessions in larger meetings
Exchanging researchers between research groups
Developing performance measures, test materials and competitions
The site was created by Bob Futrelle, February 27, 2001.
Earlier News (as of 11/28/2002)
News, 12/20/2002:
Computational linguist Daniel Jurafsky received a
MacArthur "Genius Award" in 2002. Though Dan focuses primarily on speech,
it's nice to know that one of our own has been so highly honored.
His book with Martin is listed on our Books and Journals page.
A challenge --
BioNLP is not easy (by RPF 11/02)
News, 11/28/2002:
PSB 2003
Linking Biomedical Language, Information and Knowledge, January 3-7, 2003.
Papers now online.
More news, 11/28/2002:
ACL 2002
Workshop on Natural Language Processing in the Biomedical Domain.
Papers now online.
11/28/2002: There will be a special session at PSB 2003,
"Linking Biomedical Language, Information and Knowledge".
The session is part of the Pacific Symposium on Biocomputing 2003
January 3-7, 2003
Kauai Marriott Resort and Beach Club.
Here are online copies of the introductory paper and all
six session papers.
There was a Workshop on Natural Language Processing in the Biomedical Domain
at ACL 2002 in Philadelphia. I have placed a
mirror of the web pages for the workshop here
which includes online copies of the twelve papers, in PDF and Postscript formats.
Be warned that some of the links there are not operational, since I
have not copied the entire ACL CD contents to the site(!).
11/28/2002: There was a text mining workshop at ISMB 2002 in Edmonton,
Alberta, Canada on August 2nd, 2002.
Here is the initial announcement.
When the workshop has its own page, or I can otherwise get copies of
or links to the papers, there'll be a link here.
Archives of even earlier News - Archives.
CONTRIBUTIONS: Send me your papers and reports or links to them.
This site will improve primarily by the collection of contributions from researchers
and practitioners from around the world. I would be happy to add links
to any on-line papers and reports you have or are aware of
or cache them on this site for easy access. Any links to other resources would
also be most welcome.