How to proceed

First, we should set our goals. The primary goal should be to answer questions from users. Enabling goals include developing databases and tools.

To study the data, we need to build visualization tools, just as we do for biology. Such tools are poorly developed in the NLP community, which focuses more on Unix-style pattern matching, parsing, and statistical and learning algorithms.

One example of such a visualization would highlight the noun phrases that describe proteins and present them in context, interactively, so that their linguistic and biological categorization can be revealed and edited.
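
As a rough illustration of the highlighting idea, here is a minimal Python sketch. It is not a real tool: a small hand-written lexicon stands in for a proper noun-phrase recognizer, and the names PROTEIN_LEXICON and highlight_proteins, the category labels, and the [[...|...]] markup are all assumptions made for this example.

```python
import re

# Hypothetical mini-lexicon: protein name -> biological category.
# Entries are illustrative only; a real tool would draw on a curated database.
PROTEIN_LEXICON = {
    "p53": "tumor suppressor",
    "cyclin D1": "cell-cycle regulator",
}


def highlight_proteins(text, lexicon=PROTEIN_LEXICON):
    """Mark each lexicon match in place as [[name|category]], keeping its context."""
    # Try longer names first so multi-word names win over shorter overlaps.
    names = sorted(lexicon, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in names)
    # Lookarounds keep a name from matching inside a longer token.
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")
    return pattern.sub(
        lambda m: "[[" + m.group(0) + "|" + lexicon[m.group(0)] + "]]", text
    )


if __name__ == "__main__":
    print(highlight_proteins("The p53 protein binds cyclin D1."))
```

Editing a category in the lexicon and re-running the function is a crude stand-in for the interactive editing described above; a real interface would let the user change the label directly on the highlighted phrase.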

Using such tools, we can steadily assemble a large collection of the standard textual expression forms used to present biological content. We can then map these forms onto the query forms that they answer.
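
The mapping from expression forms to query forms could be sketched as follows. This is only one possible shape, under stated assumptions: each expression form is written as a regular expression with named slots, the single "phosphorylates" form and its two question templates are invented for illustration, and EXPRESSION_FORMS and queries_answered_by are hypothetical names.

```python
import re

# Hypothetical collection: each textual expression form is paired with the
# query forms that a sentence of that form would answer.
EXPRESSION_FORMS = [
    (
        re.compile(r"(?P<agent>\w+) phosphorylates (?P<target>\w+)"),
        ["What phosphorylates {target}?", "What does {agent} phosphorylate?"],
    ),
]


def queries_answered_by(sentence):
    """Return the questions for which the given sentence is an answer."""
    questions = []
    for pattern, templates in EXPRESSION_FORMS:
        for match in pattern.finditer(sentence):
            # Fill each query template with the slots captured from the text.
            slots = match.groupdict()
            questions.extend(template.format(**slots) for template in templates)
    return questions


if __name__ == "__main__":
    for question in queries_answered_by("CDK2 phosphorylates RB1 in late G1."):
        print(question)
```

Run in the other direction, the same table would let a user's question select the expression forms to search for in the text collection.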