The earliest inklings
I am a bibliophile - someone with a deep and abiding interest in books (and journals). See my biography, the early years. Even as an undergraduate in physics at MIT, and especially as a PhD student there, I had personal subscriptions to some major physics and chemical physics journals. All this grew out of my desire to understand my potential career - what was it that physicists did in their daily work and research? Physics Today, a popular but specialist magazine, was my primary guide, along with the good books I found in the (open) Reserve stacks.
Awakening - A summer at the Marine Biological Laboratory
Fueled by an NSF biology research grant to me at the University of Illinois, my family and I were able to spend two full summers at the Marine Biological Laboratory (MBL) in Woods Hole, MA, in 1980 and 1981. In the second summer someone had arranged a speaker series of luminaries to address issues of biological research. The most notable presence was Eugene Garfield, who developed the famous Science Citation Index. I had some good conversations with him.
A major epiphany occurred during that second summer at the MBL. After attending a small symposium there on the future of digital libraries, I realized that an excellent goal for research would be to use computers, not for lab data acquisition, but to mine the huge collection of knowledge already residing in the public literature. As a biologist, I knew that the text and the figures in the literature, taken together, were what needed to be analyzed. So my work on this started more than ten years before the first widely used browser, Mosaic, was developed.
The awakening continued as I reflected on the use of computers in biology, something I had been heavily involved in since 1972, at the University of Colorado, and then in my lab at Illinois from 1975 to 1985. The primary use of computers at that time was to gather and process data in the laboratory. But I realized that a huge amount of "data" was out there, begging for the application of computers. That data was the content of the entire biological literature, summarizing literally millions of experiments done over the years. Because of my earlier research in linguistics I could see two streams coming together - the literature and the application of computers to extract knowledge from it. Because I approached the problem from linguistics, and because I was a working scientist, a biologist, I thought of this problem in the large, not as the extraction of specific items such as genes, proteins, and their interactions, which is very much the focus of current work in BioNLP.
I have always been interested in graphics and design. My first major professional effort in this domain was the Galatea system at the University of Chicago, which I designed and implemented from 1973 to 1975 (with substantial help from graduate students Potel and Sayre). It overlaid a movie film image on an interactive graphics image. The system was used to study microcinematography images of the cellular slime mold, Dictyostelium discoideum. The centers of the cells could be followed for overall motion and even the shapes could be outlined, on a frame-by-frame basis. Probably the most important result of all this work came later, in my paper: Futrelle, R. P., Traut, J., & McKee, G. W. (1982). Cell Behavior in Dictyostelium discoideum: Aggregation responses to localized cyclic AMP pulses. J. Cell Biology, 92, 807-821. This described work done in my lab at the University of Illinois in Urbana-Champaign.
To pursue all this, I shut down my biology lab at Illinois and joined the College of Computer Science at Northeastern University in early 1986, retiring in 2011. I was able to obtain a large research grant from the NSF in 1989. This established the Biological Knowledge Laboratory (BKL), which I headed for twenty-two years. There was a lot of work to do, because full-text papers and their figures were not readily available in electronic form. So we scanned figures and had an assistant trace over them, producing electronic versions that we could then analyze (Nikolakis' PhD research).