Latest update, May 2007
This collection of PDF documents (one still being converted) and a web site comprise virtually all my published work on diagrams to date. Over the years, I have made it a point to explore a variety of aspects of diagrams: foremost, building working diagram-parsing systems, but also ambiguity, summarization, and text-graphics relations. In particular, text-graphics relations are discussed in papers 8, 9, and 11. The single paper containing the most comprehensive view of my work (to 1999) is the summarization chapter, paper 8. Our landmark paper announcing our comprehensive working system for diagram parsing is paper 5, with Nikolakis. The most recent document here is 15, a technical report that encapsulates and slightly revises the website, document 6 below.
The most important document missing from this collection is Nikos Nikolakis' PhD dissertation, January 1997, Diagram analysis using equivalence and constraints, 202 pages. It has proved difficult to update the electronic copy of the thesis to current MS Word format, a task that includes recreating the bibliography, all citations, all equations, and all figures. But it is possible and will be done, so it can be hosted in PDF form for free distribution. It contains much important unpublished material and results. Xeroxed copies are available to anyone not willing to wait for the PDF.
1. Futrelle, R. P. (1985). A Framework For Understanding Graphics In Technical Documents. In Expert Systems in Government Symposium (pp. 386-390): IEEE Computer Society.
Comment: This was my first published work on diagram understanding. The work was done while I was still on the Biology faculty at Illinois, where I did research at the bench, taught, wrote, and read the Biology literature, so I was keenly aware of just how important diagrams are to doing and understanding Biology. (I had published a few years earlier on the text aspects of knowledge in the Biology literature.) Important parts of the paper described concepts and strategies that were on the mark and are still in use in our work today. The idea of first finding "high-confidence" elements is related, but not identical, to the "background" elements we introduced later, what Kosslyn has called "frameworks". An examination of the grammar rules on our 1998 website (URL below) shows, for example, that we first find the long perpendicular scale lines in data graphs, the "high-confidence" elements, and then build the rest of the parse around them. The paper discussed visually interfering elements, overlaps of various sorts, that were, and still are, problems in the vectorization of raster images. Our current work (2006) focuses on vector-based diagrams, so such interference is rarely a problem. The paper also presented an early view of constraint grammars, which came to maturity in our lab over the next ten years (again, see the website referenced below).
PDF not yet available; I will have to create a scanned version of the paper.
Abstract: Many technical documents (reports, articles, and books) depend heavily on graphics to convey important information. Document understanding systems must be able to deal with such graphics. Using the ubiquitous x,y plot as an example, we describe an expert system under development which can understand graphics -- one that can extract the data values, error bar values, axis labels, and the like. The class of x,y plots is described as a fuzzy relational structure in which components (data points, scale lines, ...) are built up from graphical elements (lines, arcs) by a set of production rules. The rules attempt to find simple or isolated elements first; these are the high-confidence elements. Complex components, e.g., those involving occluding elements, are then discovered using the high-confidence elements as a base.
2. Futrelle, R. P. (1990). Strategies for Diagram Understanding: Object/Spatial Data Structures, Animate Vision and Generalized Equivalence. In 10th ICPR (pp. 403-408): IEEE Press.
Comment: This paper was important for its introduction of Generalized Equivalence Relations (GERs), the most notable being Near. It also introduced pyramidal spatial indexes for efficiently computing GERs. Both of these concepts have endured and are embedded in our current systems. This is the first of our papers to mention our implementation of the system.
Abstract: Virtually all scientific documents contain informational diagrams. To build knowledge bases representing this literature, the diagrams must be semantically analyzed. In our system, diagrams are represented as collections of geometric objects. The fundamental organizing principle for these objects is the Generalized Equivalence Relation (GER). Examples include Near, Parallel, and Aligned. These relations can be computed efficiently using GOSSAMER, a pyramidal data structure that allows spatially associative access to objects. Animate Vision (AV), in which the image is scanned either continuously or discontinuously, is used to mimic the efficient strategy used by humans to view diagrams. The system is implemented in Common Lisp and is being applied to data graphs and gene diagrams in the biological literature.
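To make the Near relation and the spatial index concrete, here is a minimal sketch in Java (the language of our current reimplementation). All names are invented for illustration; the original GOSSAMER index was pyramidal and written in Common Lisp, whereas this sketch uses a simple uniform grid. The point is the same: a Near query examines only the few cells around the query point rather than every object in the diagram.

```java
import java.util.*;

public class NearDemo {
    record Pt(double x, double y) {}

    static final double CELL = 16.0;  // grid cell size (assumed; must be >= NEAR)
    static final double NEAR = 8.0;   // illustrative threshold for the Near relation

    // cell key "i,j" -> objects whose position falls in that cell
    static Map<String, List<Pt>> grid = new HashMap<>();

    static String key(double x, double y) {
        return (int) Math.floor(x / CELL) + "," + (int) Math.floor(y / CELL);
    }

    static void insert(Pt p) {
        grid.computeIfAbsent(key(p.x(), p.y()), k -> new ArrayList<>()).add(p);
    }

    /** Objects Near p: scan only the 3x3 block of cells around p. */
    static List<Pt> near(Pt p) {
        List<Pt> out = new ArrayList<>();
        int ci = (int) Math.floor(p.x() / CELL), cj = (int) Math.floor(p.y() / CELL);
        for (int i = ci - 1; i <= ci + 1; i++)
            for (int j = cj - 1; j <= cj + 1; j++)
                for (Pt q : grid.getOrDefault(i + "," + j, List.of()))
                    if (Math.hypot(p.x() - q.x(), p.y() - q.y()) <= NEAR)
                        out.add(q);
        return out;
    }

    public static void main(String[] args) {
        insert(new Pt(5, 5)); insert(new Pt(9, 8)); insert(new Pt(80, 80));
        System.out.println(near(new Pt(6, 6)));  // finds the two nearby points, not (80,80)
    }
}
```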
3. Futrelle, R. P. (1992). The Conversion of Diagrams to Knowledge Bases. In IEEE Workshop on Visual Languages (pp. 240-242). Seattle, WA: IEEE Computer Society Press.
Comment: This short paper and the one that follows introduced Graphics Constraint Grammars (later called Context-based Constraint Grammars). Diagram organization can be analyzed for visual salience, domain conventions, and pragmatics. An example of a grammar rule is given, including propagators, which are essentially Knuth's synthesized attributes.
Abstract: If future electronic documents are to be truly useful, we must devise ways to automatically turn them into knowledge bases. In particular, we must be able to do this for diagrams. This paper discusses biological diagrams. We describe the three major aspects of diagrams: visual salience, domain conventions, and pragmatics. We next describe the organization of diagrams into informational and substrate components. The latter are typically collections of objects related by Generalized Equivalence Relations. To analyze diagrams, we define Graphics Constraint Grammars (GCGs) that can be used for both syntactic and semantic analysis. Each grammar rule describes a rule object and consists of a Production, describing the constituents of the object; Constraints that must hold between the constituents; and Propagators that build properties of the rule object from the constituents. We discuss how a mix of parsing and constraint-satisfaction techniques is used to parse diagrams with GCGs.
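As a concrete illustration of the Production / Constraints / Propagators anatomy of a rule, here is a hypothetical Java sketch. The rule, the tolerance, and all names are invented for illustration; the actual GCG rules in the paper are declarative and considerably richer.

```java
import java.util.*;

public class GcgRuleDemo {
    record Line(double x1, double y1, double x2, double y2) {}

    /** The rule object; its origin fields are built by the propagator. */
    record Axes(Line xScale, Line yScale, double originX, double originY) {}

    // Production: Axes -> HorizontalLine VerticalLine
    // Constraint:  the two lines share an endpoint (the plot origin).
    // Propagator:  synthesize the origin coordinates onto the Axes object.
    static Optional<Axes> tryAxes(Line h, Line v) {
        if (h.y1() != h.y2() || v.x1() != v.x2())
            return Optional.empty();                               // constituent types wrong
        double tol = 3.0;                                          // assumed tolerance
        for (double[] he : new double[][]{{h.x1(), h.y1()}, {h.x2(), h.y2()}})
            for (double[] ve : new double[][]{{v.x1(), v.y1()}, {v.x2(), v.y2()}})
                if (Math.hypot(he[0] - ve[0], he[1] - ve[1]) <= tol)   // constraint holds
                    return Optional.of(new Axes(h, v, he[0], he[1]));  // propagate origin
        return Optional.empty();
    }

    public static void main(String[] args) {
        Line x = new Line(10, 100, 90, 100);  // horizontal scale line
        Line y = new Line(10, 100, 10, 20);   // vertical scale line
        System.out.println(tryAxes(x, y));    // Axes with origin (10,100)
    }
}
```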
4. Futrelle, R. P., Kakadiaris, I. A., Alexander, J., Carriero, C. M., Nikolakis, N., & Futrelle, J. M. (1992). Understanding Diagrams in Technical Documents. IEEE Computer, 25(7), (pp. 75-78).
Comment: This paper, published in the same year as the preceding one, covers similar ground, but with a larger grammar example and a description of how efficient parsing is aided by the spatial index.
Abstract: The ultimate goal of document analysis is to go from hardcopies to a computer-based knowledge bank representing the documents' contents. In technical documents, diagrams often play a critical role. An approach is described which can analyze diagrams to yield structural descriptions for them. The approach combines grammatical and constraint-based techniques in a single scheme called Graphics Constraint Grammars. The most important constraints are the Generalized Equivalence Relations which efficiently recognize conceptual groupings of objects.
5. Futrelle, R. P., & Nikolakis, N. (1995). Efficient Analysis of Complex Diagrams using Constraint-Based Parsing. In ICDAR-95 (Intl. Conf. on Document Analysis & Recognition) (pp. 782-790). Montreal, Canada.
Comment: This is the single most important paper of our initial work on diagram understanding. It is a short summary of the research that went on to form Nikolakis' PhD dissertation (1997). By now we had a working parser and full-size grammars for data graphs, gene diagrams, and finite state automata. Note that the Mac used was a 25MHz machine, so in principle, parsing a diagram with today's machines should take well under 1 second.
Abstract: This paper describes substantial advances in the analysis (parsing) of diagrams using constraint grammars. The addition of set types to the grammar and spatial indexing of the data make it possible to efficiently parse real diagrams of substantial complexity. The system is probably the first to demonstrate efficient diagram parsing using grammars that can easily be retargeted to other domains. The work assumes that the diagrams are available as a flat collection of graphics primitives: lines, polygons, circles, Bezier curves, and text. This is appropriate for future electronic documents or for vectorized diagrams converted from scanned images. The classes of diagrams that we have analyzed include x,y data graphs and genetic diagrams drawn from the biological literature, as well as finite state automata diagrams (states and arcs). As an example, parsing a four-part data graph composed of 133 primitives required 35 sec using Macintosh Common Lisp on a Macintosh Quadra 700.
6. Futrelle, R. P. (1998). The Diagram Understanding System Demonstration Site.
A technical report that brings together all the pages of that site is available here for download as a single PDF file.
Comment: This site is useful and informative because it describes our operational diagram understanding system, DUS, that was finished by Nikolakis in my lab in 1997. It shows screen shots of the working system as viewed by the DUS Inspector, DUSI. The DUSI was able to show the correspondence between the objects in the parse tree, from the root to the leaves, and the graphical elements in the displayed diagrams. Complete and operational grammars are listed for x,y data graphs, linear gene diagrams, and finite-state automata graphs. Examples of diagrams that were parsed are shown along with the times to build the spatial indexes and do the subsequent parsing. Our current work is redeveloping and expanding the entire system in Java and applying it to a large corpus of vector-based diagrams. A major difference in the new system is the integration of textual information, such as captions, with the goal of building rich browsing and retrieval systems for document collections. Note: This website has been packaged up and revised to create a tech report, a single PDF document, item 15 below.
7. Futrelle, R. P. (1999). Ambiguity in Visual Language Theory and its Role in Diagram Parsing. In IEEE Symposium on Visual Languages, VL99 (pp. 172-175). Tokyo: IEEE Computer Soc.
Comment: I felt that it was important to explore a variety of topics that affect communication with the reader via diagrams. One of these is ambiguity. This paper describes classes of ambiguities: lexical (introducing the term grapheme, repurposed for diagrams), attachment, and analytical ambiguities, the last including role determination, composition, and occlusion. This paper was later discussed by M. Shilman et al. (2002) in the context of sketch recognition; they added label and attribute ambiguities, useful for sketches in particular. See also the paper by Costagliola et al. (VLHCC 2004), which mentions our structural ambiguities (in contrast to lexical ones).
Abstract: To take advantage of the ever-increasing volume of diagrams in electronic form, it is crucial that we have methods for parsing diagrams. Once a structured, content-based description is built for a diagram, it can be indexed for search, retrieval, and use. Whenever broad-coverage grammars are built to parse a wide range of objects, whether natural language or diagrams, the grammars will overgenerate, giving multiple parses. This is the ambiguity problem. This paper discusses the types of ambiguities that can arise in diagram parsing, as well as techniques to avoid or resolve them. One class of ambiguity is attachment, e.g., the determination of what graphic object is labeled by a text item. Two classes of ambiguities are unique to diagrams: segmentation and occlusion. Examples of segmentation ambiguities include the use of a portion of a single line as an entity itself. Occlusion ambiguities can be difficult to analyze if occlusion is deliberately used to create a novel object from its components. The paper uses our context-based constraint grammars to describe the origin and resolution of ambiguities. It assumes that diagrams are available as vector graphics, not bitmaps.
8. Futrelle, R. P. (1999). Summarization of Diagrams in Documents. In I. Mani & M. Maybury (Eds.), Advances in Automated Text Summarization (pp. 403-421). Cambridge, MA: MIT Press.
Comment: Another aspect of diagrams that needed exploration was summarization, something studied for text beginning with Luhn's work at IBM in the late 1950s (see his paper in the book containing my chapter). My nineteen-page chapter explored a variety of issues that arise in attempting to summarize a diagram or a set of diagrams. The four examples discussed in the paper involve collapsing sequences, deleting subtrees, merging, and cueing on layout (a sketch of the second operation follows the abstract below). No implementation of these ideas was attempted, though such a system could be implemented in the future, building on collections of parsed diagrams. Sec. 6 contains an extended discussion of the relations between diagrams and text.
Abstract: Documents are composed of text and graphics. There is substantial work on automated text summarization but almost none on the automated summarization of graphics. Four examples of diagrams from the scientific literature are used to indicate the problems and possible solutions: a table of images, a flow chart, a set of x,y data plots, and a block diagram. Manual summaries are first constructed. Two sources of information are used to guide summarization. The first is the internal structure of the diagram itself, its topology and geometry. The other is the text in captions, running text, and within diagrams. The diagram structure can be generated using the author's constraint-based diagram understanding system. Once the structure is obtained, procedures such as table element elision or subgraph deletion are used to produce a simpler summary form. Then automated layout algorithms can be used to generate the summary diagram. Current work on parsing and automated layout are briefly reviewed. Because automated diagram summarization is a brand-new area of inquiry, only the parsing phase of the approach has been fully implemented. Because of the complexity of the problem, there will be no single approach to summarization that will apply to all kinds of diagrams.
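Since no implementation was attempted, the following Java fragment is purely a hypothetical sketch of one of the four operations, subtree (subgraph) deletion, applied to a toy parse tree. All names are invented; a real system would operate on trees produced by the diagram parser.

```java
import java.util.*;

public class SummarizeDemo {
    record Node(String label, List<Node> children) {
        Node(String label, Node... kids) { this(label, List.of(kids)); }
    }

    /** Return a copy of the tree with all subtrees labeled `drop` deleted. */
    static Node prune(Node n, String drop) {
        List<Node> kept = new ArrayList<>();
        for (Node c : n.children())
            if (!c.label().equals(drop)) kept.add(prune(c, drop));
        return new Node(n.label(), kept);
    }

    public static void main(String[] args) {
        Node graph = new Node("DataGraph",
            new Node("Axes"),
            new Node("DataSet"),
            new Node("ErrorBars"),   // a detail elided in the summary form
            new Node("Legend"));
        System.out.println(prune(graph, "ErrorBars").children());
    }
}
```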
9. Futrelle, R. P., & Rumshisky, A. (2001). Discourse Structure of Text-Graphics Documents. In 1st International Symposium on Smart Graphics (pp. 31-38). Hawthorne, NY: ACM.
Comment: Informational figures are rarely found without accompanying text. This paper discusses this dual structure. Anaphora in natural language has to be extended to deictic references between the text and the objects, structures, and regions in the graphics. Reading order has to be extended to include excursions between the text and the graphics. We used the Restricted Focus Viewer (Blackwell et al., 2000) to track these excursions in a problem-solving situation (not mentioned in the abstract). Denotation must be expanded to include graphical structures. Recently we explored examples such as the one in Figure 1, using both Prolog and Otter to prove assertions that go beyond those that can be asserted from the text or the graphics separately. This demonstrates the necessity of the two modalities.
Abstract: In order to analyze, generate and use documents containing both text and graphics, it is important to have a theory of their structure. We argue that it is possible to develop a semantics for graphics, as well as text, and generate an integrated representation of text/graphics discourse, building on previous theories of text discourse. A major component of our theory is an integrated natural language / visual language lexicon that allows people to understand bimodal discourse. Another major component of text/graphics discourse is the role of the reader. That is, the reader constantly exercises choices to shift his or her attention between the text and graphics while reading/viewing. We demonstrate some computational approaches to the automation of building discourse structure at the level of syntax. This is followed by the construction of semantics via logical forms. The result of this work is a Text-Graphics Discourse Theory, TGDT.
10. Futrelle, R. P., Shao, M., Cieslik, C., & Grimes, A. E. (2003). Extraction, layout analysis and classification of diagrams in PDF documents. In ICDAR 2003 (Intl. Conf. Document Analysis & Recognition) (pp. 1007-1014). Edinburgh, Scotland: IEEE Computer Society.
Comment: The abstract below covers the paper well. Mingyan Shao's current dissertation research in my lab is expanding on the topics in the paper, using a large corpus of PDF files. Her new work uses shallow parsing, in contrast to the full root-to-leaves parsing of our 1995-1998 DUS.
Abstract: Diagrams are a critical part of virtually all scientific and technical documents. Analyzing diagrams will be important for building comprehensive document retrieval systems. This paper focuses on the extraction and classification of diagrams from PDF documents. We study diagrams available in vector (not raster) format in online research papers.
PDF files are parsed and their vector graphics components installed in a spatial index. Subdiagrams are found by analyzing white space gaps. A set of statistics is generated for each diagram, e.g., the number of horizontal lines and vertical lines. The statistics form a feature vector description of the diagram. The vectors are used in a kernel-based machine learning system (Support Vector Machine). Separating a set of bar graphs from non-bar-graphs gathered from 20,000 biology research papers gave a classification accuracy of 91.7%. The approach is directly applicable to diagrams vectorized from images.
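Here is a minimal Java sketch of the statistics-to-feature-vector step (invented names and toy data; the actual system computes many more statistics and hands the resulting vectors to a Support Vector Machine for training and classification):

```java
import java.util.*;

public class FeatureDemo {
    record Line(double x1, double y1, double x2, double y2) {}

    /** Count horizontal, vertical, and other lines; the counts become
     *  one feature vector (one row of the training matrix). */
    static double[] features(List<Line> lines) {
        double h = 0, v = 0, other = 0;
        for (Line l : lines) {
            if (l.y1() == l.y2()) h++;
            else if (l.x1() == l.x2()) v++;
            else other++;
        }
        return new double[] { h, v, other, lines.size() };
    }

    public static void main(String[] args) {
        List<Line> barGraph = List.of(
            new Line(0, 0, 100, 0),   // x axis
            new Line(0, 0, 0, 80),    // y axis
            new Line(10, 0, 10, 40),  // bar edges...
            new Line(20, 0, 20, 40));
        System.out.println(Arrays.toString(features(barGraph)));  // [1.0, 3.0, 0.0, 4.0]
    }
}
```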
11. Futrelle, R. P. (2004). Handling Figures in Document Summarization. In Text Summarization Branches Out. Workshop at ACL 2004 (pp. 61-65). Barcelona, Spain: Association for Computational Linguistics.
Comment: The slides for the talk give a quick and easy overview of the material. There is still little work on summarization of figures. It is approximately in the same state as described in my 1999 chapter. We will not be able to build serious systems for diagram summarization until we or others produce a substantial corpus of parsed diagrams. It will then be possible to do a modest amount of summarization; beyond that some level of semantic analysis will be needed. The paper includes a reference to my still modest diagrams site: http://www.diagrams.org
PDF of paper. PDF of talk slides.
Extended Abstract: The summarization of a document should take into account the content of its figures. Research papers in Biology, e.g., in Science or Nature, devote approximately 50% of their content to the figures, including captions and discussions in the running text. Summarization of the text that accompanies figures is not adequate to properly summarize the content of the figures. This paper focuses on diagrams that appear in documents rather than photographic images. We first point out systems in which diagrams, with their captions, are used as the document summary. The technique of extraction can be applied to choosing individual diagrams or choosing portions of diagrams as parts of a summary. Thumbnail images are another way to summarize large images and represent a method unique to the graphics domain. We discuss the properties of figure-related text within figures, in captions and in running body text. Portions of diagrams can be merged within or across diagrams to generate summary composites. One of the outstanding technical problems that must first be dealt with before large-scale diagram summarization can become a reality, is the "raster impasse", the fact that an overwhelming majority of all figures in electronic documents are only available in raster format.
12. Futrelle, R. P. (2004). Diagram Schemas: What, Why, How. In A. Blackwell, K. Marriott & A. Shimojima (Eds.), Diagrammatic Representation and Inference: Third International Conference, Diagrams 2004 (Vol. LNCS 2980, pp. 231-234). Cambridge, UK: Springer-Verlag.
Comment: The published paper was not all it could have been. By the time of the meeting, the presentation was substantially improved in the colorful poster, the PDF linked below. The concepts touched on in the poster include Categorization, Ontology, Instance, Exemplar, Prototype, Family resemblance, Central, and Intensional and Extensional definitions. I found it useful to structure the paper and poster as I did, in terms of What, Why, and How.
Abstract (of the poster): Diagrams play a critical role in recording and communicating information that is difficult to express or comprehend using only text. In contemporary biology research papers, figures account for approximately 50% of the content, amazingly enough. (This counts the space occupied by the figures per se as well as the text of the captions and the discussion of the figures in the body of the paper.) Though there are many systems for indexing and searching text content, there is virtually nothing available today for figures.
This poster looks at several aspects of diagrams, particularly the categorization problem. Categorization must be understood in order to properly understand and design useful diagram-related systems. This poster revises some of the material in the corresponding paper in the Diagrams 2004 proceedings.
12A. Shao, M., & Futrelle, R. P. (2005). Moment-derived Object Models for Vectorization. In IAPR Conference on Machine Vision Applications (MVA 2005), May 16-18, 2005. Tsukuba Science City, Japan.
Comment: This paper reports a novel and partially successful approach to line recognition in raster images, using a method of moments of the gray-level distribution in small regions. Its success was limited by the fact that it required regions that could extend well past the line pixels themselves, so it could fail in crowded situations more easily than highly localized algorithms such as thinning. In future work, we will retain all the file-handling and GUI superstructure and simply replace the underlying line-finding "engine" with a more conventional (and proven) set of techniques. Much of the work on the system was done by Dan Crispell, now at Brown University. We intend to include him as a co-author on at least one future paper (one that is more successful at its task!).
Abstract: Various methods for the vectorization of line drawings have been developed in the past. The predominant ones are based on skeletonization/ thinning, edge detection and sparse pixel techniques. This paper describes a new approach to vectorization based on using spatial moment analysis of gray levels in k x k regions around each pixel to generate parameters for object models. The line model, in particular, consists of a core and two wing regions which are initially determined by a principal component analysis of the moments. The model is refined by fitting to the pixel gray levels, minimizing the sum of the core and wing errors. This study uses two evaluation procedures: In the first, vector parameters from a ground truth model are compared to the fitted models; in the second, pixel statistics for the difference between the images of the ground truth model and the fitted model are evaluated. The results show that our approach can produce excellent results for the vector-derived diagrams common in scientific papers, the class of diagrams our group works on.
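The core moment computation is standard image-moments mathematics. The Java sketch below (not the authors' code) shows how gray-level moments in a k x k window yield a centroid and, via the second central moments, the principal direction that could seed such a line model:

```java
public class MomentDemo {
    /** Principal-axis angle (radians) of the gray mass in window g (k x k).
     *  Assumes the window is nonempty (m00 > 0). */
    static double principalAngle(double[][] g) {
        double m00 = 0, m10 = 0, m01 = 0;
        int k = g.length;
        for (int y = 0; y < k; y++)
            for (int x = 0; x < k; x++) {
                m00 += g[y][x]; m10 += x * g[y][x]; m01 += y * g[y][x];
            }
        double cx = m10 / m00, cy = m01 / m00;          // gray-level centroid
        double mu20 = 0, mu02 = 0, mu11 = 0;            // central second moments
        for (int y = 0; y < k; y++)
            for (int x = 0; x < k; x++) {
                mu20 += (x - cx) * (x - cx) * g[y][x];
                mu02 += (y - cy) * (y - cy) * g[y][x];
                mu11 += (x - cx) * (y - cy) * g[y][x];
            }
        return 0.5 * Math.atan2(2 * mu11, mu20 - mu02); // PCA of the 2x2 moment matrix
    }

    public static void main(String[] args) {
        double[][] diag = {              // a 45-degree dark stroke in a 5x5 window
            {1, 0, 0, 0, 0},
            {0, 1, 0, 0, 0},
            {0, 0, 1, 0, 0},
            {0, 0, 0, 1, 0},
            {0, 0, 0, 0, 1}};
        System.out.println(Math.toDegrees(principalAngle(diag)));  // ~45
    }
}
```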
13. Shao, M., & Futrelle, R. P. (2005). Graphics Recognition in PDF Documents. In Sixth IAPR International Workshop on Graphics Recognition (GREC 2005) (pp. 218-227). Hong Kong.
Comment: Graphemes, introduced in my 1999 paper, are used here to classify figures. This 2005 paper organizes and extends the work discussed in our 2003 paper. One of the major improvements is the use of multi-class boosting techniques, rather than support vector machines, for the machine learning tasks. The results achieved impressed even us! The final results, in much more detail, will appear in Shao's thesis, which we hope (she hopes!) will be completed by the end of 2006 or early in 2007. The PDF and accompanying abstract are for a revised and as yet unpublished version of the conference paper.
Abstract: Much of the work on graphics recognition for raster-based input focuses on accurate and rapid discovery of primitives such as lines, arrowheads, and circles. Some of the work goes further to discover higher-level objects such as circuit elements, pie-chart components, and architectural components. Some systems work to discover even higher levels of organization. This paper focuses on graphics recognition of figures in vector-based PDF documents. The first stage consists of extracting the graphic and text primitives corresponding to figures. The mapping from PDF to the rendered page can be complex, so an interpreter was constructed to translate the PDF content into a set of self-contained graphics and text objects, freed from the intricacies of the original PDF file. The second stage consists of discovering simple graphics entities which we call graphemes, the simplest being a pair of primitive graphic objects satisfying certain geometric constraints. The third stage uses machine learning to classify figures using grapheme statistics as descriptive attributes. Figure recognition can then be accomplished by applying a classifier so trained to the attribute set of a figure to be recognized (classified). The system is implemented in Java. In the study reported here, a boosting-based learner (LogitBoost in the Weka toolkit) was able to achieve 100% classification accuracy in hold-out-one training/testing using 16 grapheme types extracted from 36 diagrammatic figures from BioMed Central research papers. The approach can readily be adapted to raster graphics recognition; once primitives are found, graphemes can be constructed from them and used in the learning-based recognition and classification system.
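To illustrate what a grapheme is, here is a hypothetical Java sketch of one grapheme type: a short vertical tick touching a long horizontal line, as on the x axis of a data graph. The names, lengths, and tolerances are all invented; counts of such pairs over a figure would form the attribute vector handed to the learner.

```java
public class GraphemeDemo {
    record Line(double x1, double y1, double x2, double y2) {
        double len() { return Math.hypot(x2 - x1, y2 - y1); }
    }

    /** Grapheme: a pair of primitives satisfying geometric constraints. */
    static boolean tickOnAxis(Line axis, Line tick) {
        boolean axisH = axis.y1() == axis.y2() && axis.len() > 50;  // long horizontal
        boolean tickV = tick.x1() == tick.x2() && tick.len() < 10;  // short vertical
        if (!axisH || !tickV) return false;
        boolean touches = Math.abs(Math.min(tick.y1(), tick.y2()) - axis.y1()) < 2
            && tick.x1() >= Math.min(axis.x1(), axis.x2())
            && tick.x1() <= Math.max(axis.x1(), axis.x2());         // meets the axis span
        return touches;
    }

    public static void main(String[] args) {
        Line axis = new Line(0, 100, 200, 100);
        Line tick = new Line(50, 100, 50, 105);      // hangs just below the axis
        System.out.println(tickOnAxis(axis, tick));  // true
    }
}
```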
14. Shao, M. & Futrelle, R. P. (2006). Recognition and Classification of Figures in PDF Documents. In W. Liu and J. Lladós (Eds.): Selected papers from GREC 2005, LNCS 3926, pp. 231-242 (Springer, 2006).
Comment: This is a revised and updated version of the GREC 2005 conference paper, item 13 above.
Abstract: Graphics recognition for raster-based input discovers primitives such as lines, arrowheads, and circles. This paper focuses on graphics recognition of figures in vector-based PDF documents. The first stage consists of extracting the graphic and text primitives corresponding to figures. An interpreter was constructed to translate PDF content into a set of self-contained graphics and text objects (in Java), freed from the intricacies of the PDF file. The second stage consists of discovering simple graphics entities which we call graphemes, e.g., a pair of primitive graphic objects satisfying certain geometric constraints. The third stage uses machine learning to classify figures using grapheme statistics as attributes. A boosting-based learner (LogitBoost in the Weka toolkit) was able to achieve 100% classification accuracy in hold-out-one training/testing using 16 grapheme types extracted from 36 figures from BioMed Central journal research papers. The approach can readily be adapted to raster graphics recognition.
15. Futrelle, R. P. (2007). The Diagram Understanding System - Strategies and Results. Technical report, May 2007.
Comment: This report packages up and revises the website (item 6 above) as a single 38-page PDF document. It is an important document because it clearly shows, along with our 1995 paper, item 5, just how much we accomplished in that earlier period. This report is being reworked as a full paper for journal publication. In the full paper I will discuss how our earlier work compares to work at that time and to work by others since then, up to the present. Many people were unaware of our earlier work and have re-invented a number of the techniques we had already developed and implemented in 1995-96, not to mention our other work on diagrams extending back to 1985. The report has fifteen figures, mostly screenshots, as well as the full grammars used to parse the examples.
Abstract: Strategies for the syntactic parsing of diagrams are explained. The basic strategy is the use of Context-based Constraint Grammars to express and guide the parsing process, together with spatial indexing that allows the system to evaluate spatial constraint predicates rapidly. It is notable that the system described here was complete and operational in 1996, which made it possibly the first such system to fully parse a variety of actual diagrams drawn from the research literature, notably x,y data graphs and linear gene diagrams, as well as finite-state automata diagrams created for the project.