David A. Smith

Associate Professor, Khoury College of Computer Sciences, Northeastern University
440 Huntington Avenue
West Village H, Room 356
Boston, MA 02115
Phone: (617) 373-8526
dasmithATSIGNccs.neu.edu

Office hours: by appointment

With colleagues in CS, Social Sciences, and Humanities, I am a founding member of the NULab for Texts, Maps, and Networks, Northeastern's research center for the digital humanities and computational social sciences. My research focus is on natural language processing and computational linguistics, with applications to machine translation, information retrieval, the social sciences, and humanities.

Recent news: The Mellon Foundation has funded the University of Illinois, Washington University in St. Louis, and Northeastern University to study the spread of news about racial terror and anti-Black violence.

The Mellon Foundation funded Northeastern to craft a research agenda for historical and multilingual OCR in the humanities. Read our final report, with recommendations for computer science and humanities researchers, libraries, and funders. This led to follow-up work on Arabic-script OCR funded by the Mellon Foundation and the NEH.

The Institute of Museum and Library Services and funding agencies in five other countries have granted $1.2M to our new Oceanic Exchanges project to track news and ideas across countries and languages.

Our NEH-funded Viral Texts project on viral networks in 19th-century newspapers was featured in Wired. Our work analyzing text reuse in bills in the US Congress was featured in the Economist.

Until August 2012, I was a Research Assistant Professor in the Department of Computer Science at the University of Massachusetts, Amherst.

Formerly: Natural Language Processing at Johns Hopkins University; and Head Programmer, Perseus Project, Tufts University

See also my curriculum vitae in PDF.

Graduate Students

Ryan Muther, Shijia Liu, Si Wu, and Caroline Craig.

Alumni

Liwen Hou (postdoc, Harvard)

Rui Dong (Amazon)

Ansel MacLaughlin (Amazon)

Shaobin Xu (Google)

Kriste Krstovski (Data Science Institute and Business School, Columbia University)

Jason Naradowsky (postdoc, UCL & Cambridge)

Teaching

Spring 2021, Spring and Fall 2020: Natural Language Processing (CS6120)

Fall 2019–2023, Spring 2021: Information Retrieval (CS6200/IS4200)

Fall 2017: Special Topics in AI: Text Modeling for the Humanities and Social Sciences (CS7180): Tu 11:45–1:25, Th 2:50–4:30

Spring 2016–2017: Natural Language Processing (CS6120)

Fall 2012–2015: Information Retrieval (CS6200)

Spring 2013–2015: Natural Language Processing (CS6120)

Spring 2012: Search Engines (CS 446)

Fall 2011: Residential Academic Program First-Year Seminar (CS 191a)

Fall 2009: Introduction to Natural Language Processing (CS 585)

Spring 2009: James Allan, R. Manmatha, and I led a seminar on Mining Text and Images in Digital Libraries Using Grid Computing.

August 2006: Charles Schafer and I presented a tutorial, Overview of Statistical Machine Translation [pdf], at the Association for Machine Translation in the Americas.

Fall 2005: Noah Smith and I designed and taught a course on Empirical Research Methods in Computer Science.

Book

Ryan Cordell, David A. Smith, Abby Mullen, and Jonathan D. Fitzgerald. Going the Rounds: Virality in Nineteenth-Century Newspapers. University of Minnesota Press, forthcoming 2022, print and open-access e-book and online.

Refereed Conference & Journal Publications

Jacob Murel and David A. Smith. Retrieving and analyzing translations of American newspaper comics with visual evidence. In Workshop on coMics ANalysis, Processing and Understanding (MANPU), 2024.

Danlu Chen, Jacob Murel, Taimoor Shahid, Xiang Zhang, Jonathan Parkes Allen, Taylor Berg-Kirkpatrick, and David A. Smith. MONSTERMASH: Multidirectional, overlapping, nested, spiral text extraction for recognition models of Arabic-script handwriting. In International Workshop on Computational Paleography, 2024. [ PDF ]

Jaydeep Borkar and David A. Smith. Mind the gap: Analyzing lacunae with transformer-based transcription. In International Workshop on Computational Paleography, 2024.

Jacob Murel and David A. Smith. Active learning with relevance feedback for handwriting detection in historical print. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), 2024.

Liwen Hou and David A. Smith. Detecting syntactic change with pre-trained transformer models. In Findings of EMNLP, 2023. [ PDF ]

David A. Smith, Jacob Murel, Jonathan Parkes Allen, and Matthew Thomas Miller. Automatic collation for diversifying corpora: Commonly copied texts as distant supervision for handwritten text recognition. In Computational Humanities Research Conference (CHR), 2023. [ PDF ]

Ryan Muther, Mathew Barber, and David A. Smith. Querying the past: Automatic source attribution with language models. In Computational Humanities Research Conference (CHR), 2023. [ PDF ]

Caroline Craig, Kartik Goyal, Gregory R. Crane, Farnoosh Shamsian, and David A. Smith. Testing the limits of neural sentence alignment models on classical Greek and Latin texts and translations. In Computational Humanities Research Conference (CHR), 2023. [ PDF ]

Ryan Muther and David A. Smith. Citations as queries: Source attribution using language models as rerankers. In SIGIR Workshop on Retrieval-Enhanced Machine Learning (REML), 2023. [ PDF ]

Si Wu and David A. Smith. Composition and deformance: Measuring imageability with a text-to-image model. In Proceedings of the Workshop on Narrative Understanding, 2023. [ PDF ]

Giulia Taurino and David A. Smith. Machine learning as an archival science: Narratives behind artificial intelligence, cultural data, and archival remediation. In NeurIPS Workshop on AI Cultures, 2022. [ PDF ]

Ryan Muther, David A. Smith, and Sarah Bowen Savant. From networks to named entities and back again: Exploring classical Arabic isnad networks. Journal of Historical Network Research, 2022. [ PDF ]

Ansel MacLaughlin, Shaobin Xu, and David A. Smith. Recovering lexically and semantically reused texts. In Proceedings of the Joint Conference on Lexical and Computational Semantics (*SEM), 2021. [ PDF ]

Alejandro Toselli, Si Wu, and David A. Smith. Digital editions as distant supervision for layout analysis of printed books. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), 2021. [ PDF ]

Helen O'Neill, Anne Welsh, David A. Smith, Glenn Roe, and Melissa Terras. Text mining Mill: Computationally detecting influence in the writings of John Stuart Mill from library records. Digital Scholarship in the Humanities, 2021.

Rui Dong and David A. Smith. Structural encoding and pre-training matter: Adapting BERT for table-based fact verification. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021. [ PDF ]

Ansel MacLaughlin and David A. Smith. Content-based models of quotation. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021. [ PDF ]

Liwen Hou and David A. Smith. Drivers of English syntactic change in the Canadian Parliament. In Proceedings of the Society for Computation in Linguistics (SCiL), 2021. [ PDF ]

Liwen Hou and David A. Smith. Emerging English transitives over the last two centuries. In Proceedings of the Society for Computation in Linguistics (SCiL), 2021. [ PDF ]

Ansel MacLaughlin, John Wihbey, Aleszu Bajak, and David A. Smith. Source attribution: Recovering the press releases behind science health news. In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM), 2020. [ PDF ]

Maha Alkhairy, Afshan Jafri, and David A. Smith. Finite state machine pattern-root Arabic morphological generator, analyzer and diacritizer. In Proceedings of the Language Resources and Evaluation Conference (LREC), 2020.

Shijia Liu and David A. Smith. Detecting de minimis code-switching in historical German books. In Proceedings of the International Conference on Computational Linguistics (COLING), 2020. [ PDF ]

Rui Dong, David A. Smith, Shiran Dudy, and Steven Bedrick. Noisy neural language modeling for typing prediction in BCI communication. In Proceedings of the Workshop on Speech and Language Processing for Assistive Technologies (SLPAT), pages 44–51, 2019. [ PDF ]

Rui Dong and David A. Smith. Multi-input attention for unsupervised OCR correction. In Proceedings of the Association for Computational Linguistics, 2018. [ PDF ]

Ansel MacLaughlin, John Wihbey, and David A. Smith. Predicting news coverage of scientific articles. In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM), 2018. [ PDF ]

Shiran Dudy, Steven Bedrick, Shaobin Xu, and David A. Smith. A multi-context character prediction model for a brain-computer interface. In Proceedings of the Workshop on Subword and Character Level Models in NLP (SCLeM), 2018. [ PDF ]

Shaobin Xu and David A. Smith. Contrastive training for models of information cascades. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018. [ PDF ]

Liwen Hou and David A. Smith. Modeling the decline in English passivization. In Proceedings of the Society for Computation in Linguistics (SCiL), 2018. [ PDF ]

Shaobin Xu and David A. Smith. Retrieving and combining repeated passages to improve OCR. In Proceedings of the ACM+IEEE-CS Joint Conference on Digital Libraries (JCDL), 2017.

Kriste Krstovski and David A. Smith. Online multilingual topic models with multi-level hyperpriors. In Proceedings of the Conference on Human Language Technology of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2016. [ PDF ]

Kriste Krstovski and David A. Smith. Bootstrapping translation detection and sentence extraction from comparable corpora. In Proceedings of the Conference on Human Language Technology of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2016. [ PDF ]

Kriste Krstovski, David A. Smith, and Michael Kurtz. Automatic construction of evaluation sets and evaluation of document similarity models in large scholarly retrieval systems. In AAAI Workshop on Scholarly Big Data, 2016.

Kriste Krstovski, David A. Smith, and Michael Kurtz. Evaluating retrieval models through histogram analysis. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, 2015.

David A. Smith, Ryan Cordell, and Abigail Mullen. Computational methods for uncovering reprinted texts in antebellum newspapers. American Literary History, 27(3), 2015. [ PDF ]

John Wilkerson, David A. Smith, and Nick Stramp. Tracing the flow of policy ideas in legislatures: A text reuse approach. American Journal of Political Science, 59(4):943–956, 2015. [ PDF ]

David A. Smith, Ryan Cordell, Elizabeth Maddock Dillon, Nick Stramp, and John Wilkerson. Detecting and modeling local text reuse. In Proceedings of the ACM+IEEE-CS Joint Conference on Digital Libraries (JCDL), 2014. Nominated for best paper. [ PDF ]

Youngho Kim, Jangwon Seo, W. Bruce Croft, and David A. Smith. Automatic suggestion of phrasal-concept queries for literature search. Information Processing & Management, 50(4):568–583, July 2014.

Shaobin Xu, David Smith, Abigail Mullen, and Ryan Cordell. Detecting and evaluating local text reuse in social networks. In ACL Joint Workshop on Social Dynamics and Personal Attributes in Social Media, 2014. [ PDF ]

Xiaoxi Xu, Tom Murray, Beverly Park Woolf, and David A. Smith. Social network signatures of effective online communication. In Intelligent Tutoring Systems, pages 621–622, 2014.

Xiaoxi Xu, Tom Murray, Beverly Park Woolf, and David A. Smith. Identifying social deliberative behavior from online communication—a cross-domain study. In Proceedings of the Florida Artificial Intelligence Research Society Conference (FLAIRS), pages 237–242, 2014.

David A. Smith, Ryan Cordell, and Elizabeth Maddock Dillon. Infectious texts: Modeling text reuse in nineteenth-century newspapers. In IEEE Workshop on Big Data and the Humanities, 2013. [ PDF ]

Kriste Krstovski, David A. Smith, Hanna M. Wallach, and Andrew McGregor. Efficient nearest-neighbor search in the probability simplex. In Proceedings of the International Conference on the Theory of Information Retrieval (ICTIR), 2013. [ PDF ]

Kriste Krstovski and David A. Smith. Online polylingual topic models for fast document translation detection. In Proceedings of the Workshop on Statistical Machine Translation, 2013.

Jacqueline L. Feild, Erik G. Learned-Miller, and David A. Smith. Using a probabilistic syllable model to improve scene text recognition. In International Conference on Document Analysis and Recognition (ICDAR), 2013.

Xiaoxi Xu, Tom Murray, Beverly Park Woolf, and David A. Smith. Mining social deliberation in online communication: If you were me and I were you. In International Conference on Educational Data Mining (EDM), 2013.

Jason Naradowsky, Tim Vieira, and David A. Smith. Grammarless parsing for joint inference. In Proceedings of the International Conference on Computational Linguistics (COLING), 2012. [ PDF ]

Jason Naradowsky, Sebastian Riedel, and David A. Smith. Improving NLP through marginalization of hidden syntactic structure. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2012.

Sebastian Riedel, David A. Smith, and Andrew McCallum. Parse, price and cut—delayed column and row generation for graph based parsers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2012.

Yanchuan Sim, Noah A. Smith, and David A. Smith. Discovering factions in the computational linguistics community. In ACL Workshop on Rediscovering 50 Years of Discoveries, 2012. [ PDF ]

Michael Bendersky and David A. Smith. A dictionary of wisdom and wit: Learning to extract quotable phrases. In NAACL Workshop on Computational Linguistics for Literature, pages 69–77, 2012. [ PDF ]

David Bamman and David A. Smith. Extracting two thousand years of Latin from a million book library. ACM Journal on Computing and Cultural Heritage, 5(1), 2012.

David A. Smith, R. Manmatha, and James Allan. Mining relational structure from millions of books: Position paper. In Proceedings of the CIKM BooksOnline Workshop, pages 49–54, 2011.

Jae-Hyun Park, W. Bruce Croft, and David A. Smith. A quasi-synchronous dependence model for information retrieval. In Conference on Information and Knowledge Management (CIKM), pages 17–26, 2011. [ PDF ]

Jinyoung Kim, W. Bruce Croft, David A. Smith, and Anton Bakalov. Evaluating an associative browsing model for personal information. In Conference on Information and Knowledge Management (CIKM), pages 647–652, 2011. [ PDF ]

Jeffrey Dalton, James Allan, and David A. Smith. Passage retrieval for incorporating global dependencies in sequence labeling. In Conference on Information and Knowledge Management (CIKM), pages 355–364, 2011. [ PDF ]

Kriste Krstovski and David A. Smith. A minimally supervised approach for detecting and ranking document translation pairs. In Proceedings of the Workshop on Statistical Machine Translation, pages 207–216, 2011. [ PDF ]

Jangwon Seo, W. Bruce Croft, and David A. Smith. Online community search using conversational structures. Information Retrieval, 14(6):547–571, 2011. [ PDF ]

Andrew Kae, David A. Smith, and Erik Learned-Miller. Learning on the fly: A font-free approach towards multilingual OCR. International Journal on Document Analysis and Recognition, 14(3):289–301, 2011. [ PDF ]

Michael Bendersky, W. Bruce Croft, and David A. Smith. Joint annotation of search queries. In Proceedings of the Association for Computational Linguistics, pages 102–111, 2011. [ PDF ]

John S. Y. Lee, Jason Naradowsky, and David A. Smith. A discriminative model for joint morphological disambiguation and dependency parsing. In Proceedings of the Association for Computational Linguistics, pages 885–894, 2011. [ PDF ]

Elif Aktolga, James Allan, and David A. Smith. Passage reranking for question answering using syntactic structures and answer types. In European Conference on Information Retrieval (ECIR), pages 617–628, 2011. [ PDF ]

Jinyoung Kim, Anton Bakalov, David A. Smith, and W. Bruce Croft. Building and evaluating a semantic representation for personal information. In Conference on Information and Knowledge Management (CIKM), pages 1741–1744, 2010.

Xiaobing Xue, W. Bruce Croft, and David A. Smith. Query reformulation using query distributions. In Conference on Information and Knowledge Management (CIKM), pages 1497–1500, 2010.

Michael Bendersky, W. Bruce Croft, and David A. Smith. Structural annotation of search queries using pseudo-relevance feedback. In Conference on Information and Knowledge Management (CIKM), pages 1537–1540, 2010. [ PDF ]

Sebastian Riedel, David A. Smith, and Andrew McCallum. Inference by minimizing size, divergence, or their sum. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), pages 227–234, 2010. [ PDF ]

Sebastian Riedel and David A. Smith. Relaxed marginal inference and its application to dependency parsing. In Proceedings of the Conference on Human Language Technology of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), pages 760–768, 2010. [ PDF ]

Jangwon Seo, W. Bruce Croft, and David A. Smith. Online community search using thread structure. In Proceedings of the ACM Conference on Information and Knowledge Management (CIKM), pages 1907–1910, 2009.

David A. Smith and Jason Eisner. Parser adaptation and projection with quasi-synchronous grammar features. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 822–831, 2009. [ PDF | PowerPoint slides ]

David Mimno, Hanna Wallach, Jason Naradowsky, David A. Smith, and Andrew McCallum. Polylingual topic models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 880–889, 2009. [ PDF ]

Michael Bendersky, W. Bruce Croft, and David A. Smith. Two-stage query segmentation for information retrieval. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, pages 810–811, 2009. [ PDF ]

David A. Smith and Jason Eisner. Dependency parsing by belief propagation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 145–156, 2008. [ PDF | PowerPoint slides ]

Keith Hall, Jiří Havelka, and David A. Smith. Log-linear models of non-projective trees, k-best MST parsing and tree-ranking. In Proceedings of the CoNLL Shared Task, pages 962–966, 2007.

David A. Smith and Noah A. Smith. Probabilistic models of nonprojective dependency trees. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 132–140, 2007. [ PDF | PowerPoint slides ]

David A. Smith and Jason Eisner. Bootstrapping feature-rich dependency parsers with entropic priors. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 667–677, 2007. [ PDF | PowerPoint slides ]

David A. Smith and Jason Eisner. Minimum risk annealing for training log-linear models. In Proceedings of the International Conference on Computational Linguistics and the Association for Computational Linguistics, pages 787–794, 2006. [ PDF ]

Markus Dreyer, David A. Smith, and Noah A. Smith. Vine parsing and minimum risk reranking for speed and precision. In Proceedings of the CoNLL Shared Task, pages 201–205, 2006. [ PDF ]

David A. Smith and Jason Eisner. Quasi-synchronous grammars: Alignment by soft projection of syntactic dependencies. In Proceedings of the HLT-NAACL Workshop on Statistical Machine Translation, pages 23–30, 2006. [ PDF | PowerPoint slides ]

Noah A. Smith, David A. Smith, and Roy W. Tromble. Context-based morphological disambiguation with random fields. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 475–482, 2005. [ PDF ]

David A. Smith and Noah A. Smith. Bilingual parsing with factored estimation: Using English to parse Korean. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 49–56, 2004. [ PDF ]

F.J. Och, D. Gildea, S. Khudanpur, A. Sarkar, K. Yamada, A. Fraser, S. Kumar, L. Shen, D. Smith, K. Eng, V. Jain, Z. Jin, and D. Radev. A smorgasbord of features for statistical machine translation. In Proceedings of the Conference on Human Language Technology and the North American Association for Computational Linguistics, pages 161–168, 2004. [ PDF ]

David A. Smith and Gideon S. Mann. Bootstrapping toponym classifiers. In Proceedings of the HLT-NAACL Workshop on Analysis of Geographic References, pages 45–49, 2003. [ PDF ]

David A. Smith, Anne Mahoney, and Gregory Crane. Integrating harvesting into digital library content. In Proceedings of the 2nd ACM+IEEE Joint Conference on Digital Libraries, pages 183–184, Portland, OR, July 2002. [ PDF ]

David A. Smith. Detecting events with date and place information in unstructured text. In Proceedings of the 2nd ACM+IEEE Joint Conference on Digital Libraries, pages 191–196, Portland, OR, July 2002. [ PDF ]

David A. Smith. Detecting and browsing events in unstructured text. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, pages 73–80, Tampere, Finland, August 2002. [ PDF ]

David A. Smith and Gregory Crane. Disambiguating geographic names in a historical digital library. In Proceedings of the European Conference on Digital Libraries (ECDL), pages 127–136, Darmstadt, Germany, September 2001. [ PDF ]

David A. Smith, Anne Mahoney, and Jeffrey A. Rydberg-Cox. Management of XML documents in an integrated digital library. Markup Languages: Theory and Practice, 2(3):205–214, 2000. [ PDF ]

David A. Smith, Anne Mahoney, and Jeffrey A. Rydberg-Cox. Management of XML documents in an integrated digital library. In Proceedings of Extreme Markup Languages 2000, pages 219–224, Montreal, August 2000.

David A. Smith, Jeffrey A. Rydberg-Cox, and Gregory R. Crane. The Perseus Project: A digital library for the humanities. Literary and Linguistic Computing, 15(1):15–25, 2000.

David A. Smith. Textual variation and version control in the TEI. Computers and the Humanities, 33(1-2):103–112, 1999.

Gregory Crane, Clifford E. Wulfman, Lisa M. Cerrato, Anne Mahoney, Thomas L. Milbank, David Mimno, Jeffrey A. Rydberg-Cox, David A. Smith, and Christopher York. Towards a cultural heritage digital library. In Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2003, pages 75–86, Houston, TX, June 2003. [ PDF ]

Gregory R. Crane, Robert F. Chavez, Anne Mahoney, Thomas L. Milbank, Jeffrey A. Rydberg-Cox, David A. Smith, and Clifford E. Wulfman. Drudgery and deep thought: Designing a digital library for the humanities. Communications of the Association for Computing Machinery, 44(5):35–40, 2001. [ PDF ]

Gregory Crane, David A. Smith, and Clifford E. Wulfman. Building a hypertextual digital library in the humanities: A case study on London. In Proceedings of the First ACM+IEEE Joint Conference on Digital Libraries, pages 426–434, Roanoke, VA, June 2001. Best paper award. [ PDF ]

Other Publications

David A. Smith. Modeling errors in estimating historical trends. In Digital Humanities, 2024.

Shijia Liu and David A. Smith. Tracing accounts of racial terror in historical newspapers. In New Directions in Analyzing Text as Data (TADA), 2023.

Si Wu and David A. Smith. The language of US partisan newspapers from 1869 to 1925. In New Directions in Analyzing Text as Data (TADA), 2023.

Maud Ehrmann, Marten Düring, Clemens Neudecker, and Antoine Doucet. Computational Approaches to Digitised Historical Newspapers (Dagstuhl Seminar 22292). Dagstuhl Reports, 12(7):112–179, 2023. [ DOI | HTML ]

Giulia Taurino, Si Wu, and David A. Smith. Archeologies of data in contemporary journalism: The digital afterlives of newspapers' photo morgues. In Computation + Journalism, New York, June 2022.

Soumya Mohanty and David Smith. Alignment-based training for detecting reader annotations in printed books. In Proceedings of Digital Access to Textual Cultural Heritage (DATeCH), 2019.

David A. Smith and Ryan Cordell. A research agenda for historical and multilingual optical character recognition. Technical report, Northeastern University, 2018. https://repository.library.northeastern.edu/files/neu:f1881m409.

Kriste Krstovski, Michael J. Kurtz, David A. Smith, and Alberto Accomazzi. Multilingual topic models for indexing scientific articles. Under review, 2019. [ PDF ]

Ryan Muther and David A. Smith. Charting the changes: Modeling edits in the lawmaking process. In PoliInformatics, Bainbridge Island, WA, August 2017.

Ryan Cordell and David A. Smith. What news is new?: Ads, extras, and viral texts on the nineteenth-century newspaper page. In Digital Humanities, 2017.

Ryan Cordell, David Smith, and Shaobin Xu. Aggregating exchange in the nineteenth-century newspaper. In Society for the History of Authorship, Reading, and Publishing (SHARP), Victoria, BC, June 2017.

David A. Smith, Anne Washington, and John Wilkerson. Attacking the code: A computational approach to discovering issue networks in Congress. In Political Networks, Portland, OR, June 2015.

Chris Biemann, Gregory R. Crane, Christiane D. Fellbaum, and Alexander Mehler, editors. Computational Humanities – Bridging the Gap between Computer Science and Digital Humanities (Dagstuhl Seminar 14301), volume 4 of Dagstuhl Reports, Dagstuhl, Germany, 2014. Schloss Dagstuhl–Leibniz-Zentrum für Informatik. [ DOI | HTML ]

John Wilkerson, David A. Smith, and Nick Stramp. The inclusiveness of lawmaking: A text reuse approach to tracing the progress of policy ideas in legislation. In MPSA Annual Meeting. Midwest Political Science Association, April 2014.

John Wilkerson, David A. Smith, and Nick Stramp. Tracing the flow of policy ideas in legislatures: A text reuse approach. In New Directions in Analyzing Text as Data. London School of Economics, September 2013. [ PDF ]

John Wilkerson, David A. Smith, Nick Stramp, and James Dashiell. Tracing the flow of policy ideas in legislatures: A computational approach. In APSA Annual Meeting. American Political Science Association, September 2013.

Ryan Cordell, Elizabeth Maddock Dillon, and David A. Smith. Uncovering reprinting networks in nineteenth-century American newspapers. In Digital Humanities, 2013.

Ryan Cordell and David A. Smith. Uncovering reprinting networks in nineteenth-century American newspapers. In Chicago Colloquium on Digital Humanities & Computer Science, November 2012.

Xiaoye Wu and David A. Smith. Right-branching tree transformation for eager dependency parsing. Technical Report CIIR-776, University of Massachusetts, 2010. [ PDF ]

Jason Naradowsky, Joe Pater, David Smith, and Robert Staubs. Learning hidden metrical structure with a log-linear model of grammar. In Computational Modelling of Sound Pattern Acquisition, pages 59–60, Edmonton, February 2010. Department of Linguistics, University of Alberta.

Joe Pater, David A. Smith, Robert Staubs, Karen Jesney, and Ramgopal Mettu. Learning hidden structure with a log-linear model of grammar. In Linguistic Society of America (LSA), Baltimore, January 2010.

Gregory Druck and David A. Smith. Computing conditional feature covariance under non-projective tree conditional random fields. Technical Report UM-CS-2009-060, University of Massachusetts, 2009.

David A. Smith. Debabelizing libraries: Machine translation by and for digital collections. D-Lib Magazine, 12(3), March 2006. [ HTML ]

Anne Mahoney, Jeffrey A. Rydberg-Cox, David A. Smith, and Clifford E. Wulfman. Generalizing the Perseus XML document manager. In Linguistic Exploration: Workshop on Web-based Language Documentation and Description, Philadelphia, December 2000. [ HTML ]