Scaling Approximate Inference and Approximation-Aware Learning
Tue 10.17.17
The last decade has seen an enormous increase in our ability to gather and manage large amounts of data; business, healthcare, education, economics, science, and almost every other sector of society are accumulating data at unprecedented levels. The basic premise is that with more data, even if uncertain and of lower quality, we can make better-informed decisions. To make any decision, we need to perform “inference” over the data, i.e., either draw new conclusions or find support for existing hypotheses, thus allowing us to favor one course of action over another. However, general reasoning under uncertainty is highly intractable, and many state-of-the-art systems today perform approximate inference by resorting to sampling. Thus, for many modern applications (such as information extraction, knowledge aggregation, question answering, computer vision, and machine intelligence), inference is a key bottleneck, and new methods for tractable approximate inference are needed.
This project addresses the challenge of scaling inference by generalizing two highly scalable approximate inference methods and complementing them with scalable methods for parameter learning that are “approximation-aware”: instead of treating (i) learning and (ii) inference as separate steps, the project uses the approximation methods developed for inference to also learn the model. The research hypothesis is that this approach increases end-to-end prediction accuracy while simultaneously improving scalability. Concretely, the project develops the theory, together with scalable algorithms and optimization methods, for at least the following four sub-problems: (1) approximating general probabilistic conjunctive queries with standard relational databases (see the first sketch below); (2) learning the probabilities in an uncertain database from feedback on rankings of the output tuples of general queries; (3) approximating exact probabilistic inference in undirected graphical models with linearized update equations (see the second sketch below); and (4) complementing the latter with a robust framework for learning linearized potentials from partially labeled data.
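To make sub-problem (1) concrete: under the standard tuple-independent semantics of probabilistic databases, the probability of a simple Boolean query such as ∃x R(x) factorizes as 1 − ∏(1 − p_i) over the marginal probabilities of the contributing tuples, exactly the kind of aggregate a standard relational engine can compute; for general conjunctive queries, exact evaluation can be #P-hard, which is what motivates the approximation. The following is a minimal sketch of the tractable base case only; the relation R and its probabilities are invented for illustration and are not taken from the project.

```python
from functools import reduce

# Minimal sketch: P(EXISTS x. R(x)) over a tuple-independent
# probabilistic relation R.  The query is false only if *every*
# tuple is absent, so P(query) = 1 - prod_i (1 - p_i).
# The relation and its probabilities are illustrative.
R = [("a", 0.9), ("b", 0.5), ("c", 0.2)]  # (value, P(tuple present))

def prob_exists(tuples):
    """Probability that at least one tuple is present (independent tuples)."""
    return 1.0 - reduce(lambda acc, vp: acc * (1.0 - vp[1]), tuples, 1.0)

print(prob_exists(R))  # 1 - 0.1 * 0.5 * 0.8 = 0.96
```

Sub-problem (3) refers to replacing the multiplicative message-passing updates of loopy belief propagation with linear updates over centered (residual) beliefs, in the spirit of linearized belief propagation. In its simplest matrix form, omitting the echo-cancellation term, one iterates B ← E + A·B·H over centered beliefs B, centered priors E, the graph’s adjacency matrix A, and a centered coupling matrix H; a sufficient condition for convergence is that the spectral radii satisfy ρ(A)·ρ(H) < 1. The sketch below is an illustrative implementation under these assumptions, not the project’s actual formulation; the toy matrices are invented.

```python
import numpy as np

def linearized_bp(A, E, H, iters=100):
    """Iterate the linearized update B <- E + A @ B @ H to a fixed point.

    A: (n, n) symmetric adjacency matrix of the graphical model.
    E: (n, k) centered prior beliefs (each row sums to 0).
    H: (k, k) centered coupling matrix, scaled so rho(A) * rho(H) < 1.
    Returns (n, k) centered posterior beliefs.
    """
    B = E.copy()
    for _ in range(iters):
        B = E + A @ B @ H  # linear, so cheap (sparse) matrix products suffice
    return B

# Toy example: a 3-node chain with 2 classes and weak homophilous coupling.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
E = np.array([[ 0.1, -0.1],   # node 0 leans towards class 0
              [ 0.0,  0.0],   # node 1 is unlabeled
              [-0.1,  0.1]])  # node 2 leans towards class 1
H = 0.1 * np.array([[ 1., -1.],
                    [-1.,  1.]])
print(linearized_bp(A, E, H))
```

Because each iteration is a fixed number of matrix products rather than normalized message updates, this style of linear system is also what makes the learning step in sub-problem (4), fitting the potentials behind H from partially labeled data, tractable at scale.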