Thomas Wahl
Khoury College of Computer Sciences
Northeastern University
360 Huntington Avenue
Boston, Massachusetts 02115
Office: 202 West Village H, room 344
Phone: +1-617-373-3100
Email: t.wahl–ät–northeastern.edu
Welcome to my former homepage. After
working at Northeastern for about a decade, I
have moved on to join
GrammaTech,
Inc.
The information below may therefore be somewhat
out of date. Naturally, I will also not take on
new students at this time.
I do research on various aspects of Formal Verification and Program
Analysis, for instance for concurrent and numeric programs; on detecting
and repairing security vulnerabilities of programs, such as information
leaks introduced by compilers; and on structural and quantitative
manipulations of neural networks to influence their decision making.
Brief bio:
- 2011–2021: Full-time faculty at Northeastern University, Khoury College of Computer Sciences
- 2009–2011: Research Officer, Oxford University, Department of Computer Science (formerly "Computing Laboratory"), UK
- 2007–2009: Lecturer and Postdoc, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland
- 2007: Ph.D., The University of Texas at Austin
General research interests:
- Software verification techniques, such as predicate abstraction, SAT and SMT solving, decision procedures
- Infinite-state system verification, such as arbitrarily-threaded programs or distributed systems
- Stability, robustness, reliability of numeric computations, especially using floating-point arithmetic
- Software analysis techniques for security verification, such as detection of information leaks (e.g. through side channels)
Past and current projects:
Resource-aware program verification using Observation Sequences:
Short version: Verification techniques try to analyze
software (or hardware) for bugs: behaviors that are unexpected,
unwanted, and can cause the software to end up in an unworkable state,
crash, or simply compute wrong results. This project develops a
verification technique that tries to exploit the fact that most software
systems are designed over a variable number of resources. How do
changes (especially incremental changes) of the resource parameter affect
the behavior of the software?
More details: Many systems are naturally designed (consciously or
not) over a variable number of certain resources, which can mean
physical or logical entities. Examples of such resources include:
- the number of physical components of a certain functional type,
- the number of clients that simultaneously request a service from a server,
- the capacity of the communication channel between components,
- the number of context switches permitted among concurrent threads,
- the recursion nesting depth (= stack size) permitted in a piece of software.
We can expect that the behavior of the instances of the parametric design
family (= the concrete designs for each parameter choice) depends on the
parameter in a smooth, "continuous" way. This enables analyses of the
function that maps the number of resource instances (= the parameter) to a
concrete design (this function goes from the natural numbers to the design
space).
More concretely, suppose k is an integral parameter of the
design. With the target property P to be verified in mind, we define
a function O mapping the parameter to observations about the
design of size k, such that knowledge of O(k) enables a
decision on whether P holds of the instance of size k. For
example, the "observation" could simply be the set of states reachable in
the system of size k.
In general, we require of function O that it be monotone:
more resources means more system behaviors (observations). This is almost
always the case provided the value of k is not accessible in the
program defining a system instance's behavior. (If it is, the dependence
on k can be arbitrary, making analysis against unbounded k in
general impossible.)
We now analyze the system design for increasing values of k. Due to
monotonicity of O, consecutive observation sets O(k) are in
some sort of ≤ relationship. This raises the question of whether the
function O
converges at some parameter
point k0, i.e. whether it stops increasing
after reaching k0. If so, the truth value
of property P has stabilized, i.e. unless we have detected a
violation of P up to this point, we have proved the property for all
parameter instances.
We have applied the Observation Sequence paradigm to the undecidable
problem of analyzing reachability for concurrent procedures communicating
via global-scope
variables [PLDI
2018]. Our technique is dubbed CUBA
(Context-UnBounded Analysis). A webpage with
downloads and many other resources can be
found here.
We have also applied this paradigm to unbounded-queue message passing
systems written in
the P
language, such as distributed communication
protocols [CAV
2019]. The resource page for this project
is here.
I am also fortunate to work with a super team of Northeastern
undergraduates, Andrew Dai and Andrew Johnson, who are
helping me expand the scope of this technique.
Platform dependencies of floating-point programs:
Short version: The precise semantics of floating-point
arithmetic programs, i.e. the values they compute, depends not only on the
input but also on the behavior of the compiler and on the target
hardware. Such dependencies infringe on the highly desirable goal of
software portability: the same program run on the same inputs on different
platforms can produce different results. We are investigating the
consequences of these dependencies for numeric programs in general, and for
the trustworthiness of floating-point compilers.
More details: We are working on a way to evaluate the
"damage", in terms of precision, that compiler optimizations cause to
floating-point programs. Since floating-point arithmetic expression
semantics is very sensitive to the evaluation order (e.g., the addition
operation is not associative), expression rewriting generally changes the
value computed, at least for some inputs. This causes optimizers to
produce code that is not I/O-equivalent to the original, violating one of
the fundamental principles of code optimization.
In many cases, the effect on the computed value is marginal, but not in
all. How can we detect such cases, how can we decide whether the
optimization is "worth it" compared to the change in output value, and how
can we perhaps even repair the optimizer, so that it produces accurate yet
runtime-optimized code? [This is ongoing work.]
Earlier we developed a method that can partially repair numeric instability
against platform changes, making numeric programs more robust. This is in
principle easy, by instructing the compiler to evaluate every expression
following a "strict" evaluation model, such as "left-to-right", and to
ignore any platform-specific precision- or efficiency-enhancing features,
such as contracted operations, custom floating-point division routines,
etc., across the whole program. But numeric instability is often caused
only by a few critical program fragments; we don't want to take away the
compiler's freedom to optimize the code that does not contribute (much) to
instability. Our method first identifies what these critical fragments are,
and then stabilizes them, within the smallest possible scope, leaving the
compiler free to manipulate other portions of the
code. [VMCAI
2017]
Yet earlier we looked at platform-dependent program differences that lead
to changes in the control flow of a program. Such differences
seriously affect the portability of numeric code. We developed a method
that takes a numeric procedure and determines whether a given input may
cause the procedure to take different branches, based merely on the
floating-point platform on which it is compiled and executed. This method
can be used in static and dynamic analyses of programs, to estimate their
numeric
stability [EURO-PAR
2015]. A project webpage can be
found here.
Compilation-dependent security properties of software (currently with Konstantinos Athanasiou):
Software analysis techniques are arguably underused in securing a program
against vulnerabilities such as data leaks. The techniques that do exist
typically operate at the source code level, which has proven somewhat
effective at detecting leaks. For software side-channel attacks,
which try to gain access to passwords or other secret data by exploiting
fine-grained information about the power or time consumption of program
runs, source code level techniques are not reliable in
actually verifying security properties. The reason is that the
program undergoes many transformations during compilation, such as
code optimizations, register allocation, and even runtime instruction
scheduling, all of which have security ramifications. Such transformations
are of course not visible, often not even predictable, at the source code
level. Classical static analysis techniques thus give a false sense of
security.
This problem is well-known throughout verification but has recently
received a huge amount of attention in the wake of processor-level
side-channel leaks such as Meltdown
and Spectre. In this project we investigate various instances of this
problem. Our goals are to detect them, to fix them, and—if
possible—to prove that certain leaks are not possible, independent of
how the compiler turns the program into an executable. [This is ongoing
work.]
I am grateful for the support of Maggie Van Nortwick and Owen
Loveluck, two strong undergraduates at Northeastern, in this project.
Protecting Confidentiality and Integrity of Deep Neural Networks against Side-Channel and Fault Attacks:
Short version: Deep Learning has become a foundational means
for solving grand societal challenges. Security of deep neural network
(DNN) inference engines, and of DNN training data, has become a major
challenge in deploying artificial intelligence. This project investigates
both confidentiality (secrecy) breaches and integrity breaches (malicious
manipulations) of DNNs themselves or of their inference processes.
Sample Question 1 [confidentiality violations]: How can we
reverse-engineer structural and quantitative parameters of Neural Networks
whose internals are intended to be confidential? One option is to observe
the behavior of the DNN on specialized inputs. These may be inputs that
cause the DNN inference to take unusually long (a timing side
channel), or cause the computation to raise certain observable
exceptions.
Sample Question 2 [integrity violations]: How can we
manipulate a DNN running on a local device (e.g. by physically altering the
device's circuitry [fault attacks]) in a way that adversely affects
its execution? We plan to use rigorous tools such as mathematical formula
solvers to determine how the logic of the DNN needs to be altered to
achieve a certain behavioral change. Such computations help us assess the
vulnerability of the DNN against fault attacks.
Research sponsors: