30 Type Inference
Consider a function such as:

fun f(x, y): if x: y + 1 else: y - 1 end end

Even though it carries no annotations, its body tells us a great deal: x is used as the condition of an if, so it must be a Boolean, and y is used in arithmetic, so it must be a Number. Type inference deduces exactly this, in effect rewriting the header to:

fun f(x :: Boolean, y :: Number): ...
Newer languages like Scala and Typed Racket have this in more limited measure: a feature called local type inference. Here, however, we will study the more traditional and powerful form.
30.1 Type Inference as Type Annotation Insertion
First, let’s understand what type inference is doing. Some people mistakenly think of languages with inference as having no type declarations, with inference taking their place. This is confused at multiple levels. For one thing, even in languages with inference, programmers are free (and, for documentation purposes, often encouraged) to annotate types. Furthermore, in the absence of such declarations, it is not quite clear what inference actually means. Sometimes, inference is also undecidable and programmers have no choice but to declare some of the types. Finally, writing explicit annotations can greatly reduce the number of indecipherable error messages.
One way to understand inference, then, is that it fills in the blanks left by omitted annotations:

fun f(x :: ___, y :: ___): ...
30.2 Understanding Inference
For worked examples and more details, see Chapter 30 of Programming Languages: Application and Interpretation.
Suppose we have an expression (or program) e written in an explicitly typed language: i.e., e has type annotations everywhere they are required. Now suppose we erase all annotations in e, and use a procedure infer to deduce them back.
Do Now!
What property do we expect of infer?
We could demand many things of it. One might be that it produces precisely those annotations that e originally had. This is problematic for many reasons, not least that e might not even type-check, in which case how could infer possibly know what they were (and hence should be)? This might strike you as a pedantic trifle: after all, if e didn’t type-check, how can erasing its annotations and filling them back in make it do so? Since neither program type-checks, who cares?
Do Now!
Is this reasoning correct?
Not quite. Consider this program, which does not type-check:

lam(x :: Number) -> String: x end

Erasing its annotation leaves the unexceptionable identity function:

lam(x): x end

which inference can certainly give a type. So a more sensible property is this: if e type-checks, then the program obtained by erasing its annotations and inferring types anew must also type-check.
It does not say what must happen if e fails to type-check, i.e., it does not preclude a type inference algorithm that makes the faultily-typed identity function above typeable.
More importantly, it assures us that we lose nothing by employing type inference: no program that was previously typeable will now cease to be so. That means we can focus on using explicit annotations where we want to, but will not be forced to do so. Of course, this only holds if inference is decidable.
For instance, inference is free to annotate the erased identity function above as:

lam(x :: Number) -> Number: x end
With these preliminaries out of the way, we are now ready to delve into the mechanics of type inference. The most important thing to note is that our simple, recursive-descent type-checking algorithm [A Type Checker for Expressions and Functions] will no longer work. That was possible because we already had annotations on all function boundaries, so we could descend into function bodies carrying information about those annotations in the type environment. Sans these annotations, it is not clear how to descend. In fact, it is not clear that there is any particular direction that makes more sense than another.
Think again about the example f above: the conditional position tells us x must be a Boolean, while the arithmetic tells us y (and the result) must be Numbers. All this information is in the function. But how do we extract it systematically, and with an algorithm that terminates and enjoys the property we have stated above? We do this in two steps. First we generate constraints, based on program terms, on what the types must be. Then we solve these constraints to identify inconsistencies and join together constraints spread across the function body. Each step is relatively simple, but the combination creates magic.
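In terms of the two functions we are about to define, generate and unify, the whole pipeline can be sketched as a single composition. The driver name below is purely illustrative and not part of the text's code:

# hypothetical driver: generate constraints from the program, then solve them
fun type-infer(e):
  unify(generate(e))
end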
30.2.1 Constraint Generation
Our goal, ultimately, is to find a type to fill into every type annotation position. It will prove to be just as well to find a type for every expression. A moment’s thought will show that this is likely necessary anyway: for instance, how can we determine the type to put on a function without knowing the type of its body? It is also sufficient, in that if every expression has had its type calculated, this will include the ones that need annotations.
First, we must generate constraints to (later) solve. Constraint generation walks the program source, emitting appropriate constraints on each expression, and returns this set of constraints. It works by recursive descent mainly for simplicity; it really computes a set of constraints, so the order of traversal and generation does not matter in principle, so long as we generate constraints for every sub-expression. Each constraint asserts one of the following about the type of an expression:
That it is related to the type of some identifier.
That it is related to the type of some other expression.
That it is a base type, such as numbers and Booleans.
That it is a constructed type such as a function, whose domain and range types are presumably further constrained.
We represent these constraints with the following datatypes: a constraint (TyCon) equates two constraint hand-sides (TyCHS), each of which is either the type of an expression or a constructed type:

data TyCon: tyeq(l :: TyCHS, r :: TyCHS) end

data TyCHS:
  | t-expr(e :: TyExprC)
  | t-con(name :: String, fields :: List<TyCHS>)
end
For convenience, we define the base types and a constructor for function types:

numeric-t-con = t-con("num", empty)
boolean-t-con = t-con("bool", empty)
fun mk-fun-t-con(a, r): t-con("fun", [list: a, r]) end
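As a tiny illustration (using the TyExprC constructors, such as idC, from the earlier interpreter chapters; the name sample-con is just for this example), the claim “the identifier x has a function type from numbers to numbers” would be written:

# a sample constraint: the type of the identifier x is num -> num
sample-con = tyeq(t-expr(idC("x")), mk-fun-t-con(numeric-t-con, numeric-t-con))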
Here is the skeleton of the constraint generator:

fun generate(e :: TyExprC) -> List<TyCon>:
  cases (TyExprC) e:
    ...
  end
end

We fill in the cases a few at a time. A number is clearly numeric, an identifier by itself yields no constraints, and the arithmetic operators constrain their operands and result to be numeric:

  | numC(_) =>
    [list: tyeq(t-expr(e), numeric-t-con)]
  | idC(s) =>
    empty
  | plusC(l, r) => generate-arith-binop(e, l, r)
  | multC(l, r) => generate-arith-binop(e, l, r)
where the helper for the arithmetic operators is:

fun generate-arith-binop(e :: TyExprC, l :: TyExprC, r :: TyExprC) -> List<TyCon>:
  [list: tyeq(t-expr(e), numeric-t-con),
    tyeq(t-expr(l), numeric-t-con),
    tyeq(t-expr(r), numeric-t-con)] +
    generate(l) +
    generate(r)
end
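As a small check of what this produces (a sketch, assuming the generator has been fully assembled and the usual TyExprC constructors are in scope), the constraints for 1 + 2 force the sum and both operands to be numeric; the operand constraints appear twice, once from the helper and once from the recursive calls:

check:
  one-plus-two = plusC(numC(1), numC(2))
  cs = generate(one-plus-two)
  cs.length() is 5
  cs.member(tyeq(t-expr(one-plus-two), numeric-t-con)) is true
  cs.member(tyeq(t-expr(numC(1)), numeric-t-con)) is true
  cs.member(tyeq(t-expr(numC(2)), numeric-t-con)) is true
end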
The Boolean constants are Booleans; for a conditional, the condition must be a Boolean, the two branches must have the same type, and that is also the type of the conditional as a whole:

  | trueC =>
    [list: tyeq(t-expr(e), boolean-t-con)]
  | falseC =>
    [list: tyeq(t-expr(e), boolean-t-con)]
  | ifC(cnd, thn, els) =>
    [list: tyeq(t-expr(cnd), boolean-t-con),
      tyeq(t-expr(thn), t-expr(els)),
      tyeq(t-expr(thn), t-expr(e))] +
      generate(cnd) + generate(thn) + generate(els)
Exercise
What happens if you leave out the tyeq(t-expr(thn), t-expr(e)) constraint?
Exercise
What if we instead wrote tyeq(t-expr(els), t-expr(e))? Would it make a difference?
Now we get to the other two interesting cases, function declaration and application. In both cases, we must remember to generate and return constraints of the sub-expressions.
  | fdC(a, b) =>
    [list:
      tyeq(t-expr(e),
        mk-fun-t-con(t-expr(idC(a)), t-expr(b)))] +
      generate(b)
Do Now!
Do you see why we have idC(a) instead of just a?
An application constrains the operator to be a function from the argument’s type to the type of the application itself:

  | appC(f, a) =>
    [list:
      tyeq(t-expr(f),
        mk-fun-t-con(t-expr(a), t-expr(e)))] +
      generate(f) +
      generate(a)
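For instance (again a sketch over the assembled generator), applying the identity function to 5 yields three constraints, one of which records that the function maps the argument’s type to the application’s type:

check:
  iden = fdC("x", idC("x"))
  app5 = appC(iden, numC(5))
  cs = generate(app5)
  cs.length() is 3
  app-con = tyeq(t-expr(iden), mk-fun-t-con(t-expr(numC(5)), t-expr(app5)))
  cs.member(app-con) is true
end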
And that’s it! We have finished generating constraints; now we just have to solve them.
30.2.2 Constraint Solving Using Unification
The process used to solve constraints is known as unification. A unifier is given a set of equations, each relating a variable to a term; in our setting, both sides are TyCHS values as defined above, with t-expr playing the role of variables.
For our purposes, the goal of unification is to generate a substitution, or mapping from variables to terms that do not contain any variables. This should sound familiar: we have a set of simultaneous equations in which each variable is used linearly; such equations are solved using Gaussian elimination. In that context, we know that we can end up with systems that are both under- and over-constrained. The same thing can happen here, as we will soon see.
The unification algorithm works iteratively over the set of constraints. Because each constraint equation has two terms and each term can be one of two kinds, there are four cases to cover.
The algorithm begins with the set of all constraints and the empty substitution. Each constraint is considered once and removed from the set, so in principle the termination argument should be utterly simple, but it will prove to be slightly more tricky. As constraints are disposed of, the substitution set tends to grow. When all the constraints have been disposed of, unification returns the final substitution set.
For a given constraint, the unifier examines the left-hand-side of the equation. If it is a variable, it is now ripe for elimination. The unifier adds the variable’s right-hand-side to the substitution and, to truly eliminate the variable, replaces all its occurrences in the substitution with this right-hand-side. It is worth noting that because the constraints are equalities, eliminating a variable is tantamount to associating it with the same set as whatever replaces it. In other words, we can use union-find [Disjoint Sets Redux] to implement this process efficiently, though if we need to backtrack during unification (as we do for logic programming [REF]), this becomes much more tricky.
Do Now!
Did you notice the subtle error above?
The subtle error is this. We said that the unifier eliminates the variable by replacing all instances of it in the substitution. However, that assumes the right-hand-side does not itself contain any instances of the same variable. Otherwise we have a circular definition, and it becomes impossible to perform this particular substitution. For this reason, unifiers include an occurs check: a check for whether the same variable occurs on both sides; if it does, the unifier declines to unify. For simplicity we will ignore this here.
Do Now!
Construct a term whose constraints would trigger the occurs check.
Do you remember ω [Recursion and Non-Termination]?
Let us now implement unification. For simplicity, we will use a list of type constraints as the representation of the substitution. As you read this, keep in mind that unification is a generic procedure, completely independent of type inference: indeed, the unification algorithm was invented before, and spurred the creation of, the type-inference process.
Exercise
If we use type constraints to represent the substitution, what invariant would we expect the computed set of constraints to have?
It will be convenient to have a helper function that takes the current substitution as an accumulated parameter. Let’s therefore include it, and get the easy case out of the way:
fun unify(cs :: List<TyCon>) -> List<TyCon>:
  fun help(shadow cs :: List<TyCon>, sub :: List<TyCon>) -> List<TyCon>:
    cases (List) cs:
      | empty => sub
      | link(f, r) =>
        ...
    end
  end
  help(cs, empty)
end
Inside the link case, we consider the four combinations in turn:

If both sides are t-exprs, then we simply replace one with the other (this is the “variable elimination” case of the Gaussian procedure). We must perform this replacement everywhere: in the remaining constraints, but also in the substitution accumulated so far.
Exercise
What happens if we miss doing this replacement in one or the other?
If one side is a t-expr and the other a t-con, then we have resolved that expression’s type to a concrete type. Record this and substitute. There are two such cases, depending on which side the t-expr is on: for simplicity, we handle one directly and, in the other, swap the two sides and recur, reducing it to the former case. This swapping of sides is legal because these are equational constraints.
If we have to unify two constructors, then they had better be the same constructor! If they are not, we have a type error. If they are, then we recur on their parameters.
Here, then, is the body of the link(f, r) case:

lhs = f.l
rhs = f.r
ask:
  | is-t-expr(lhs) and is-t-expr(rhs) then:
    help(subst(lhs, rhs, r), link(f, subst(lhs, rhs, sub)))
  | is-t-expr(lhs) and is-t-con(rhs) then:
    help(subst(lhs, rhs, r), link(f, subst(lhs, rhs, sub)))
  | is-t-con(lhs) and is-t-expr(rhs) then:
    help(link(tyeq(rhs, lhs), r), sub)
  | is-t-con(lhs) and is-t-con(rhs) then:
    if lhs.name == rhs.name:
      help(map2(tyeq, lhs.fields, rhs.fields) + r, sub)
    else:
      raise('type error: ' + lhs.name + ' vs. ' + rhs.name)
    end
end
In terms of proving termination, note that the last two cases do not shrink the input: the third keeps it the same, while the fourth in some cases grows it.
where subst replaces every occurrence of the given t-expr in a list of constraints:

fun subst(to-rep :: TyCHS%(is-t-expr), rep-with :: TyCHS, in :: List<TyCon>) -> List<TyCon>:
  cases (List) in:
    | empty => empty
    | link(f, r) =>
      lhs = f.l
      rhs = f.r
      link(
        tyeq(
          if lhs == to-rep: rep-with else: lhs end,
          if rhs == to-rep: rep-with else: rhs end),
        subst(to-rep, rep-with, r))
  end
end
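As a tiny illustration of subst (a sketch; the bindings con and expected are just for the example), here is what replacing the type of x with the numeric type does to a single constraint:

check:
  con = tyeq(t-expr(idC("y")), t-expr(idC("x")))
  expected = [list: tyeq(t-expr(idC("y")), numeric-t-con)]
  # every occurrence of x's type on either side is replaced by the numeric type
  subst(t-expr(idC("x")), numeric-t-con, [list: con]) is expected
end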
Exercise
There is a subtle bug in the above implementation of unification. It assumes that two textually identical expressions must have the same type. Construct a counter-example to show that this is not true. Then fix the implementation (consider using reference rather than structural equality [Re-Examining Equality]).
Exercise
The algorithm above is rather naive. Given a choice, we would rather see the types of identifiers rather than those of expressions. Modify the algorithm to bias in this direction.
Exercise
The output of the above algorithm is unsatisfying: a set of (solved) constraints rather than an “answer”. Extract the type of the top-level expression, and “pretty-print” it in terms of only type constants, referring to expressions only when necessary [Over- and Under-Constrained Solutions].
Exercise
Prove the termination of this algorithm. Make an argument based on the size of the constraint set and on the size of the substitution.
Exercise
Augment this implementation with the occurs check.
Exercise
Use union-find to optimize this implementation. Measure the performance gain.
With this, we are done. Unification produces a substitution. We can now traverse the substitution and find the types of all the expressions in the program, then insert the type annotations accordingly.
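As a small end-to-end sketch (again relying on the TyExprC constructors from earlier chapters), solving the constraints for 1 + 2 yields a substitution that maps each of the three expressions to the numeric type:

check:
  one-plus-two = plusC(numC(1), numC(2))
  sub = unify(generate(one-plus-two))
  sub.length() is 3
  sub.member(tyeq(t-expr(one-plus-two), numeric-t-con)) is true
  sub.member(tyeq(t-expr(numC(1)), numeric-t-con)) is true
  sub.member(tyeq(t-expr(numC(2)), numeric-t-con)) is true
end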
30.3 Type Checking and Type Errors
A theorem, which we will not prove here, dictates that the success of the above process implies that the program would have type-checked, so we need not explicitly run the type-checker over this program.
Observe, however, that the nature of a type error has now changed dramatically. Previously, we had a recursive-descent algorithm that walked the expression using a type environment. The bindings in the type environment were programmer-declared types, and could hence be taken as (intended) authoritative specifications of types. As a result, any mismatch was blamed on the expressions, and reporting type errors was simple (and easy to understand). Here, however, a type error is a failure to unify. That failure occurs at the confluence of two smart algorithms, constraint generation and unification, and may surface far from the expression that is actually at fault, making such errors much harder for programmers to comprehend.
30.4 Over- and Under-Constrained Solutions
Remember that the constraints may not precisely dictate the type of all variables. If the system of equations is over-constrained, then we get clashes, resulting in type errors. If instead the system is under-constrained, that means we don’t have enough information to make definitive statements about all expressions. For instance, in the expression (fun (x) x) we do not have enough constraints to indicate what the type of x, and hence of the entire expression, must be. This is not an error; it simply means that x is free to be any type at all. In other words, its type is “the type of x -> the type of x” with no other constraints. The types of these underconstrained identifiers are presented as type variables, so the above expression’s type might be reported as (A -> A).
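In terms of our implementation (a sketch using the TyExprC constructors; the bindings are just for the example), the constraints for the identity function solve without ever producing a concrete type: the result still refers to the unknown type of x, which is precisely the freedom a type variable represents:

check:
  iden = fdC("x", idC("x"))
  expected = [list: tyeq(t-expr(iden), mk-fun-t-con(t-expr(idC("x")), t-expr(idC("x"))))]
  # no clash, but also no concrete type: x's type is left unconstrained
  unify(generate(iden)) is expected
end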
The unification algorithm actually has a wonderful property: it automatically computes the most general types for an expression, also known as principal types. That is, any actual type the expression can have can be obtained by instantiating the inferred type variables with actual types. This is a remarkable result: in another example of computers beating humans, it says that no human can generate a more general type than the above algorithm can!
30.5 Let-Polymorphism
Consider the following program, which binds the identity function and then uses it at two different types:

(let ([id (fun (x) x)])
  (if (id true)
      (id 5)
      (id 6)))
We might expect this to type-check, with each use of id instantiated at the appropriate type, as if we had written:

(if (id<Boolean> true)
    (id<Number> 5)
    (id<Number> 6))
In fact, plain constraint generation and unification reject this program. The reason is that the types we have inferred through unification are not actually polymorphic. This is important to remember: just because you have type variables, you don’t necessarily have polymorphism! The type variables could be unified at the next use, at which point you end up with a mere monomorphic function. Rather, true polymorphism only obtains when you can instantiate type variables.
In languages with true polymorphism, then, constraint generation and unification are not enough. Instead, languages like ML and Haskell implement something colloquially called let-polymorphism. In this strategy, when a term with type variables is bound in a lexical context, the type is automatically promoted to be a quantified one. At each use, the term is effectively automatically instantiated.
There are many implementation strategies that will accomplish this. The most naive (and unsatisfying) is to merely copy the code of the bound identifier; thus, each use of id above gets its own copy of (fun (x) x), so each gets its own type variables. The first might get the type (A -> A), the second (B -> B), the third (C -> C), and so on. None of these type variables clash, so we get the effect of polymorphism. Obviously, this not only increases program size, it also does not work in the presence of recursion. However, it gives us insight into a better solution: instead of copying the code, why not just copy the type? Thus at each use, we create a renamed copy of the inferred type: id’s (A -> A) becomes (B -> B) at the first use, and so on, thus achieving the same effect as copying code but without its burdens. Because all these strategies effectively mimic copying code, however, they only work within a lexical context.