15 Functions as Data
It’s interesting to consider how expressive the little programming
we’ve learned so far can be. To illustrate this, we’ll work through a
few exercises of interesting concepts we can express using just
functions as values. We’ll write two quite different things, then show
how they converge nicely.
15.1 A Little Calculus
If you’ve studied the differential calculus, you’ve come across
curious sytactic statements such as this:
\begin{equation*}{\frac{d}{dx}} x^2 = 2x\end{equation*}
Let’s unpack what this means: the \(d/dx\), the \(x^2\), and the \(2x\).
First, let’s take on the two expressions; we’ll discuss one, and the
discussion will cover the other as well. The correct response to
“what does \(x^2\) mean?” is, of course, an error: it doesn’t mean
anything, because \(x\) is an unbound identifier.
So what is it intended to mean? The intent, clearly, is to
represent the function that squares its input, just as \(2x\) is meant
to be the function that doubles its input. We have nicer ways of
writing those:
fun square(x :: Number) -> Number: x * x end
fun double(x :: Number) -> Number: 2 * x end
and what we’re really trying to say is that the \(d/dx\) (whatever
that is) of square is double.We’re
assuming functions of arity one in the variable that is changing.
So now let’s unpack \(d/dx\), starting with its type. As the above
example illustrates, \(d/dx\) is really a function from
functions to functions. That is, we can write its type as follows:
d-dx :: ((Number -> Number) -> (Number -> Number))
(This type might explain why your calculus course never explained this
operation this way—though it’s not clear that obscuring its true
meaning is any better for your understanding.)
Let us now implement d-dx. We’ll implement numerical
differentiation, though in principle we could also implement
symbolic differentiation—using rules you learned, e.g.,
given a polynomial, multiply by the exponent and reduce the exponent
by one—with a representation of expressions
[Representing Arithmetic].
In general, numeric differentiation of a function at a point yields
the value of the derivative at that point. We have a handy formula for
it: the derivative of \(f\) at \(x\) is
\begin{equation*}\frac{f(x + \epsilon) - f(x)}{\epsilon}\end{equation*}
epsilon = 0.001
Let’s now try to translate the above formula into Pyret:
fun d-dx(f :: (Number -> Number)) -> (Number -> Number):
(f(x + epsilon) - f(x)) / epsilon
end
What’s the problem with the above definition?
If you didn’t notice, Pyret will soon tell you:
x isn’t
bound. Indeed, what is
x? It’s the point at which we’re trying
to compute the numeric derivative. That is,
d-dx needs to
return not a number but a
function (as the type indicates) that
will consume this
x:
“Lambdas are relegated to
relative obscurity until Java makes them popular by not having
them.”—James Iry,
A
Brief, Incomplete, and Mostly Wrong History of Programming Languages
fun d-dx(f :: (Number -> Number)) -> (Number -> Number):
lam(x :: Number) -> Number:
(f(x + epsilon) - f(x)) / epsilon
end
end
Sure enough, this definition now works. We can, for instance, test it
as follows (note the use of num-floor to avoid numeric precision
issues from making our tests appear to fail):
d-dx-square = d-dx(square)
check:
ins = [list: 0, 1, 10, 100]
for map(n from ins):
num-floor(d-dx-square(n))
end
is
for map(n from ins):
num-floor(double(n))
end
end
Now we can return to the original example that launched this
investigation: what the sloppy and mysterious notation of math is
really trying to say is,
d-dx(lam(x): x * x end) = lam(x): 2 * x end
\begin{equation*}{\frac{d}{dx}} [x \rightarrow x^2] = [x \rightarrow 2x]\end{equation*}
Pity math textbooks for not wanting to tell us the truth!
15.2 A Helpful Shorthand for Anonymous Functions
Pyret offers a shorter syntax for writing anonymous functions. Though,
stylistically, we generally avoid it so that our programs don’t become
a jumble of special characters, sometimes it’s particularly
convenient, as we will see below. This syntax is
where a is zero or more arguments and b is the body. For
instance, we can write lam(x): x * x end as
where we can see the benefit of brevity. In particular, note that
there is no need for end, because the braces take the place of
showing where the expression begins and ends. Similarly, we could have
written d-dx as
fun d-dx-short(f):
{(x): (f(x + epsilon) - f(x)) / epsilon}
end
but many readers would say this makes the function harder to read,
because the prominent lam makes clear that d-dx returns
an (anonymous) function, whereas this syntax obscures it. Therefore,
we will usually only use this shorthand syntax for “one-liners”.
15.3 Streams From Functions
People typically think of a function as serving one purpose: to
parameterize an expression. While that is both true and the most
common use of a function, it does not justify having a function of no
arguments, because that clearly parameterizes over nothing at all. Yet
functions of no argument also have a use, because functions actually
serve two purposes: to parameterize, and to suspend evaluation
of the body until the function is applied. In fact, these two uses
are orthogonal, in that one can employ one feature without the
other. In Sugaring Over Anonymity we see one direction of this:
parameterized functions that are used immediately, so that we employ
only abstraction and not delay. Below, we will see the other: delay
without abstraction.
Let’s consider the humble list. A list can be only finitely
long. However, there are many lists (or sequences) in nature
that have no natural upper bound: from mathematical objects (the
sequence of natural numbers) to natural ones (the sequence of hits to
a Web site). Rather than try to squeeze these unbounded lists into
bounded ones, let’s look at how we might represent and program over
these unbounded lists.
First, let’s write a program to compute the sequence of natural
numbers:
fun nats-from(n):
link(n, nats-from(n + 1))
end
Does this program have a problem?
While this represents our intent, it doesn’t work: running it—e.g.,
nats-from(0)—creates an infinite loop evaluating
nats-from for every subsequent natural number. In other words,
we want to write something very like the above, but that doesn’t recur
until we want it to, i.e., on demand. In other words, we want
the rest of the list to be lazy.
This is where our insight into functions comes in. A function, as we
have just noted, delays evaluation of its body until it is
applied. Therefore, a function would, in principle, defer the
invocation of nats-from(n + 1) until it’s needed.
Except, this creates a type problem: the second argument to
link needs to be a list, and cannot be a function. Indeed,
because it must be a list, and every value that has been constructed
must be finite, every list is finite and eventually terminates in
empty. Therefore, we need a new data structure to represent the
links in these lazy lists (also known as streams):
data Stream<T>: |
| lz-link(h :: T, t :: ( -> Stream<T>)) |
end |
where the annotation ( -> Stream<T>) means a function from no
arguments (hence the lack of anything before ->),
also known as a thunk. Note that the way we have
defined streams they must be infinite, since we have provided
no way to terminate them.
Let’s construct the simplest example we can, a stream of constant
values:
ones = lz-link(1, lam(): ones end)
Pyret will actually complain about this definition. Note that
the list equivalent of this also will not work:
because ones is not defined at the point of
definition, so when Pyret evaluates link(1, ones), it complains
that ones is not defined. However, it is being overly
conservative with our former definition: the use of ones is
“under a lam”, and hence won’t be needed until after the
definition of ones is done, at which point ones
will be defined. We can indicate this to Pyret by using the
keyword rec:
rec ones = lz-link(1, lam(): ones end)
To understand more about recursive definitions, see
Recursive Functions. Note that in Pyret, every
fun
implicitly has a
rec beneath it, which is why we can
create recursive functions with aplomb.
Earlier we said that we can’t write
What if we tried to write
instead? Does this work and, if so, what value is ones bound
to? If it doesn’t work, does it fail to work for the same reason as
the definition without the rec?
rec ones = lz-link(1, {(): ones})
Notice that {(): …} defines an anonymous function of no
arguments. You can’t leave out the ()! If you do, Pyret will
get confused about what your program means.
Because functions are automatically recursive, when we write a
function to create a stream, we don’t need to use rec. Consider
this example:
fun nats-from(n :: Number):
lz-link(n, {(): nats-from(n + 1)})
end
with which we can define the natural numbers:
Note that the definition of nats is not recursive itself—the
recursion is inside nats-from—so we don’t need to use
rec to define nats.
Earlier, we said that every list is finite and hence eventually
terminates. How does this remark apply to streams, such as the
definition of ones or nats above?
The description of ones is still a finite one; it simply
represents the potential for an infinite number of values. Note
that:
A similar reasoning doesn’t apply to lists because the rest of
the list has already been constructed; in contrast, placing a function
there creates the potential for a potentially unbounded amount of
computation to still be forthcoming.
That said, even with streams, in any given computation, we will
create only a finite prefix of the stream. However, we don’t have to
prematurely decide how many; each client and use is welcome to extract
less or more, as needed.
Now we’ve created multiple streams, but we still don’t have an easy
way to “see” one. First we’ll define the traditional list-like
selectors. Getting the first element works exactly as with lists:
fun lz-first<T>(s :: Stream<T>) -> T: s.h end
In contrast, when trying to access the rest of the stream, all we get
out of the data structure is a thunk. To access the actual rest, we
need to force the thunk, which of course means applying it to no
arguments:
fun lz-rest<T>(s :: Stream<T>) -> Stream<T>: s.t() end
This is useful for examining individual values of the
stream. It is also useful to extract a finite prefix of
it (of a given size) as a (regular) list, which would be especially
handy for testing. Let’s write that function:
fun take<T>(n :: Number, s :: Stream<T>) -> List<T>:
if n == 0:
empty
else:
link(lz-first(s), take(n - 1, lz-rest(s)))
end
end
If you pay close attention, you’ll find that this body is not defined
by cases
over the structure of the (stream) input—
instead,
it’s defined by the cases of the definition of a natural number (zero
or a successor). We’ll return to this below (<lz-map2-def>).Now that we have this, we can use it for testing. Note that usually we
use our data to test our functions; here, we’re using this function to
test our data:
check:
take(10, ones) is map(lam(_): 1 end, range(0, 10))
take(10, nats) is range(0, 10)
take(10, nats-from(1)) is map((_ + 1), range(0, 10))
end
The notation (_ + 1) defines a Pyret function of
one argument that adds 1 to the given argument.
Let’s define one more function: the equivalent of map over
streams. For reasons that will soon become obvious, we’ll define a
version that takes two lists and applies the first argument to them
pointwise:
fun lz-map2<A, B, C>( |
f :: (A, B -> C), |
s1 :: Stream<A>, |
s2 :: Stream<B>) -> Stream<C>: |
lz-link( |
f(lz-first(s1), lz-first(s2)), |
{(): lz-map2(f, lz-rest(s1), lz-rest(s2))}) |
end |
Now we can see our earlier remark about the structure of the function
driven home especially clearly. Whereas a traditional
map over
lists would have two cases, here we have only one case because the
data definition (
<stream-type-def>) has only one case!
What is the consequence of this? In a traditional
map, one case
looks like the above, but the other case corresponds to the
empty
input, for which it produces the same output. Here, because the stream
never terminates, mapping over it doesn’t either, and the structure of
the function reflects this.
This raises a much subtler
problem: if the function’s body doesn’t have base- and
inductive-cases, how can we perform an inductive proof over it? The
short answer is we can’t: we must instead use
☛ coinduction.Why did we define lz-map2 instead of lz-map? Because it
enables us to write the following:
rec fibs =
lz-link(0,
{(): lz-link(1,
{(): lz-map2({(a :: Number, b :: Number): a + b},
fibs,
lz-rest(fibs))})})
from which, of course, we can extract as many Fibonacci numbers as we
want!
check:
take(10, fibs) is [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
end
Define the equivalent of map, filter, and fold
for streams.
Streams and, more generally, infinite data structures that unfold on
demand are extremely valuable in programming. Consider, for instance,
the possible moves in a game. In some games, this can be infinite;
even if it is finite, for interesting games the combinatorics mean
that the tree is too large to feasibly store in memory. Therefore, the
programmer of the computer’s intelligence must unfold the game tree
on demand. Programming it by using the encoding we have
described above means the program describes the entire tree,
lazily, and the tree unfolds automatically on demand, relieving the
programmer of the burden of implementing such a strategy.
In some languages, such as Haskell, lazy evaluation is built in
by default. In such a language, there is no need to use
thunks. However, lazy evaluation places other burdens on the language
[REF].
15.4 Combining Forces: Streams of Derivatives
When we defined d-dx, we set epsilon to an arbitrary, high
value. We could instead think of epsilon as itself a stream that
produces successively finer values; then, for instance, when the
difference in the value of the derivative becomes small enough, we can
decide we have a sufficient approximation to the derivative.
The first step is, therefore, to make epsilon some kind of
parameter rather than a global constant. That leaves open what kind of
parameter it should be (number or stream?) as well as when it should
be supplied.
It makes most sense to consume this parameter after we have decided
what function we want to differentiate and at what value we want its
derivative; after all, the stream of epsilon values may depend
on both. Thus, we get:
fun d-dx(f :: (Number -> Number)) ->
(Number -> (Number -> Number)):
lam(x :: Number) -> (Number -> Number):
lam(epsilon :: Number) -> Number:
(f(x + epsilon) - f(x)) / epsilon
end
end
end
with which we can return to our square example:
d-dx-square = d-dx(square)
Note that at this point we have simply redefined d-dx without
any reference to streams: we have merely made a constant into a
parameter.
Now let’s define the stream of negative powers of ten:
tenths = block:
fun by-ten(d):
new-denom = d / 10
lz-link(new-denom, lam(): by-ten(new-denom) end)
end
by-ten(1)
end
so that
check:
take(3, tenths) is [list: 1/10, 1/100, 1/1000]
end
For concreteness, let’s pick an abscissa at which to compute the
numeric derivative of square—say 10:
d-dx-square-at-10 = d-dx-square(10)
Recall, from the types, that this is now a function of type
(Number -> Number): given a value for epsilon, it computes
the derivative using that value. We know, analytically, that the
value of this derivative should be 20. We can now (lazily) map
tenths to provide increasingly better approximations for
epsilon and see what happens:
lz-map(d-dx-square-at-10, tenths)
Sure enough, the values we obtain are 20.1, 20.01,
20.001, and so on: progressively better numerical
approximations to 20.
Extend the above program to take a tolerance, and draw as many values
from the epsilon stream as necessary until the difference
between successive approximations of the derivative fall within this
tolerance.