Loading [MathJax]/extensions/tex2jax.js

12 Recursive Data

Sometimes, a data definition has a piece that refers back to itself. For example, a linked list of numbers:

data NumList:
  | nl-empty
  | nl-link(first :: Number, rest :: NumList)
end

Moving on to defining examples, we can talk about empty lists:

The nl- stands for NumList. This avoids clashing with Pyret’s empty.
nl-empty

We can represent short lists, like a sequence of two 4’s:

nl-link(4, nl-link(4, nl-empty))

Since these are created with constructors from data, we can use cases with them:

    cases (NumList) nl-empty:
      | nl-empty => "empty!"
      | nl-link(first, rest) => "not empty"
    end

=>  "empty!"
    cases (NumList) nl-link(1, nl-link(2, nl-empty)):
      | nl-empty => "empty!"
      | nl-link(first, rest) => first
    end

=>  1

This style of data definition permits us to create data that are unbounded or arbitrarily-sized. Given a NumList, there is an easy way to make a new, larger one: just use nl-link. So, we need to consider larger lists:

nl-link(1,
  nl-link(2,
    nl-link(3,
      nl-link(4,
        nl-link(5,
          nl-link(6,
            nl-link(7,
              nl-link(8,
                nl-empty))))

Let’s try to write a function contains-3, which returns true if the NumList contains the value 3, and false otherwise.

First, our header:

fun contains-3(nl :: NumList) -> Boolean:
  doc: "Produces true if the list contains 3, false otherwise"
end

Next, some tests:

fun contains-3(nl :: NumList) -> Boolean:
  doc: "Produces true if the list contains 3, false otherwise"
where:
  contains-3(nl-empty) is false
  contains-3(nl-link(3, nl-empty)) is true
  contains-3(nl-link(1, nl-link(3, nl-empty))) is true
  contains-3(nl-link(1, nl-link(2, nl-link(3, nl-link(4, nl-empty))))) is true
  contains-3(nl-link(1, nl-link(2, nl-link(5, nl-link(4, nl-empty))))) is false
end

Next, we need to fill in the body with the template for a function over NumLists. We can start with the analogous template using cases we had before:

fun contains-3(nl :: NumList) -> Boolean:
  doc: "Produces true if the list contains 3, false otherwise"
  cases (NumList) nl:
    | nl-empty => ...
    | nl-link(first, rest) =>
      ... first ...
      ... rest ...
  end
end

An empty list doesn’t contain the number 3, surely, so the answer must be false in the nl-empty case. In the nl-link case, if the first element is 3, we’ve successfully answered the question. That only leaves the case where the argument is an nl-link and the first element does not equal 3:

fun contains-3(nl :: NumList) -> Boolean:
  cases (NumList) nl:
    | nl-empty => false
    | nl-link(first, rest) =>
      if first == 3:
        true
      else:
        # handle rest here
      end
  end
end

Since we know rest is a NumList (based on the data definition), we can use a cases expression to work with it. This is sort of like filling in a part of the template again:

fun contains-3(nl :: NumList) -> Boolean:
  cases (NumList) nl:
    | nl-empty => false
    | nl-link(first, rest) =>
      if first == 3:
        true
      else:
        cases (NumList) rest:
          | nl-empty => ...
          | nl-link(first-of-rest, rest-of-rest) =>
            ... first-of-rest ...
            ... rest-of-rest ...
        end
      end
  end
end

If the rest was empty, then we haven’t found 3 (just like when we checked the original argument, nl). If the rest was a nl-link, then we need to check if the first thing in the rest of the list is 3 or not:

fun contains-3(nl :: NumList) -> Boolean:
  cases (NumList) nl:
    | nl-empty => false
    | nl-link(first, rest) =>
      if first == 3:
        true
      else:
        cases (NumList) rest:
          | nl-empty => false
          | nl-link(first-of-rest, rest-of-rest) =>
            if first-of-rest == 3:
              true
            else:
              # fill in here ...
            end
        end
      end
  end
end

Since rest-of-rest is a NumList, we can fill in a cases for it again:

fun contains-3(nl :: NumList) -> Boolean:
  cases (NumList) nl:
    | nl-empty => false
    | nl-link(first, rest) =>
      if first == 3:
        true
      else:
        cases (NumList) rest:
          | nl-empty => false
          | nl-link(first-of-rest, rest-of-rest) =>
            if first-of-rest == 3:
              true
            else:
              cases (NumList) rest-of-rest:
                | nl-empty => ...
                | nl-link(first-of-rest-of-rest, rest-of-rest-of-rest) =>
                  ... first-of-rest-of-rest ...
                  ... rest-of-rest-of-rest ...
              end
            end
        end
      end
  end
end

See where this is going? Not anywhere good. We can copy this cases expression as many times as we want, but we can never answer the question for a list that is just one element longer than the number of times we copy the code.

So what to do? We tried this approach of using another copy of cases based on the observation that rest is a NumList, and cases provides a meaningful way to break apart a NumList; in fact, it’s what the recipe seems to lead to naturally.

Let’s go back to the step where the problem began, after filling in the template with the first check for 3:

fun contains-3(nl :: NumList) -> Boolean:
  cases (NumList) nl:
    | nl-empty => false
    | nl-link(first, rest) =>
      if first == 3:
        true
      else:
        # what to do with rest?
      end
  end
end

We need a way to compute whether or not the value 3 is contained in rest. Looking back at the data definition, we see that rest is a perfectly valid NumList, simply by the definition of nl-link. And we have a function (or, most of one) whose job is to figure out if a NumList contains 3 or not: contains-3. That ought to be something we can call with rest as an argument, and get back the value we want:

fun contains-3(nl :: NumList) -> Boolean:
  cases (NumList) nl:
    | nl-empty => false
    | nl-link(first, rest) =>
      if first == 3:
        true
      else:
        contains-3(rest)
      end
  end
end

And lo and behold, all of the tests defined above pass. It’s useful to step through what’s happening when this function is called. Let’s look at an example:

contains-3(nl-link(1, nl-link(3, nl-empty)))

First, we substitute the argument value in place of nl everywhere it appears; that’s just the usual rule for function calls.

=>  cases (NumList) nl-link(1, nl-link(3, nl-empty)):
       | nl-empty => false
       | nl-link(first, rest) =>
         if first == 3:
           true
         else:
           contains-3(rest)
         end
     end

Next, we find the case that matches the constructor nl-link, and substitute the corresponding pieces of the nl-link value for the first and rest identifiers.

=>  if 1 == 3:
      true
    else:
      contains-3(nl-link(3, nl-empty))
    end

Since 1 isn’t 3, the comparison evaluates to false, and this whole expression evaluates to the contents of the else branch.

=>  if false:
      true
    else:
      contains-3(nl-link(3, nl-empty))
    end

=>  contains-3(nl-link(3, nl-empty))

This is another function call, so we substitute the value nl-link(3, nl-empty), which was the rest field of the original input, into the body of contains-3 this time.

=>  cases (NumList) nl-link(3, nl-empty):
      | nl-empty => false
      | nl-link(first, rest) =>
        if first == 3:
          true
        else:
          contains-3(rest)
        end
    end

Again, we substitute into the nl-link branch.

=>  if 3 == 3:
      true
    else:
      contains-3(nl-empty)
    end

This time, since 3 is 3, we take the first branch of the if expression, and the whole original call evaluates to true.

=>  if true:
      true
    else:
      contains-3(nl-empty)
    end

=> true

An interesting feature of this trace through the evaluation is that we reached the expression contains-3(nl-link(3, nl-empty)), which is a normal call to contains-3; it could even be a test case on its own. The implementation works by doing something (checking for equality with 3) with the non-recursive parts of the datum, and combining that result with the result of the same operation (contains-3) on the recursive part of the datum. This idea of doing recursion with the same function on self-recursive parts of the datatype lets us extend our template to handle recursive positions.

The simple design recipe dictated this as the template:

#|
fun num-list-fun(nl :: NumList) -> ???:
  cases (NumList) nl:
    | nl-empty => ...
    | nl-link(first, rest) =>
      ... first ...
      ... rest ...
  end
end
|#

However, this template doesn’t give much guidance with what to do with the rest field. We will extend the template with the following rule: each self-recursive position in the data definition becomes a self-recursive function call in the template. So the new template looks like:

#|
fun num-list-fun(nl :: NumList) -> ???:
  cases (NumList) nl:
    | nl-empty => ...
    | nl-link(first, rest) =>
      ... first ...
      ... num-list-fun(rest) ...
  end
end
|#

To handle recursive data, the design recipe just needs to be modified to have this extended template. When you see a recursive data definition (of which there will be many when programming in Pyret), you should naturally start thinking about what the recursive calls will return and how to combine their results with the other, non-recursive pieces of the datatype.

Exercise

Use the design recipe to write a function contains-n that takes a NumList and a Number, and returns whether that number is in the NumList.

Exercise

Use the design recipe to write a function sum that takes a NumList, and returns the sum of all the numbers in it. The sum of the empty list is 0.

Exercise

Use the design recipe to write a function remove-3 that takes a NumList, and returns a new NumList with any 3’s removed. The remaining elements should all be in the list in the same order they were in the input.

Exercise

Write a data definition called NumListList that represents a list of NumLists, and use the design recipe to write a function sum-of-lists that takes a NumListList and produces a NumList containing the sums of the sub-lists.