10 Introduction to Structured Data
Earlier we had our first look at types. Until now, we have only seen
the types that Pyret provides us, which is an interesting but
nevertheless quite limited set. Most programs we write will contain
many more kinds of data.
10.1 Understanding the Kinds of Compound Data
10.1.1 A First Peek at Structured Data
There are times when a datum has many attributes, or parts. We
need to keep them all together, and sometimes take them apart. For
instance:
An iTunes entry contains a bunch of information about a single
song: not only its name but also its singer, its length, its genre,
and so on.

Your GMail application contains a bunch of information about a
single message: its sender, the subject line, the conversation it’s
part of, the body, and quite a bit more.

In examples like these, we see the need for structured data: a
single datum has structure, i.e., it
actually consists of many pieces. The number of pieces is
fixed, but the pieces may be of different kinds (some might be numbers,
some strings, some images, and different types may be mixed together
in that one datum). Some might even be other structured data:
for instance, a date usually has at least three parts: the day, month,
and year. The parts of a structured datum are called its fields.
10.1.2 A First Peek at Conditional Data
Then there are times when we want to represent different kinds of
data under a single, collective umbrella. Here are a few examples:
A traffic light can be in different states: red, yellow, or
green. (Yes, in some countries there are different or more
colors and color combinations.) Collectively, they represent one
thing: a new type called a traffic light state.
A zoo consists of many kinds of animals. Collectively, they
represent one thing: a new type called an animal. Some condition
determines which particular kind of animal a zookeeper might be dealing
with.
A social network consists of different kinds of pages. Some
pages represent individual humans, some places, some organizations,
some might stand for activities, and so on. Collectively, they
represent a new type: a social media page.
A notification application may report many kinds of events. Some
are for email messages (which have many fields, as we’ve discussed),
some are for reminders (which might have a timestamp and a note), some
for instant messages (similar to an email message, but without a
subject), some might even be for the arrival of a package by physical
mail (with a timestamp, shipper, tracking number, and delivery
note). Collectively, these all represent a new type: a notification.
We call these “conditional” data because they represent an “or”: a
traffic light is red or green or yellow; a social
network’s page is for a person or location or
organization; and so on. Sometimes we care exactly which kind of thing
we’re looking at: a driver behaves differently on different colors,
and a zookeeper feeds each animal differently. At other times, we
might not care: if we’re just counting how many animals are in the
zoo, or how many pages are on a social network, or how many unread
notifications we have, their details don’t matter. Therefore, there are
times when we ignore the conditional and treat the datum as a member
of the collective, and other times when we do care about the
conditional and do different things depending on the individual
datum. We will make all this concrete as we start to write programs.
10.2 Defining and Creating Structured and Conditional Data
We have used the word “data” above, but that’s actually been a bit
of a lie. As we said earlier, data are how we represent
information in the computer. What we’ve been discussing above is
really different kinds of information, not exactly how they are
represented. But to write programs, we must wrestle concretely with
representations. That’s what we will do now, i.e., actually show
data representations of all this information.
10.2.1 Defining and Creating Structured Data
Let’s start with defining structured data, such as an iTunes song
record. Here’s a simplified version of the information such an app
might store:
The song’s name, which is a String.
The song’s singer, which is also a String.
The song’s year, which is a Number.
Let’s now introduce the syntax by which we can teach this to Pyret:
data ITunesSong: song(name, singer, year) end
This tells Pyret to introduce a new type of data, in this case
called ITunesSong. (We follow a convention that types
always begin with a capital letter.) The way we actually make one of
these data is by calling song with three arguments; for
instance:
<structured-examples> ::=
song("La Vie en Rose", "Édith Piaf", 1945)
song("Stressed Out", "twenty one pilots", 2015)
song("Waqt Ne Kiya Kya Haseen Sitam", "Geeta Dutt", 1959)
(It’s worth noting that music managers that are capable of making
distinctions between, say, Dance, Electronica, and Electronic/Dance
classify two of these three songs by a single genre: “World”.)
Always follow a data definition with a few concrete instances of the
data! This makes sure you actually do know how to make data of that
form. It’s also a good habit, though not essential, to give names to the
examples we’ve made, so that we can use them later:
lver = song("La Vie en Rose", "Édith Piaf", 1945)
so = song("Stressed Out", "twenty one pilots", 2015)
wnkkhs = song("Waqt Ne Kiya Kya Haseen Sitam", "Geeta Dutt", 1959)
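Earlier we noted that a field can itself hold structured data, as a date does. The following is a minimal sketch of that idea. The names SimpleDate, simple-date, EmailMessage, email-message, and the example values are hypothetical, chosen only for illustration, and the field annotations use syntax introduced in the next subsection.

```pyret
# Hypothetical date type: one structured datum with three Number fields.
data SimpleDate:
  | simple-date(day :: Number, month :: Number, year :: Number)
end

# Hypothetical message type: its sent field is itself structured data.
data EmailMessage:
  | email-message(sender :: String, subject :: String, sent :: SimpleDate)
end

m1 = email-message("alice@example.com", "Lunch?", simple-date(14, 3, 2016))

check:
  # m1.sent is a SimpleDate, so field accesses can be chained.
  m1.sent.year is 2016
  m1.subject is "Lunch?"
end
```

Keeping the date as its own type means any other datum that needs a date can reuse it.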
10.2.2 Annotations for Structured Data
Recall that in [Type Annotations] we discussed annotating our functions. Well, we
can annotate our data, too! In particular, we can annotate both the
definition of data and their
creation. For the former,
consider this data definition, which makes the annotation information
we’d recorded informally in text a formal part of the program:
data ITunesSong: song(name :: String, singer :: String, year :: Number) end
Similarly, we can annotate the variables bound to examples of the
data. But what should we write here?
lver :: ___ = song("La Vie en Rose", "Édith Piaf", 1945)
Recall that annotations take names of types, and the new type we’ve
created is called ITunesSong. Therefore, we should write
lver :: ITunesSong = song("La Vie en Rose", "Édith Piaf", 1945)
What happens if we instead write this?
lver :: String = song("La Vie en Rose", "Édith Piaf", 1945)
What error do we get? How about if instead we write these?
lver :: song = song("La Vie en Rose", "Édith Piaf", 1945)
lver :: 1 = song("La Vie en Rose", "Édith Piaf", 1945)
Make sure you familiarize yourself with the error messages that you
get.
10.2.3 Defining and Creating Conditional Data
The data construct in Pyret also lets us create conditional
data, with a slightly different syntax. For instance, say we want to
define the colors of a traffic light:
data TLColor:
| Red
| Yellow
| Green
end
Conventionally, the names of the options begin in
lowercase, but if they have no additional structure, we often
capitalize the initial letter to make them look different from ordinary
variables: i.e., Red rather than red.
Each | (pronounced “stick”) introduces another option. You
would make instances of traffic light colors simply by writing Red, Yellow, or Green.
A more interesting and common example is when each condition has some
structure to it; for instance:
data Animal:
| boa(name :: String, length :: Number)
| armadillo(name :: String, liveness :: Boolean)
end
“In Texas, there ain’t nothin’ in the middle of the road
except yellow stripes and a dead armadillo.”—Jim Hightower
We can make examples of them as you would expect:
b1 = boa("Alice", 10)
b2 = boa("Bob", 8)
a1 = armadillo("Glypto", true)
We call the different conditions variants.
How would you annotate the three variable bindings?
b1 :: Animal = boa("Alice", 10)
b2 :: Animal = boa("Bob", 8)
a1 :: Animal = armadillo("Glypto", true)
Notice that the distinction between boas and armadillos is lost in the
annotation. When we get to refinements [REF] we can recapture this
distinction if we really want it.
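Although the annotation only says Animal, the values themselves still remember which variant they are. As a small sketch, this block repeats the Animal definition so it stands alone and relies on the predicates Pyret generates for each data definition (is-Animal for the type, and is-boa and is-armadillo for the variants).

```pyret
data Animal:
  | boa(name :: String, length :: Number)
  | armadillo(name :: String, liveness :: Boolean)
end

b1 :: Animal = boa("Alice", 10)
a1 :: Animal = armadillo("Glypto", true)

check:
  # Both values satisfy the collective type...
  is-Animal(b1) is true
  is-Animal(a1) is true
  # ...while the variant predicates still tell them apart.
  is-boa(b1) is true
  is-boa(a1) is false
  is-armadillo(a1) is true
end
```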
When defining conditional data, the first stick is actually
optional, but adding it makes the variants line up nicely. This helps
us realize that our first example
data ITunesSong: song(name, singer, year) end
is really just the same as
data ITunesSong:
| song(name, singer, year)
end
i.e., a conditional type with just one condition, where that one
condition is structured.
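The notification application described earlier combines both ideas: several variants, each with its own fields. The sketch below is one possible representation, not a definitive one. The variant and field names are hypothetical, and representing a timestamp as a Number (for instance, seconds since some starting point) is an assumption; the timestamp values are made up.

```pyret
data Notification:
  | email(sender :: String, subject :: String, body :: String)
  | reminder(timestamp :: Number, note :: String)
  | instant-message(sender :: String, body :: String)
  | package-arrival(timestamp :: Number, shipper :: String,
      tracking :: String, delivery-note :: String)
end

n1 = email("bob@example.com", "Notes", "See attached.")
n2 = reminder(1458000000, "Call the vet")
n3 = package-arrival(1458003600, "Example Shipping", "TRK-0001", "Left at front door")

check:
  # Every variant is a Notification, but each carries different fields.
  is-Notification(n3) is true
  is-email(n3) is false
  n2.note is "Call the vet"
end
```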
10.3 Programming with Structured and Conditional Data
So far we’ve learned how to create structured and conditional data,
but not yet how to take them apart or write any expressions that
involve them. As you might expect, we need to figure out how to
extract the fields of structured data and how to tell apart the
variants of conditional data. As we’ll see, Pyret also gives us a
convenient way to do both together.
10.3.1 Extracting Fields from Structured Data
Let’s write a function that tells us how old a song is. First, let’s
think about what the function consumes (an ITunesSong) and
produces (a Number). This gives us a rough skeleton for the
function:
fun song-age(s :: ITunesSong) -> Number:
...
end
We know that the form of the body must be roughly:
2016 - <get the song year>
We can get the song year by using Pyret’s field access, which is
a . followed by a field’s name (in this case,
year) written after the variable that holds the structured
datum. Thus, we get the year field of s (the parameter
to song-age) with s.year.
So the entire function is:
fun song-age(s :: ITunesSong) -> Number:
2016 - s.year
end
It would be good to also record some examples (using the songs from
<structured-examples>), giving us a comprehensive
definition of the function:
fun song-age(s :: ITunesSong) -> Number:
2016 - s.year
where:
song-age(lver) is 71
song-age(so) is 1
song-age(wnkkhs) is 57
end
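Field access works the same way when a function consumes more than one structured datum. As a further sketch, the hypothetical function older-song below reads the year field of two songs and returns whichever came out first (preferring the first argument on a tie). The block repeats the annotated ITunesSong definition and two of the examples so it runs on its own.

```pyret
data ITunesSong: song(name :: String, singer :: String, year :: Number) end

lver = song("La Vie en Rose", "Édith Piaf", 1945)
so = song("Stressed Out", "twenty one pilots", 2015)

# Hypothetical helper: the earlier of two songs, by their year fields.
fun older-song(s1 :: ITunesSong, s2 :: ITunesSong) -> ITunesSong:
  if s1.year <= s2.year:
    s1
  else:
    s2
  end
where:
  older-song(lver, so) is lver
  older-song(so, lver) is lver
end
```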
10.3.2 Telling Apart Variants of Conditional Data
Now let’s see how we tell apart variants. For this, we have to
introduce another new piece of Pyret syntax: cases. A
cases expression has several branches: exactly as many as there are
variants in the data definition. Each branch corresponds to one of the
variants. Thus, if we wanted to compute advice for a driver based on a
traffic light’s state, we might write:
fun advice(c :: TLColor) -> String:
cases (TLColor) c:
| Red => "wait!"
| Yellow => "get ready..."
| Green => "go!"
end
end
Note that cases is followed by the name of the
conditionally-defined type in parentheses (here, TLColor), and
then an expression that computes a value of that type (in this case,
c is already bound to such a value). Each variant is followed
by =>, and then an expression that computes an answer for that
variant.
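A cases branch can produce any kind of value, including another variant of the same type. As a sketch, the hypothetical function next-color below gives the color a simple three-state light shows next; the red, then green, then yellow cycle is an assumption, since, as noted earlier, real lights vary by country.

```pyret
data TLColor:
  | Red
  | Yellow
  | Green
end

# Hypothetical: the next color in a red -> green -> yellow -> red cycle.
fun next-color(c :: TLColor) -> TLColor:
  cases (TLColor) c:
    | Red => Green
    | Green => Yellow
    | Yellow => Red
  end
where:
  next-color(Red) is Green
  next-color(Green) is Yellow
  next-color(Yellow) is Red
end
```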
What happens if you leave out the =>?
What if you leave out a variant? Leave out the Red variant,
then try both advice(Yellow) and advice(Red).
10.3.3 Processing Fields of Variants
In this example, the variants had no fields. But if a variant has
fields, Pyret expects you to list names of variables for those fields,
and will then automatically bind those variables—so you don’t need
to use the .-notation to get the field values.
To illustrate this, assume we want to write a function, animal-name, that gets the name of any animal.
Because an Animal is conditionally defined, we know that we are
likely to want a cases to pull it apart; furthermore, we should
give names to each of the fields. (Note that the names of the
variables do not have to match the names of the
fields. Conventionally, we give longer, descriptive names to
the field definitions and short names to the corresponding variables.)
This gives us the outline:
cases (Animal) a:
| boa(n, l) => ...
| armadillo(n, l) => ...
end
In both branches, we want to return the name, which is bound to n, giving us the
complete function:
fun animal-name(a :: Animal) -> String:
cases (Animal) a:
| boa(n, l) => n
| armadillo(n, l) => n
end
where:
animal-name(b1) is "Alice"
animal-name(b2) is "Bob"
animal-name(a1) is "Glypto"
end
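Each branch may use a different subset of its variant’s fields. The sketch below shows two more uses of field binding; both function names, animal-summary and song-age-cases, are hypothetical. The first uses length only for boas and liveness only for armadillos. The second applies cases to ITunesSong, which, as we saw, is a conditional type with a single variant, so its cases has exactly one branch; it computes the same result as song-age without the .-notation.

```pyret
data Animal:
  | boa(name :: String, length :: Number)
  | armadillo(name :: String, liveness :: Boolean)
end

data ITunesSong: song(name :: String, singer :: String, year :: Number) end

# Hypothetical: a sentence about an animal, using variant-specific fields.
fun animal-summary(a :: Animal) -> String:
  cases (Animal) a:
    | boa(n, l) => n + " is a boa of length " + num-to-string(l)
    | armadillo(n, live) =>
      if live:
        n + " is a living armadillo"
      else:
        n + " is an armadillo, sadly no longer alive"
      end
  end
where:
  animal-summary(boa("Alice", 10)) is "Alice is a boa of length 10"
  animal-summary(armadillo("Glypto", true)) is "Glypto is a living armadillo"
end

# Hypothetical: song-age again, with the year bound by cases instead of s.year.
fun song-age-cases(s :: ITunesSong) -> Number:
  cases (ITunesSong) s:
    | song(n, sg, y) => 2016 - y
  end
where:
  song-age-cases(song("Stressed Out", "twenty one pilots", 2015)) is 1
end
```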