10 Introduction to Structured Data

Earlier we had our first look at types. Until now, we have only seen the types that Pyret provides us, which is an interesting but nevertheless quite limited set. Most programs we write will contain many more kinds of data.

10.1 Understanding the Kinds of Compound Data

10.1.1 A First Peek at Structured Data

There are times when a datum has many attributes, or parts. We need to keep them all together, and sometimes take them apart. For instance:

An iTunes entry contains a bunch of information about a single song: not only its name but also its singer, its length, its genre, and so on.
Your GMail application contains a bunch of information about a single message: its sender, the subject line, the conversation it’s part of, the body, and quite a bit more.

In examples like this, we see the need for structured data: a single datum has structure, i.e., it actually consists of many pieces. The number of pieces is fixed, but the pieces may be of different kinds (some might be numbers, some strings, some images, and different types may be mixed together in that one datum). Some might even be other structured data: for instance, a date usually has at least three parts: the day, month, and year. The parts of a structured datum are called its fields.

10.1.2 A First Peek at Conditional Data

Then there are times when we want to represent different kinds of data under a single, collective umbrella. Here are a few examples:

A traffic light can be in different states: red, yellow, or green. (Yes, in some countries there are different or more colors and color combinations.) Collectively, they represent one thing: a new type called a traffic light state.
A zoo consists of many kinds of animals. Collectively, they represent one thing: a new type called an animal. Some condition determines which particular kind of animal a zookeeper might be dealing with.
A social network consists of different kinds of pages. Some pages represent individual humans, some places, some organizations, some might stand for activities, and so on. Collectively, they represent a new type: a social media page.
A notification application may report many kinds of events. Some are for email messages (which have many fields, as we’ve discussed), some are for reminders (which might have a timestamp and a note), some for instant messages (similar to an email message, but without a subject), some might even be for the arrival of a package by physical mail (with a timestamp, shipper, tracking number, and delivery note). Collectively, these all represent a new type: a notification.

We call these “conditional” data because they represent an “or”: a traffic light is red or green or yellow; a social medium’s page is for a person or location or organization; and so on. Sometimes we care exactly which kind of thing we’re looking at: a driver behaves differently on different colors, and a zookeeper feeds each animal differently. At other times, we might not care: if we’re just counting how many animals are in the zoo, or how many pages are on a social network, or how many unread notifications we have, their details don’t matter. Therefore, there are times when we ignore the conditional and treat the datum as a member of the collective, and other times when we do care about the conditional and do different things depending on the individual datum. We will make all this concrete as we start to write programs.

10.2 Defining and Creating Structured and Conditional Data

We have used the word “data” above, but that’s actually been a bit of a lie. As we said earlier, data are how we represent information in the computer. What we’ve been discussing above is really different kinds of information, not exactly how they are represented. But to write programs, we must wrestle concretely with representations. That’s what we will do now, i.e., actually show data representations of all this information.

10.2.1 Defining and Creating Structured Data

Let’s start with defining structured data, such as an iTunes song record. Here’s a simplified version of the information such an app might store:

The song’s name, which is a String.
The song’s singer, which is also a String.
The song’s year, which is a Number.

Let’s now introduce the syntax by which we can teach this to Pyret:

data ITunesSong: song(name, singer, year) end

This tells Pyret to introduce a new type of data, in this case called ITunesSong. (We follow a convention that types always begin with a capital letter.) The way we actually make one of these data is by calling song with three arguments, as in the examples below. (It’s worth noting that music managers that are capable of making distinctions between, say, Dance, Electronica, and Electronic/Dance, classify two of these three songs by a single genre: “World”.)

<structured-examples> ::=

song("La Vie en Rose", "Édith Piaf", 1945)

song("Stressed Out", "twenty one pilots", 2015)

song("Waqt Ne Kiya Kya Haseen Sitam", "Geeta Dutt", 1959)

Always follow a data definition with a few concrete instances of the data! This makes sure you actually do know how to make data of that form. Indeed, it’s not essential but a good habit to give names to the data we’ve defined, so that we can use them later:

lver = song("La Vie en Rose", "Édith Piaf", 1945)
so = song("Stressed Out", "twenty one pilots", 2015)
wnkkhs = song("Waqt Ne Kiya Kya Haseen Sitam", "Geeta Dutt", 1959)

10.2.2 Annotations for Structured Data

Recall that in [Type Annotations] we discussed annotating our functions. Well, we can annotate our data, too! In particular, we can annotate both the definition of data and their creation. For the former, consider this data definition, which makes the annotation information we’d recorded informally in text a formal part of the program:

data ITunesSong: song(name :: String, singer :: String, year :: Number) end

Similarly, we can annotate the variables bound to examples of the data. But what should we write here?

lver :: ___ = song("La Vie en Rose", "Édith Piaf", 1945)

Recall that annotations take names of types, and the new type we’ve created is called ITunesSong. Therefore, we should write

lver :: ITunesSong = song("La Vie en Rose", "Édith Piaf", 1945)

Do Now!
What happens if we instead write this?
lver :: String = song("La Vie en Rose", "Édith Piaf", 1945)
What error do we get? How about if instead we write these?
lver :: song = song("La Vie en Rose", "Édith Piaf", 1945)
lver :: 1 = song("La Vie en Rose", "Édith Piaf", 1945)
Make sure you familiarize yourself with the error messages that you get.

10.2.3 Defining and Creating Conditional Data

The data construct in Pyret also lets us create conditional data, with a slightly different syntax. For instance, say we want to define the colors of a traffic light:

data TLColor:
  | Red
  | Yellow
  | Green
end

Conventionally, the names of the options begin with a lowercase letter, but if they have no additional structure, we often capitalize the initial letter to make them look different from ordinary variables: i.e., Red rather than red. Each | (pronounced “stick”) introduces another option. You would make instances of traffic light colors as

Red
Green
Yellow

A more interesting and common example is when each condition has some structure to it; for instance:

data Animal:
  | boa(name :: String, length :: Number)
  | armadillo(name :: String, liveness :: Boolean)
end

“In Texas, there ain’t nothin’ in the middle of the road except yellow stripes and a dead armadillo.” (Jim Hightower)

We can make examples of them as you would expect:

b1 = boa("Alice", 10)
b2 = boa("Bob", 8)
a1 = armadillo("Glypto", true)

We call the different conditions variants.

Do Now!
How would you annotate the three variable bindings?

All three bindings take the same annotation, Animal:

b1 :: Animal = boa("Alice", 10)
b2 :: Animal = boa("Bob", 8)
a1 :: Animal = armadillo("Glypto", true)

Notice that the distinction between boas and armadillos is lost in the annotation. When we get to refinements [REF] we can recapture this distinction if we really want it.

When defining a conditional datum, the first stick is actually optional, but adding it makes the variants line up nicely. This helps us realize that our first example

data ITunesSong: song(name, singer, year) end

is really just the same as

data ITunesSong:
  | song(name, singer, year)
end

i.e., a conditional type with just one condition, where that one condition is structured.
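To tie this back to the notification example from earlier, here is one possible sketch of that information as conditional data in which every variant carries its own fields. The variant names, the field names, and the choice to represent a timestamp as a plain Number are all assumptions made for illustration; a real application would likely store more.

```pyret
# A hypothetical representation of the kinds of notifications described
# earlier. All names here are illustrative assumptions.
data Notification:
  | email(sender :: String, subject :: String, body :: String)
  | reminder(timestamp :: Number, note :: String)
  # like an email, but without a subject
  | instant-message(sender :: String, body :: String)
  | parcel(timestamp :: Number, shipper :: String, tracking :: String, delivery-note :: String)
end

# As always, follow the definition with a few concrete instances.
n1 :: Notification = email("alice@example.com", "Lunch?", "Noon at the usual place?")
n2 :: Notification = reminder(1600, "Water the plants")
n3 :: Notification = instant-message("bob", "Running late!")
n4 :: Notification = parcel(1715, "Acme Freight", "ZX-0042", "Left at front door")
```

As with the animals, every binding is annotated with the collective type, Notification, regardless of which variant it holds.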

10.3 Programming with Structured and Conditional Data

So far we’ve learned how to create structured and conditional data, but not yet how to take them apart or write any expressions that involve them. As you might expect, we need to figure out how to

take apart the fields of a structured datum, and
tell apart the variants of a conditional datum.

As we’ll see, Pyret also gives us a convenient way to do both together.

10.3.1 Extracting Fields from Structured Data

Let’s write a function that tells us how old a song is. First, let’s think about what the function consumes (an ITunesSong) and produces (a Number). This gives us a rough skeleton for the function:

<song-age> ::=

fun song-age(s :: ITunesSong) -> Number:
  <song-age-body>
end

We know that the form of the body must be roughly:

<song-age-body> ::=

2016 - <get the song year>

We can get the song year by using Pyret’s field access, which is a . followed by a field’s name—in this case, year—following the variable that holds the structured datum. Thus, we get the year field of s (the parameter to song-age) with

s.year

So the entire function is:

fun song-age(s :: ITunesSong) -> Number:
  2016 - s.year
end

It would be good to also record some examples (<structured-examples>), giving us a comprehensive definition of the function:

fun song-age(s :: ITunesSong) -> Number:
  2016 - s.year
where:
  song-age(lver) is 71
  song-age(so) is 1
  song-age(wnkkhs) is 57
end
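Field access also chains when a field is itself a structured datum, as with the date example mentioned at the start of the chapter. The following is a minimal sketch under assumed names: CalDate, MailMessage, and their fields are hypothetical and chosen only for illustration.

```pyret
# A date has three parts; a message holds a date as one of its fields.
# All names here are illustrative assumptions.
data CalDate:
  | cal-date(day :: Number, month :: Number, year :: Number)
end

data MailMessage:
  | mail-message(sender :: String, subject :: String, sent :: CalDate)
end

m1 :: MailMessage = mail-message("alice@example.com", "Lunch?", cal-date(14, 3, 2016))

# m.sent produces a CalDate, whose year field we then extract.
fun sent-year(m :: MailMessage) -> Number:
  m.sent.year
where:
  sent-year(m1) is 2016
end
```

Each . steps one level into the structure, so m.sent.year reads as “the year of the sent date of m.”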

10.3.2 Telling Apart Variants of Conditional Data

Now let’s see how we tell apart variants. For this, we have to introduce another new piece of Pyret syntax: cases. A cases expression has several branches: exactly as many as there are in the data definition. Each branch corresponds to one of the variants. Thus, if we wanted to compute advice for a driver based on a traffic light’s state, we might write:

fun advice(c :: TLColor) -> String:
  cases (TLColor) c:
    | Red => "wait!"
    | Yellow => "get ready..."
    | Green => "go!"
  end
end

Note that cases is followed by the name of the conditionally-defined type in parentheses (here, TLColor), and then an expression that computes a value of that type (in this case, c is already bound to such a value). Each variant is followed by =>, and then an expression that computes an answer for that variant.
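The answer in each branch need not be a String; it can be any value, including another TLColor. As a further sketch, the hypothetical function next-color below computes the state that follows a given one, assuming the cycle red, green, yellow, and back to red. The data definition is repeated only so that the example runs on its own; leave it out if TLColor is already defined in your file.

```pyret
# Repeated from above so that this example is self-contained.
data TLColor:
  | Red
  | Yellow
  | Green
end

# Assumes the light cycles Red -> Green -> Yellow -> Red.
fun next-color(c :: TLColor) -> TLColor:
  cases (TLColor) c:
    | Red => Green
    | Green => Yellow
    | Yellow => Red
  end
where:
  next-color(Red) is Green
  next-color(Green) is Yellow
  next-color(Yellow) is Red
end
```

Note that the branches may appear in any order; what matters is that every variant is accounted for.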

Do Now!
What happens if you leave out the =>?

Do Now!
What if you leave out a variant? Leave out the Red variant, then try both advice(Yellow) and advice(Red).

10.3.3 Processing Fields of Variants

In this example, the variants had no fields. But if the variant has fields, Pyret expects you to list names of variables for those fields, and will then automatically bind those variables—so you don’t need to use the .-notation to get the field values.

To illustrate this, assume we want to get the name of any animal:

<animal-name> ::=

fun animal-name(a :: Animal) -> String:
  <animal-name-body>
end

Because an Animal is conditionally defined, we know that we are likely to want a cases to pull it apart; furthermore, we should give a name to each of the fields. Note that the names of the variables do not have to match the names of the fields: conventionally, we give longer, descriptive names to the field definitions and short names to the corresponding variables.

<animal-name-body> ::=

cases (Animal) a:
  | boa(n, l) => ...
  | armadillo(n, l) => ...
end

In both cases, we want to return n, the variable bound to the name field, giving us the complete function:

fun animal-name(a :: Animal) -> String:
  cases (Animal) a:
    | boa(n, l) => n
    | armadillo(n, l) => n
  end
where:
  animal-name(b1) is "Alice"
  animal-name(b2) is "Bob"
  animal-name(a1) is "Glypto"
end
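When the branches need to treat the variants differently, each one can use the fields that are specific to its variant. The sketch below, with the hypothetical function animal-summary, uses the length of a boa and the liveness of an armadillo; it also shows that the variable names in a branch (here alive) are our own choice. The data definition is repeated only so that the example runs on its own; leave it out if Animal is already defined in your file.

```pyret
# Repeated from above so that this example is self-contained.
data Animal:
  | boa(name :: String, length :: Number)
  | armadillo(name :: String, liveness :: Boolean)
end

# Each branch binds the fields of its own variant, in order.
fun animal-summary(a :: Animal) -> String:
  cases (Animal) a:
    | boa(n, l) => n + " is a boa of length " + num-to-string(l)
    | armadillo(n, alive) =>
      if alive: n + " is a live armadillo"
      else: n + " is a dead armadillo"
      end
  end
where:
  animal-summary(boa("Alice", 10)) is "Alice is a boa of length 10"
  animal-summary(armadillo("Glypto", true)) is "Glypto is a live armadillo"
  animal-summary(armadillo("Dasy", false)) is "Dasy is a dead armadillo"
end
```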
