Wednesday, July 30, 2008

Morphisms for the Masses

No, not polymorphism. That's one we might consider unlearning. What I'm calling the “morphisms” refers to several types of recursive functions working with lists. Yeah, it's a neologism; that's the type of artistic license that comes with self publishing. But seriously, the morphisms aren't that hard, they just aren't often presented in a learnable way. And learning them gives you a more general way to talk about types of recursive functions, which can help you better explain your abstractions and designs. Let's get started.

The coolest part of this is that you get to use Greek letters. κατα means “downwards”, as in “catastrophe”, and μορφή means “form” or “shape”. Put them together and you have catamorphism. Apply them to computer science and you have a recursive function that starts with a broad shape (a big old list) and funnels it into a single output (like an integer). The size() or length() method on a list can be a catamorphism. These functions are sometimes called “fold”. They take a type signature of List <Type A> into <Type B>. Or in Java:

int length(List list);
Which might be an implementation of the more general:
T length(List<?> list);
In F# (and I believe OCaml), the type signature for a catamorphism is:
'a list -> 'b
Read these type signatures by looking at the far right element first ('b). That is the return type. Everything else is a parameter. So “tick a” means a generic argument and “tick b” means a generic return type. The type signature of the List.length function in the F# core library is:
('a list -> int)
Phew. We're done with the types. Let's look at one more common example of a catamorphism: the sum function. Take a guess as to the type signature of sum(List)...
int list -> int
Yup, take a list of integers and fold it into a single integer result. And here is a naïve implementation:
let rec sum list =
match list with
| [] -> 0
| head :: tail -> head + (sum tail)
sum takes a list and returns 0 if the list is empty. Otherwise, it splits the list into a head and a tail, and adds the head onto the sum of the tail. Easy Peasy.

This is the opposite of catamorphism. Some people like to say ”category dual” instead of opposite; you'll sound smarter if you do too. The Greek involved here is ανα, meaning “upward”, and μορφή meaning “shape”. Take a single input and expand it upwards into a wide list. This one is frequently called “unfold”, and a typical Java signature for an anamorphism might be:
List<Integer> first(Integer);
Which would presumably be a function to return a list of the first n numbers between 1 and the parameter. The same type signature in F# is:
int -> int list
That is, a function that takes in integer and produces a list of integers. In general, anamorphisms have the type signature:
'a -> 'b list
A single value is used to generate a list of other values, possibly of a different type. Let's take a look at an anamorphism that generates a list of the first n integers greater than 0, that is, a function with the signature int -> int list:
let rec first max =
match max with
| 1 -> [1]
| x -> x :: first (x-1)
First off, this isn't tail recursive! No bother... the point is to see that a list of integers is generated by building a cons list using the supplied value “max”.

Hey, we're done with the basics! If you understand the previous two types then you'll understand this one: a hylomorphism is the composition of an anamorphism and a catamorphism. The fun Greek part is ὑλο (hylo) meaning “wood”, which won't help you remember the term one bit. So putting an anamorphism ( 'a -> 'b list) together with a catamorphism ('b list -> 'c) results in a type signature of 'a -> 'b. Combine an unfold with a fold and you'll have a function that takes a value and returns a value, and hidden in the middle is a recursive function that builds then reduces cons lists. Using our previous examples of first and sum, we can put them together into a simple hylomorphism that sums the first n elements of the natural number scale:
let sumFirst max =
sum (first max)
This example explains the “composition” part of composing the anamorphism and the catamorphism, but it doesn't quite explain the recursive nature of a hylomorphism. See, a hylomorphism is going to build up a list and then perform an operation on each element of the list. Factorial is the archetypal hylomorphism:
let rec factorial x =
if x = 1 then 1
else x * (factorial (x-1))
You'll notice this is not tail recursive. At runtime, the recursive factorial calls build up a list of all the factorial results before applying the * multiplication. You might visualize the processing steps like this:
factorial 4 =
4 * (factorial 3) =
4 * 3 * (factorial 2) =
4 * 3 * 2 * (factorial 1) =
4 * 3 * 2 * 1 =
4 * 3 * 2 =
4 * 6 =
This uses a lot of memory and a lot of stack frames! But it doesn't have to be so. Some language compilers will recognize a hylomorphism and short cut it into a less resource intensive and tail-call like structure, applying the multiplication as it becomes available, not waiting until a large list is built up:
factorial 4 =
4 * (factorial 3) =
12 * (factorial 2) =
24 * (factorial 1) =
24 * 1 =
Visually, this method of executing the hylomorphism takes less steps, but the point is to eliminate the intermediate list and the resources associated with building it. This shortcut has a name: deforestation. And the compiler that does it is the Glasgow Haskell Compiler. Cool.

And now there's just one last morphism to cover...

A metamorphism is the categorical dual of a hylomorphism: a catamorphism followed by an anamorphism. There's not a lot to say on this one... it doesn't help you understand deforestation and I don't even have the Greek translation for it. Nonetheless, I'm obligated to cover it for completeness.
A metamorphism is going to compose a catamorphism ('a list -> 'b) with an anamorphism ('b -> 'c list) to create a function of type ('a list -> 'c list). The previous examples can't be composed into a good metamorphism, so consider the function which turns a list of strings into a list of characters...
wordlist_to_chars ["The";"quick";"brown";"fox"]
Would be defined as a function composition:
let wordlist_to_chars list =
chars (concat list)
where the concat function has the signature “string list -> string”, chars has the signature “string -> char list”, leaving wordlist_to_chars to have the composed signature “string list -> char list”. Implementing concat and chars as recursive functions is left as an exercise to the reader.

That it... you've seen some morphisms, some deforestation, and some cool Greek letters. Are you ready for paramorphisms, apomorphisms, and "Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire”? I'm not, but let me know how it goes.


Chris Smith said...

Great post! I always wondered where the term 'Catamorphism' came from.

jkff said...

Hmm, I don't know a single compiler that is capable of transforming a recursive factorial into an iterative one.
Doing so requires knowledge about properties of certain operations (in this case, associativity of (*) and the fact that 1 is both its left and right identity), and in all cases but the most trivial (like factorial) the compiler can't do absolutely anything, so trying to implement such an optimization just for the sake of a factorial is useless.

Hamlet D'Arcy said...

@jfkk I don't remember what paper I pulled that from but I think it was called "An Introduction to the Bird-Meertens Formalism". A paper is pretty different from a compiler though. Maybe it doesn't exist.

The god of Technology said...

Thank you for piercing the darkness. Programming Language mortals will rejoice at this newfound knowledge.