Friday, November 10, 2006

Functional Pointers

being anti-pattern the second.

Unlike (or not), I spent some time being an accidental practitioner of functional pointers myself.  For my senior thesis (which is probably quite the source of anti-patterns if I would go back and look at the code) I had developed a small graph library.  This library had predictable types like Vertex and Edge and it also had a type called Graph and there was at this point something of a problem.  See, I would have functions like:

addEdge :: Vertex -> Vertex -> Graph -> Graph

and functions like:

exits :: Vertex -> Graph -> [Vertex]

from which it should be obvious that the actual type of interest to anything ever is Graph.  Where, however, does that leave our friend vertex?  What is a vertex?

In the case of my graph library, that was quite easy to answer:

type Vertex = Integer

That is, a Vertex, by itself, has no meaning; it exists solely as an index into a larger structure.  Upon reflection, I've become vaguely dissatisfied with this approach.  First, it means that all functions take more parameters that would seem ideal (although the use of the state monad hides this).  Second, it introduces the possibility of failure a lot of places that it wouldn't seem natural.  For example, exits (above) can return [] for two reasons: first, that the vertex genuinely has no exits; second, that the vertex does not exist in the graph passed as the second parameter.  A better type signature would have been:

exits :: Vertex -> Graph -> Maybe [Vertex]

or, following the monads-not-maybe structure:

exits :: Monad m => Vertex -> Graph -> m [Vertex]

(Along which lines allow me to indicate that while the reasons for all monads having a fail method makes a certain amount of practical sense, from a semantic sense the constraint I want here is MonadZero, which does not so much exist.)

(Another interesting result of the functional pointers pattern is that the same pointer object can refer to different things in different data structures.  Functional pointers thus more approximate relative addresses than absolute ones.)

(And finally, allow me to note that the solution

type Vertex = (Int,Graph)

which actually occured to me is deeply unsatisfactory.  What is the result of

addEdge (v,g) (v',g')

where g /= g'?)

It's worth mentioning that this problem emerges only in the domain of certain world-views that contain mutable state; if we were content that a vertex refered to a specific graph, then the Vertex data structure could contain the remainder of the graph without problems.  However, this imposes certain restrictions on mutability; in particular, the notion of adding edges becomes somewhat difficult to formulate.

However, if we are committed to have references into a changing data structure, we can at least compare the desirability of the available solutions.  The first is undoubtedly the least desirable; making both vertices and graphs first class citizens leads to obnoxious types and various preventable errors.  Encapsualting all graph access in monadic operations is a step in the right direction; then, at least the computational context in which vertices are meaningful is apparent.  However, this is still not ideal:

f = y v'
where x = do v <- newVertex
return v
y v = do u <- newVertex
addEdge v u
v' = runGraph x


In this case, we construct a vertex in one evaluation context (corresponding to one graph), pass it into an action in a different context (corresponding to a different graph), and then use that vertex (presumably meaningless) in the other graph.  This is somewhat subtle: the important flaw is the existance of a function to extract the value from a monadic computation.  Without the runGraph function we would be fine.  On the other hand, without the runGraph function, it would also be difficult to write useful programs, since the results of graph computations would never be observable.


There is a best solution.  The ST monad, for example, supports using references within monadic computation, but attempting to extract such a reference produces a type error.  The details are complicated.


The original source of this discussion, though, was not graph libraries.  It was a different block of code, which represented one-to-many containment relationships roughly as follows:


data Thing = { thingData     :: ThingData
, thingChildren :: [Int]
, thingParent :: Maybe Int }

data Thing2 = { thing2Data :: TihngData
, thing2Examples :: [Int]
, thing2Container :: Maybe Int }

data Stuff = { things :: Map Int Thing
, thing2s :: Map Int Thing2 }


If it is not apparent to you that the children of a thing are actually thing2s, and the examples of thing2s are in fact things, then perhaps you need a brush up on functional pointers. 

Thursday, November 09, 2006

Functional Anti-Patterns

which I will be talking about shortly, but first I need to admit that for the past several years I have been quite wrong. It has been my (oft-expressed to anyone who would give even the semblance of listening) opinion that programming languages like Haskell are a Very Good Thing; that by making it natural to express algorithms in a way closer to the mathematics underlying their formal models, it makes it easier to create programs that could be reasoned about relatively easily and with relatively little programmer error; that thus equipped, programmers could produce better code in less time. In short, I have claimed, Haskell is an enabling tool for good programming.

The truth of the matter being, however, that I was titanically wrong. While it is possible for a good Haskell programmer to accomplish more in less time with a greater likelihood of success in Haskell than in, oh, say, for example, Java, it is not correspondingly true that a bad programmer will produce better code in less time than they would in oh, say, for example, Java. Rather, Haskell requires that bad programmers spend time that they could have spent writing bad Java code inventing exciting new ways to write bad code, specially tuned to the mechanisms of functional programming. My point is to elaborate on a few of those while taking the opportunity to expound on why Haskell is really a good language anyway and it's not the poor language's fault that such evil things were done with it.

My first target is a design pattern I like to refer to as (or not). This pattern simply consists of adding an implied (or not) to the end of each data type name. For example, you might be tempted to read a declaration that started

data Shape = ...

as "data Shape is," but when dealing with a practitioner of the (or not) pattern, you would be incorrect. This ought to be read "data Shape (or not)," which is also represented in the Shape data structure:

data Shape = ... | NoShape

This provides the immense benefit that you do not ever have to indicate when things can (or have) actually failed! Failure is implicit in everything, and instead of being handled once, it can be handled in every function ever written ever!

boundingBox :: Shape -> Shape
boundingBox (Rect topLeft bottomRight) =
Rect topLeft bottomRight
boundingBox (Circle (x,y) radius) =
Rect (x - radius, y - radius)
(x + radius, y + radius)
...
boundingBox NoShape =
NoShape


(Although a proper practitioner of (or not) would realize quickly that I have not allowed for failure in my type for points, and I should be typing something like Point x y (to allow for NoPoint).) It goes without saying that there is, in fact, a better way to handle this. It's a particularly nice way, though, so I'd like to discuss it a bit. The proper way to do this is to first decide that perhaps it would be nice to know when failure is going to happen, and that second once we've formalized failure in that manner, it might be possible to handle it more convientently.


There are any number of ways to formalize failure. I'm going to use the standard prelude's Maybe type:


data Maybe a = Just a | Nothing


which would seem to be the same thing as (or not) except that whereas (or not) is implicit, Maybe is explicit and thus it is possible to have functions like boundingBox which are, in fact, complete functions and do not have to worry about the failing case. Further, functions which can fail, such as a hypothetical


parseShape :: String -> Maybe Shape


not only indicates that it fails, but forces the user to actually do something about failure instead of helpfully passing it on for some other function some day to discover and perhaps maybe do something about except probably it will just pass it on and the only way you'll discover that you didn't parse a shape is when you get to the other end and find that in fact the best strategy for getting from here to there is NoStrategy.


Of course, introducing Maybe has its own downside, which often looks like this:


case parseShape str1 of
Nothing -> Nothing
Just s1 ->
case parseShape str2 of
Nothing -> Nothing
Just s2 -> boundingBox s1 s2

but even this is, in fact, abstractable, and can be reduced to


do s1 <- parseShape str1
s2 <- parseShape str2
return (boundingBox s1 s2)

via monads. I would tell you about monads but everyone else has already written a blog post about that and I don't feel like it right now. In this particular case, it's particularly easy because all the necessary plumbing is built right into the standard library.


Of course, a form of this generalization is, in fact, possible for the (or not) pattern. It's not quite as nice as the Maybe approach, what with not using the type system and not actually guaranteeing that failure is handled, but as long as we're relying on the programmer to include failure cases, we might as well make it easy.


First, we abstract the notion of failure. We'll do that with a type class:


class Fuckable f
where fuck :: f
isFucked :: f -> Bool

Then, we can provide functions to automagically handle failure for functions that wouldn't otherwise:


liftF1 :: (Fuckable f, Fuckable g)
=> (f -> g) -> f -> g
liftF1 f x
| isFucked x = fuck
| otherwise = f x

liftF2 :: (Fuckable f, Fuckable g, Fuckable g)
=> (f -> g ->h) -> f -> g -> h
liftF2 f x y
| isFucked x || isFucked y = fuck
| otherwise = f x y

...

and now, we can omit the last case from the earlier boundingBox function, instead just writing


boundingBox = liftF2 boundingBox'
where boundingBox' = ...

and I'm sure we can all agree that is better than anything in any "standard prelude" you or I or anyone else might be interested in snarkily bringing up as containing "better" ways to do things.