Thursday, February 22, 2007

Concentric Octagons

or, Sweet Jesus Why!?, being functional anti-pattern the seventh.

This anti-pattern will be presented in the form of a problem, with progressively better solutions until I arrive at a rough approximation of the original code.

The problem

You are provided with a source figure and a destination figure; both of these are arbitrary polygons. Your goal is to return the heading from the center of mass of the source polygon to the center of mass of the destination polygon.

A polygon is represented as a counter-clockwise list of points. The polygon is not guaranteed to be convex. To make your life easier, a function

centerOfMass :: Polygon -> Point

is provided. The centerOfMass function does not return (or not). You also have a function

polygonIntersections :: Polygon -> Polygon -> [Point]

which finds the (possibly empty) set of points at which two polygons intersect.

The First Solution

heading source dest = atan2 (y' - y) (x' - x)
where (x,y) = centerOfMass source
(x',y') = centerOfMass dest

Unfortunately, this solution is too simple and takes too few lines to be seriously considered.


The Second Solution


From high school geometry, you may remember that the segment between the two points of intersection of two circles is perpendicular to the segment between their centers. We can take advantage of this to find the segment between their centers: first, we can use the centerOfMass function to find the centers of the polygons and then compute the distance r between the centers. Then, we can construct two circles, concentric with the polygons and each having radius r. Finally, we can find the points of intersection, compute the slope of the segment between them, compute the heading based on that slope, and then rotate it 90 degrees (since we want the heading from center to center instead of from intersection point to intersection point).


Unfortunately, there are several difficulties with this code. First, our library of routines does not include routines for manipulating circles, and while it may seem somewhat appealing to represent a circle as an infinite list of points, manipulating that list is likely to be difficult. A second difficulty is that the circles used are larger than they need to be. We will address both these problems in the next solutions.


The Third Solution


First, we will address the problem with representing circles. As I mentioned, we do not have either a representation for circles, or routines manipulating them. However, we do have code to manipulate polygons, so we can appoximate them. Obviously, the more sides a polygon has, the better approximation it will provide; for our purposes, eight should be fine. While this will introduce some deviation in the result, it will surely not be too much. We now have the function


octagonFromCenterAndRadius :: Point -> Double -> [Point]


whose implementation we leave to the reader.


Sadly, we have not yet found a use for infinite lists. Luckily, we have one more problem to solve.


The Fourth Solution


As previously mentioned, our solutions so far have relied on computing the distance between the centers of mass of the two input polygons, and creating circles (or octagons) of that size. However, this creates shapes that are larger than we precisely need. Instead, we can take advantages of infinite lists and filtering to ease this problem:


octagonsFromCenter c = map f [1..]
where f = octagonFromCenterAndRadius c


Now, we are better equipped to find the intersection points to use in the process from Solution 2:

intersectionPoints p1 p2 =
take 2 $ head $ filter (not . null) points
where c1 = centerOfMass p1
c2 = centerOfMass p2
points = zipWith
polygonIntersections
(octagonsFromCenter c1)
(octagonsFromCenter c2)

And, at this point, I believe we can be happy with our solution. Note that we no longer have to find the distance between the centers of mass, and have thus saved outself a square root operation.


A Confession


I have made up this derivation of the final solution. In fact, I was not around when the original solution was divised; I only discovered this algorithm while attempting to understand what happened in a module I was supposed to be porting to a more modern version of our libraries. However, none of the fundamental points of the algorithm have changed: the polygonIntersections and centerOfMass functions were both used in the original solution and they were used to generate infinite lists of octagons such that the intersection points of the first two that intersected could be used to find a heading between the centers of mass.


I suppose there are any number of lessons you could draw from this code. I've always seen it as an example of two things: first, that sometimes it's not worth looking for a more interesting solution, but that second, if you must find a more interesting solution, at least it will be amusing to future generations.

"No, it's purely functional"

or, I will have none of your State monad, being functional anti-pattern the sixth.

Shortly after arriving at my previous employer, I was talking with oen of the senior developers, and some piece of state passing code came up. Having recently discovered the exciting world of the Monad Transformer Library, I said, eagerly, "Oh, are you using the State monad for that?" To which he replied, "No, the code is all purely functional," and seemed quite proud of himself. At the time, I was quite confused. Later, I became more confused, when it became apparent that every function in his code started:

function :: AppEngine -> ... -> STM stuff
function engine ... =

Of course, he rarely actually wanted the AppEngine. Frequently he actually wanted the DataStore, which lived inside a GenericEngine, inside the AppEngine. So, each function contained the line

where dataStore = store (genericEngine engine)

By this time, readers will presumably have wondered what is done with an AppEngine or its contents, since they seem to go in but not come out. However, as luck would have it, the contents of AppEngines et. al. were all TVars, so they could be updated using writeTVar, without having to worry about polluting your code with icky, icky monads.

I was more perplexed. However, as annoying as passing the engine as the first parameter to everything was, it was at least fairly obvious, even without being too familiar with the code in question.

On the other hand, a later example of this behavior I ran across was somewhat harder to decode. In this case, I was translating a set of generic action descriptions into situation specific information. The meat of the code had been written elsewhere, and I was given some fairly simple tools: an ActionSegment data structure, and a function:

makeSpecific :: Point -> ActionSegment
-> ActionSegment
makeSpecific startingPt segment = ...

Lest this seem somewhat confusing, both generic and specific actions were represented as lists of action segments. The difference was that the generic ones had fields with relative values, and the makeSpecific function ferretted out those values and replaced them with non-relative variations. The starting point argument also confused me a bit, since each ActionSegment included a starting point already, in non-relative form. However, after scouting around a bit for a use of startingPt, I noticed that the Point type implemented the (or not) pattern, so I happily passed NoPoint and ran it to find out what happened. And, as far as I could tell, it worked wonderfully. Generic fields became specific, I could generate the information I needed from there, and so on.

As it happened, though, subtle bugs started appearing. I eventually traced them back to the startingPt field. It turned out that when I got a list of ActionSegments, only the first had a valid starting point field. The starting points of the remaining ActionSegments had to be manually initialized to the computing finishing points of the previous segments, using the argument to makeSpecific. (As it happened, it was also advisable to pull the valid starting point out of the first segment and pass it to makeSpecific, since said function would happily overwrite the valid value in the first segment.) To finish complicating things, ActionSegments could contain sub-segments, so to speak, which had similarly fascinating starting and ending point relationships with their parents.

Some time later, I was talking to the original developer, and mentioned casually having discovered that feature late one night. Oh, yeah, he explained, well, the finishing point was in the result already so he didn't want to waste memory by returning it again.

Monday, February 19, 2007

A Snowball's Chance

or, playing to your weaknesses, being functional anti-patterns the fourth and fifth.

While I was somewhat occupied collecting the code samples that made up functional anti-patterns the first through third, I was being paid primarily to develop, extend and maintain a package I'll call Snowball[1]. Snowball was a somewhat grand undertaking, mainly consisting of everything we needed and would hope was in the Haskell standard libraries, but wasn't; eventually, it grew to encompass a mutable-graph-inspired storage system, a COM-like object system (masquerading as CLOS-like object system), an application level threading scheme, some rough ideas about concurrency, a monad-comprehension-inspired approach to event driven programming, roughly twelve months of my professional effort, and probably at least one or two more things. It was also, at least from my perspective, a failure. The entire Functional Anti-Patterns series was inspired by my repeated attempts to figure out exactly what went wrong when I was writing Snowball, and this is one more attempt in that direction.

Solving the Wrong Problem

The largest factor contributing to Snowball's eventual failure was its existence in the first place. Snowball was the result of my dissatisfaction with roughly every design meeting I ever attended at Aetion. After several multi-hour discussions which served only to conclude that we ought to have more discussions in the future, perhaps having first agreed on what to discuss (and yes, it is possible for a company with nine employees to schedule a meeting to figure out what to discuss at the next meeting), I decided that it would be easier for me to find technological solutions than human solutions. So, when faced with a design document which had clearly been written by someone who had never adjust to no longer writing in Modula-3, I chose to implement an object system rather than attempt to persuade him to find a more functional statement of his goals; when faced with a threading scheme that would have spent more time computing dynamic thread priorities than doing actual work, I implemented application level threading so the hooks would be available if I was ever forced to actually implement that scheduling algorithm, and so on, and so forth.

From one perspective, this worked surprisingly well; most of the things I built actually worked (at least as well as we had any project that required them), and it allowed me to spend my time at design meetings saying much more "yes, we can do that" or "yes, I've done that" than "no, I think we need to take a different approach." However, in the long run, it resulted in having a product which, rather than supporting a clear way to accomplish anything, supported four or five ways, none of which had much chance of actually being the best for the job at hand. Further, from my own perspective, it ended up contributing substantially to my own dissatisfaction. When I started working on Snowball, I had a simple (sounding) goal - design a framework for easy, relation-oriented programming in Haskell - and I was fairly excited about working on it. By twelve months later, not only could I not have articulated any single thing Snowball did really well, I was no longer particular excited about any of the things it did poorly.

Solving the Wrong Problem (II)

That isn't to say that certain technological patterns didn't emerge while I was working at Aetion, some more successful than others. I found that the most common way to produce a successful block of code was to start from the question of "what happens here," and the most common way to produce an unsuccessful block of code was to start from the question of "what data type do I end up with here." This should not be tremendously surprising; almost every justification of functional programming to non-functional programmers talks about the advantage of reasoning about functions as first class objects. However, I found it surprisingly easy to slip into data-oriented programming anyway.

The first way I slid into this was by over-focusing on the Domain Specific part of the phrase Domain Specific Embedded Language. I had always conceived of Snowball as being a DSEL, and from that start, I ended up designing the domain language with little consideration of Haskell's underlying advantages and disadvantages. The specifications I was given were full of bullet lists of data types and their attributes; the effects of these attributes and propagation of changes were described secondarily, and frequently with either little or contradictory detail. From this, I designed a domain language in which objects and their attributes were primary. However, this turned out to be a somewhat uncomfortable fit for Haskell, and a very uncomfortable fit for Haskell-loving programmers. The more I worked with my objects, the more I wished they weren't so committed to being objects in the first place.

The second way I found to slide into data-oriented programming was over-generalization. From early on, I had the idea that Snowball-based code had to integrate with other Snowball-based code, ideally seamlessly and without either integrated piece of code knowing that the other existed. Unfortunately, this meant that the domain and codomain of any Snowdrift-based function had to be sufficiently general to be plugged together with any other such function in any direction. This led to the type:

Object o => o -> o

For some class Object. This forces the discussion back to operations on the object class, since they become the only things available to any function working on them.

Object, Revisited

That's not to say I don't think the only problem with my object class was that object-oriented programming is not always the best approach. In fact, one simple change would have made it both more useful, more supportive of functional-style programming, and a lot easier to use.

When I first started designing the class, I was thinking in terms of my limited experience with Swindle, which attempts to add CLOS to Scheme, rather than my limited experience with COM. While Swindle adds notions of classes, inheritance, and so forth to Scheme, the fundamental operations with which a programmer (read, me) interacts were (roughly) read-slot and write-slot. My object class worked roughly the same way:


class Object o
where readSlot :: Slot t -> o -> Maybe t
writeSlot :: Slot t -> t -> o -> Maybe o

Which all seemed fairly obvious to me. The key fact I missed was that it's rare for a block of code to read or write a single slot within an object; rather, functions usually manipulate related collections of slots. One might even call those slots "interfaces," and then one would be well on the way to realizing that I had implemented all the wrong operations. The second key fact I missed was that read and write are fairly imperative operations; there's no obvious way to string together reads of multiple slots, computations with the read values, and then writes of multiple slots with a composition operator. Equipped with those two realizations, one might come up with the following alternative implementation:


class Object o
where queryInterface :: Interface i ->
(i -> r) ->
o -> Maybe r
modifyInterface :: Interface i ->
(i -> i) ->
o -> Maybe o

This would have more than one advantage over the previous approach. First, entire computations can be created without worrying about the object infrastructure and used (and composed) later. Second, operations should fit into one query or modification; unlike the previous implementation, if I found myself needing queries or mutations of multiple interfaces within a single module, it would indicate a problem in my interface abstractions rather than just an uncomfortable necessity of writing code.


In fact, when our summer intern and I set to recovering a massive, 1000-line, intertwined block of code that was responsible for generating and maintaining aggregate statistics from a data flow, we ended up inventing roughly that structure (although with more detailed variations on mutation), and had little use for the underlying readSlot/writeSlot mechanism.


So, my first major anti-pattern: programming towards areas a language is (intentionally or unintentionally) weak. In this case, had I been worried about functions instead of objects from the start, I could have saved myself a lot of trouble.


All Things to All People


My second major mistake arose from Snowball's place as the catch-all underlying platform thing for the remainder of code that Aetion wrote. It resulted in Snowball containing a fair amount of unrelated code, which in turn frequently suggested parallels that didn't really exist. My favorite example of this is the interaction between the aforeburied object model and the event loop. I had known from early on that one component of how Aetion thought about their product was an event loop that would respond to changes in the underlying data; since I was already designing a generic structure for data models, it seemed natural to integrate it with the event loop. Most of the interesting changes at that point were happening at the level of single slots rather than interfaces, which became another strong push in the direction of the readSlot/writeSlot interface. A call to writeSlot could naturally trigger the appropriate ModifySlot event, which could then find the pieces of code which cared about that modification, and so forth.


This turned out to be a bad idea for two reasons: first, because it supported a broken idea of objects. Second, however, it encouraged events to exist at a much lower level than would later have been helpful. Eventually, this became a source of bugs: sometimes, while improvements or extensions to existing code would not semantically change earlier code, it would change the events thrown by the previous code, either in order (which rarely mattered) or by throwing new events or stopping previous thrown events from happening (which almost always caused bugs).


The real problem, I believe, was that the event-driven stuff and the data-store stuff were not, in any meaningful way, related. However, by their existence in the same block of code, they cause early interactions where none should have existed. I did eventually change the notion of events to be more general; however, by that time, a fair amount of code (including our entire client-server interface) were based on the original messages.


Happy and Free


Once it had become obvious to everyone that my time at Aetion was limited, the partners suggested to me that perhaps Snowball could become open-source. This forced me to think about the code more than I had in a while: how much work would it require to make it generally useful? How much to make it accessible to developers not familiar with Aetion's history and goals? Did I want to be associated with the code?


Eventually, I recommended against open-sourcing Snowball; at that time, I believed that the work that would be required before it would be generally useful, interesting, or something I would take credit for would outweigh the work involved in simply starting from scratch. If anything, I have become more convinced of that since that time. However, as embarrassed as I would have been for much of the Haskell community to see the code at that point, it was still somewhat hard for me to admit the state of the code to myself.


I'm sure that Aetion is still using Snowball, although I honestly suspect they may not for much longer. Almost no one left at the company now seemed to share my interest in Haskell, or in having a common base for their applications. Given the design decisions that had been made before I arrived and Aetion's abysmal decision-making process, I don't think I would have stayed any longer or been much happier had I made many fewer mistakes in Snowball's design and evolution; it was this realization that led to me to quit rather than keep trying to contribute to their software. And, on the whole, I have to say that I probably learned more about Haskell from confronting Snowball's various problems than I have before or since (although it has taken me some time to fully realize some of the things I should have learned).


Just, uh, I wouldn't recommend learning them the same way.




[1] Original credit for this name goes to Mark Goldman.