Monday, December 04, 2006

A little knowledge gets you a long way straight to Hell

or, but what if it were a snoc-list, being functional anti-pattern the third.

A long time ago, when I was a freshman in college, I took a course on functional programming.  There was a problem on the first exam involving various list operations and drawing pictures of the resulting structures.  I had put no effort into learning the actual underpinnings of the list operations and was quite frustrated by anyone actually having the temerity to test me on such irrelevancies; however, as a direct consequence, I have remembered the exact interaction of conses and set-cdr!'s ever since.  At least one of my coworkers never had such an experience, which led to the
following comment:

{- but would this be faster?  I'm not sure how init and last are implemented:
removeTrailingSpace s =
  if last s == ' ' || last s == '\n'
  then removeTrailingSpace (init s)
  else s
-}

to answer the question: no, there is no conceivable Haskell implementation of cons lists in which that implementation could be faster than anything.[1]

However, that (commented) code snippet was not my real point.  My real point comes later in the same file, when the intrepid programmer sets off to demonstrate that, in fact, he did understand that tail recursion is a good thing:

getOneToken' :: Token -> String -> (Token, String)
getOneToken' partialToken "" = (partialToken, "")
getOneToken' partialToken (c:cs)
 = if ((c == '\n') || (c == ' '))
   then (partialToken, cs)
   else (getOneToken' (partialToken ++ [c]) cn)

So.

My point here is not that he's doing space delimited parsing, which is a stupid way to parse data files and leads to data file that contain lines like

Rectangle ( 1 , 2 ) ( 3 , 4 )

spaces not optional.  Nor is my point that if he were committed to doing that, he could have used the words function in the standard library, which breaks a line of text apart on the spaces (and which goes nicely together with the lines function, which splits a block of text apart on newlines).  Nor is my point that if he wanted the exact functionality he provided, he could have written

getOneToken' = break isSpace

again, using nothing by functions from the standard library.  No, my point is that in an attempt to avoid the linear space that would be used by simply consing each token character onto the token returned by the recursive call, he produced code with repeated appends that take quadratic time.  Quadratic time, you say?  Why, yes. Cons-list construction is right-biased, which means that append takes time linear in its left argument.  In this case, with a token of length n, that works out to be:

0 + 1 + .. + n

which, as we all learned in intro CS, is on the order of n^2.

The point?  This is bad code.  This is not just bad code because it's unnecessary, it's bad code because it attempts to be monstrously clever and ends up being monstrously unclever.  Dealing with such monstrous clevernesses turned out to be a lot of why I don't work at Aetion any more.

[1] One could hypothesize a scenario in which, first of all, one detects that the use of a list is linear (and that thus you can afford to mutate it in place) and that one also keeps a pointer to the tail of the list as well as the head, at which point one would be able to implement init and last in such a way that they took constant time.  I am not aware of any language/compiler that can do even half of that.