Juxtaposition is an interesting operator. I’m aware of no language in which this operator can be redefined by the user. I’m referring to the simple placement of two tokens next to each other, such as “x y”.

In the vast majority of languages, this operation has no meaning at all. For example, in Python:

>>> 2 + 3
5
>>> 2 3
  File "<stdin>", line 1
    2 3
      ^
SyntaxError: invalid syntax

In a few languages, juxtaposed strings are concatenated. In C this is limited to string literals (a kind of constant-folding optimization).

Juxtaposition has a great deal of importance in functional programming languages. One could even argue that the manner in which it is interpreted is the principal differentiator between the three primary families of FP language. Perhaps use of this operator adds significantly to the general sense of FP being difficult. If so, it would be an example of something completely invisible to the FP community blocking adoption.

For the sake of discussion, I’m going to define the three principal orders of functional programming as applicative, concatenative, and array.

Applicative FP

The applicative order is the best known of FP. It contains the Lisp and ML families. In these languages, juxtaposed values are applied to functions on their left. For example, in Scheme or Lisp:

(+ 1 2 3)
6

(* 2 2 (+ 0 3 4))
28

What’s going on here is that the leftmost juxtaposed value at a given precedence level (meaning inside parentheses, in the Lisp family) must necessarily name a function, and the remaining values at that level are arguments to that function. So + and * above name functions and the digits are arguments. There’s no restriction that subsequent arguments must not be functions, or that the leftmost item not be, say, the result of another application. The same basic rules apply in the ML family (assuming the functions named below work as above):

sum 1 2 3
6

prod 2 2 (sum 0 3 4)
28

In applicative FP, then, functions take one or more arguments (or possibly zero, in the Lisp family), and juxtaposition is the application of arguments to functions.

Concatenative FP

In concatenative FP, every function has the same API: take a stack and return a stack. Juxtaposition is explicitly thought of as the concatenation of two complete programs. The main family under concatenative FP is the Forth-like languages. So, to compute a sum of squares, one could do something like this:

2 3 dup * swap dup * +

This would leave 13 on the stack. What happened here, in terms of stack effects, is as follows:

2
2 3
2 3 3
2 9
9 2
9 2 2
9 4
13

An interesting thing about this is that literals are not merely literals, they also push themselves onto the stack.
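
To make this concrete, here is a toy model of the idea in Haskell. This is purely my sketch, not how any Forth is implemented, and it assumes a stack of plain Ints: each word is a function from stack to stack, and juxtaposition is composition read left to right.

type Stack = [Int]

push :: Int -> Stack -> Stack
push = (:)

dup, swap :: Stack -> Stack
dup  (x:s)   = x : x : s
swap (x:y:s) = y : x : s

mul, add :: Stack -> Stack
mul (x:y:s) = x * y : s
add (x:y:s) = x + y : s

-- "2 3 dup * swap dup * +", composed right to left in Haskell notation
-- (stack underflow is left unhandled in this sketch):
sumOfSquares :: Stack -> Stack
sumOfSquares = add . mul . dup . swap . mul . dup . push 3 . push 2

Here sumOfSquares [] evaluates to [13], matching the trace above.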

The winning property of concatenative languages (and the source of the name “Factor”) is that any sequence of juxtaposed words can be extracted (“factored”) into its own word. To wit:

: sum-of-squares dup * swap dup * + ;
4 5 sum-of-squares .
41  ok

: square dup * ;
: sum-of-squares square swap square + ;
3 5 sum-of-squares . 
34  ok

: times-swap * swap ;
: times-plus * + ;
: sum-of-squares dup times-swap dup times-plus ;
4 4 sum-of-squares .
32  ok

Any run of words can be extracted in this fashion, because any run of words truly is a complete program, one which takes an input stack and produces an output stack. So every Forth program can be thought of as a pipeline acting on the same root data structure.

Array FP

Array languages provide functions of exactly two arities: 1 or 2 arguments. In general, these are referred to as monads and dyads respectively, but I have been referring to them as unary and binary operators, because that is the conventional name for such things.

Interestingly, juxtaposition in APL is how one builds an array of values. Prior to J, juxtaposition of operators had no meaning. In J, value juxtaposition means the same thing as before, but operator juxtaposition means something quite novel, which they call “verb trains,” interpreted as either “forks” or “hooks” depending on whether the “train” is odd or even in length. This is reminiscent of the phenomenon of verb serialization in natural language, in which speakers of certain languages can emit a list of verbs to imply that all took place one-by-one in some order.

A hook is an even number of juxtaposed functions, and produces the following structure:

(f g) x becomes x f (g x)

A fork is an odd number of juxtaposed functions and produces the following structure:

(f g h) x becomes (f x) g (h x)
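
If it helps to see these shapes in a more familiar notation, here is a rough Haskell rendering of the two rules. This is my own sketch; J resolves longer trains by applying these rules recursively.

-- hook: (f g) x  becomes  x f (g x)
hook :: (a -> b -> c) -> (a -> b) -> a -> c
hook f g x = f x (g x)

-- fork: (f g h) x  becomes  (f x) g (h x)
fork :: (a -> b) -> (b -> c -> d) -> (a -> c) -> a -> d
fork f g h x = g (f x) (h x)

-- J's classic mean, the fork  +/ % # , comes out as:
mean :: [Double] -> Double
mean = fork sum (/) (fromIntegral . length)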

It isn’t immediately apparent, but what’s happening here is that juxtaposition is producing a binary tree of operators. I’d like to go into more detail and produce better examples, but this functionality is still a bit over my head. I’m sure with practice this becomes easy, but I’m quite new.

Juxtaposition and Arity

It’s interesting to me that these three families of FP languages have three different notions of “function” and also three different interpretations of juxtaposition. In the applicative order, functions are generally regarded as taking any number of formal parameters. In the ML family, functions have a minimum arity of 1 (less than that and you have a value) and a maximum arity of 1 as well, with additional parameters faked either with tuples (more common in Standard ML) or by returning functions of decreasing arity (currying). For example, in Haskell a function of type Int -> Int -> Int can be treated either as a function taking two integers, or as a function taking one integer and returning a function that takes one integer and returns an integer.
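
Concretely, in Haskell (a sketch, with a hypothetical add just to show both readings):

add :: Int -> Int -> Int
add x y = x + y

five :: Int
five = add 2 3      -- read as a function of two integers

increment :: Int -> Int
increment = add 1   -- or partially applied: supply one, get a function back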

Here’s a brief summary:

Order          Family  Min arity  Max arity  Juxtaposition
Applicative    Lisp    0          N          Apply items 1–N to the function at item 0
Applicative    ML      1          N          Apply items 1–N to the function at item 0
Concatenative  Forth   1          1          Evaluate items left to right against the same stack
Array          J       1          2          Build a binary tree of operators and evaluate it, with arguments on the right, or on the left and right

Reflections on OOP

In all conventional procedural and OO languages, functions are defined and called by passing parameters in a comma-separated list, as in:

def foo(a, b, c):
  return a + b * c

There is an interesting analogous question in languages that admit side effects: what to do when a function of arity 0 is invoked? Consider a function that returns a random number and requires no parameters; say it’s called random. Do you call random by saying random, or random(), or something else?

The decision comes down to whether or not you consider a function to be an implementation detail. Bertrand Meyer formulated the uniform access principle, which states that the client of an interface should not be able to distinguish between a simple property access and a function call, because the implementation should be able to choose whichever strategy works best for it without breaking compatibility. In other words, whether a property of some object is computed on demand or stored in a variable is an implementation detail. As a result, “getters” in Eiffel and Ruby do not require parentheses at their invocation.

Python and most C-derived languages have adopted the opposite strategy, which lacks a sexy name. In Python, the difference between foo and foo() is that the former refers to the function foo as a value, while foo() is a request to evaluate the function named foo and produce its result. The distinction is not meaningless: the Python library makes much less use of anonymous functions than Ruby’s, for example, but coding a higher-order function involves fewer moving parts.

Smalltalk and Ruby still permit the passing of functions as first-class values through a sort of reified closure called a block. Smalltalk and Ruby do not permit direct variable access to objects, so all getters are really function invocations, but object foo in Smalltalk or object.foo in Ruby cannot be used to refer to the method named foo on object. Both get around this in part with blocks, and in part with a method Object#send, which takes a method name (a “selector”) as an argument, for the specific case of wanting to call a method chosen at runtime. A block in Smalltalk looks like [ :param1 :param2 | code goes here ] and furnishes an object which, when sent the message value, returns the result of evaluating the block. Additional methods on the block permit the passing of formal parameters.

This looks like a very similar tradeoff, but it makes somewhat more sense in that it is a property of the language with two alternatives rather than three.

November 17, 2011






Inspired by James Hague and waiting for a new release of Cuis I thought I might sit down and noodle with J. See if I get any further than I did last time.

So, I recently wrote a trivial Markov chain algorithm for Smalltalk. The code is pretty short, so I’ve included it at the bottom of the post. Rub your eyes on it after we talk about the J code.

I want to implement this algorithm in J, but before we can do that let’s learn how to do something a bit simpler. Let’s just count up letter occurrences. I’m inclined to see that as being related, but let’s face it, even that intuition is probably wrong with J.

First, let me show you my solution.

letter_frequency =: ((~. @: ;/) ,. (;/ @ (+/ @ |: @ =))) &: sort

Yeah… alright…

NB.
   letter_frequency 'Mississippi'
┌─┬─┐
│M│1│
├─┼─┤
│i│4│
├─┼─┤
│p│2│
├─┼─┤
│s│4│
└─┴─┘

So it seems Perl has a competitor!

Verbs

I’ll try to spare you the full J tutorial because there are a few really, truly weird things about this piece of code, but let me give you a brief interactive session to show how I built it up.

NB.
   w =: 'Mississippi'
   sort w
Miiiippssss
   ;/ sort w
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
│M│i│i│i│i│p│p│s│s│s│s│
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
   ~. ;/ sort w
┌─┬─┬─┬─┐
│M│i│p│s│
└─┴─┴─┴─┘

So, sort does what you’d expect. The syntax here is reminiscent of Polish notation: we keep adding operations to the front of the sentence, and they take effect last. So on the first line I sort the letters, on the second line I box the letters, and on the third line I “nub” (remove duplicates from) the boxed letters.

The functions are called verbs here, and they’re all fundamental operators except for /, which is called an “adverb” and takes a verb as an argument, performing what a Haskell user would call foldr1. So ;/ says: apply ; between the elements, and ; boxes its arguments.
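
In Haskell terms (my sketch, with lists standing in for J’s arrays):

-- J's  +/ 1 2 3  inserts + between the items:  1 + (2 + 3)
sumInsert :: [Int] -> Int
sumInsert = foldr1 (+)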

NB.
   sort w
Miiiippssss
   = sort w
1 0 0 0 0 0 0 0 0 0 0
0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 1 1 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1
   |: = sort w
1 0 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 0 1 0
0 0 1 0
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
   +/ |: = sort w
1 4 2 4

It would be much easier to appeal to what is visually going on here than to explain what’s really happening, so I think I’ll rest my case on that. The part I had the hardest time figuring out was the = verb. It is defined as “self-classify,” and a bunch of examples in the documentation were enough to convince me it was close to what I need. The |: verb transposes the table, and we again use the / adverb to produce a fold, in this case a summation. It is visually apparent that we are summing down the columns to count occurrences. I am quite sure there is a less verbose way to do this without the transpose, but I don’t know what it is yet.
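
For comparison, the counting step could be written in Haskell like this (not a literal translation of self-classify, just the idiomatic route to the same numbers):

import Data.List (group, sort)

-- occurrences "Mississippi"  ==  [1,4,2,4]
occurrences :: String -> [Int]
occurrences = map length . group . sort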

In any case, it should now be clear that we have both of the constituents one would need to show letter frequencies: an array of letters and another array of their occurrences:

NB. 
┌─┬─┬─┬─┐
│M│i│p│s│
└─┴─┴─┴─┘
 1 4 2 4

One can combine these in a variety of ways, but the colorfully-named “stitch” is the easiest I’ve found so far.

NB.
   (;/ 2 3 4) ,. (;/ 8 9 0)
┌─┬─┐
│2│8│
├─┼─┤
│3│9│
├─┼─┤
│4│0│
└─┴─┘

At this point, things get a bit weird. Functions in J are defined either implicitly or explicitly. Implicit definition is basically the same as point-free form in Haskell: building up and composing functions. Explicit definition relies on a piece of rather grody syntax, and didn’t work for me for one reason or another:

NB.
   letter_frequencies =: 3 : 0
letters =:  ~. ;/ sort y
occurs =: ;/ +/ |: = sort y
letters ,. occurs
)

I’m sure a better J user than me (say, someone who has used it for more than 24 hours) would be able to point out what I did wrong, but I must admit a dislike of the “magic numbers” that seem to appear in J. File input/output is similar; for example:

s =. 1!:1 <'system\packages\misc\jforc.ijs'

Yes, 1!:1 is a single “verb” which means: give me the content of the file named on the right. Not quite as “readable” as +/, eh? Anyway, we can instead define this function implicitly, as I did at the beginning of this article. There’s just one question: what’s this @:, and why is it in all of my functions? For example, the following is fairly confusing:

NB.
  |: = sort w
1 0 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 0 1 0
0 0 1 0
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
   (|: = sort) w
1 1 0 0 1 0 0 0 0 0 0
   |: (= sort) w
1 1 0 0 1 0 0 0 0 0 0
   (|: =) sort w
|domain error
|       (|:=)sort w

Odd. What about @?

NB. Correct way
   (|: @ = @ sort) w
1 0 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 0 1 0
0 0 1 0
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1

@ is basically unary function composition. So J isn’t like Forth; no problem with that. The really surprising thing isn’t that omitting the @ failed to accomplish something, it’s that it did accomplish something: something weird and unexpected!

Trains, Forks and Hooks

The surprise is that “juxtaposition” of verbs is much more complex in J than in other languages. It leads to something called a verb train, but the Cliff’s Notes version is that:

(f g) y
means y f (g y)
(f g h) y
means (f y) g (h y)

So above, when I wrote (|: = sort) w, J interpreted this as: (|: w) = (sort w), and when I had |: (= sort) w, J interpreted it as |: (w = (sort w)), neither of which is close to being what we want! In fact, what is being calculated is an item-by-item equality comparison between the string w and the sorted version of w, so you can see why we get the result we get:

NB. Ignore the code
   3 11 $ (;/ w) , (;/ sort w) , (;/ (= sort) w)
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
│M│i│s│s│i│s│s│i│p│p│i│
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
│M│i│i│i│i│p│p│s│s│s│s│
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
│1│1│0│0│1│0│0│0│0│0│0│
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘

One of the tutorials has a good example, which I like, of a place where this kind of thing is actually handy:

NB. Sales tax in NM is 6.875%.
   sales_tax =: *&0.06875
   sales_tax 100
6.875
   (+ sales_tax) 100
106.875

The & conjunction binds a parameter to a verb, making a binary operator (or a dyad, in J jargon) into a unary operator (or a monad, in J jargon). So now it should be clear why the solution has all those @s in it.
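
In Haskell clothing (my sketch), *&0.06875 is just an operator section, and the (+ sales_tax) hook can be spelled out longhand:

salesTax :: Double -> Double
salesTax = (* 0.06875)

-- the hook (+ sales_tax), written out
withTax :: Double -> Double
withTax x = x + salesTax x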

Ultimately, this took me about an hour to write, with heavy documentation consultation. For comparison, the following Smalltalk version took me about two minutes:

letterOccurrences: aString
  | dict occurrences |
  dict ← Dictionary new.
  aString do: [ :char |
    occurrences ← dict at: char ifAbsentPut: [ 0 ].
    dict at: char put: occurrences + 1 ].
  ↑ dict

Unsurprising and unsexy, but workable.

Epilogue

In reading APL - A Glimpse of Heaven, it’s not hard to see the affection the author holds for APL, an ancestor of J. In fact, I was able to translate most of his examples to J while reading it.

It would be hard to find two languages that are more different than APL/J and Smalltalk. Surprisingly, similarities are not hard to find. Both languages are highly interactive and a high premium is put on playing with the language on your own with private experiments. On the other hand, there seems to be a lot more in-depth documentation on the APL/J side, in part (I’m guessing) because it is so much weirder and in part because once one reaches a certain level of proficiency and comfort with Smalltalk, one learns from the image rather than a book.

I do have to wonder, though. All of the APL and J fanatics talk about the language as a “notation of thought.” It strikes me as a terrible idea to write software in a language so intrinsically unreadable, but maybe it’s true that with time it becomes quite clear. It seems to have that odd benefit SQL has, that code once written probably lives a long life. I didn’t touch on this much, but all of the built-in verbs in J go to great lengths to be meaningful in both monadic and dyadic contexts and to operate on different kinds of data; perhaps, as with SQL, once a statement is written it rarely needs to be revisited. It certainly could have the benefit of inexpensive functional purity.

My friend Allan Poindexter has tried to explain to me on several occasions how Common Lisp, through its macros, is more powerful than Smalltalk. There’s a smidgen of truth to that, though I believe modern Smalltalks have sufficiently expressive metaclass systems that you could achieve anything short of raw syntactic extension. But at the same time, my Smalltalk solution to the problem above does not require three pages of English prose to explain, and it seems obvious that in J this drawback would only get worse.

So it flies in the face of one’s expectations to hear allegations like this one from the Heaven article:

If APL were a specialist, complex language, it would only attract the “Boy Wonders” of IT, those with A-Grades in everything, whose horizons are limited by bits and bytes.

So it is paradoxical that the great majority of IT people have never really understood APL. Those who have used it successfully have very often not been computer-literate, or have only a slight knowledge … and they have frequently learned APL in isolation. That is to say, the language serves anyone prepared to explore it without prejudice.

It’s hard to evaluate this claim on the surface, but it is worth noting that industries in which APL seems to have penetrated are highly mathematical (actuarial work, finance) but not known for having strong, proud traditions of software engineering.

I said fairly recently that when I program Lisp I feel like I’m tricking the machine into doing what I want. J amplifies this feeling. Smalltalk does not; in fact, when I use Smalltalk I frequently feel like I’m stating the obvious. What I don’t like about Smalltalk, really, is the lack of books. I feel frustrated at the size of the library, and unsurprised that I spend more time reading the library than writing code. What’s funny is, I used to make the same complaint about Haskell. J, as terse, ugly, and frustrating as it is, is also very fun, and seeing a two-page reference to the 70-odd fundamental verbs of the language definitely gives one the feeling of dealing with some kind of alchemy of programming rather than a standard library.

So I will echo James Hague’s endorsement. Try it. You may find you like it quite a bit.

Postscript: Smalltalk Markov Chain Code

Note: I suck at Smalltalk. If you have suggestions, please email me.

Object subclass: #MarkovData
  instanceVariableNames: 'chain'
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Markov'.

MarkovData>>addFrom: aStream
  "Add all the words in some data source to this Markov chain"
  | current previous |
  previous := #start.
  [ aStream atEnd ]
    whileFalse: [ 
      current := aStream next.
      self addWord: previous precedes: current.
      previous := current ].
  self addWord: previous precedes: #end

MarkovData>>addWord: previousWord precedes: anotherWord
  "Add all the words in this stream to this Markov chain."
  | words |
  words := chain at: previousWord ifAbsentPut: [ Bag new ].
  words add: anotherWord

MarkovData>>addWordsIn: aString
  "Add all the words in the passed string."
  self addFrom: (WordStream on: aString)

MarkovData>>generateSentence
  "Generates a String sentence from the data we’ve collected so far."
  | wordStream |
  wordStream := String new writeStream.
  self generateSentenceOn: wordStream.
  ^ wordStream contents

MarkovData>>generateSentenceOn: aStream
  "Generates a random utterance on aStream."
  | word |
  word := self generateNextWordFor: #start.
  [ word = #end ]
    whileFalse: [ 
      aStream nextPutAll: word; space.
      word := self generateNextWordFor: word ].
  aStream skip: -1.  "back up a character, to not end on a space"

MarkovData>>generateNextWordFor: aWord
  "Generate the next word based on the given word."
  ^ (chain at: aWord) atRandom.

MarkovData>>at: aWord
  "Return the set of words that follow from this word."
  ^ chain at: aWord ifAbsent: [ Bag new ]

MarkovData>>initialize
  super initialize.
  chain := Dictionary new

MarkovData class>>buildFrom: aStringOrStream
  ^ self new addFrom: (WordStream on: aStringOrStream)

November 9, 2011






I used to joke that I was the only programmer under 30 who knows SNOBOL. It may not be the case anymore, but there’s something fun about knowing a bunch of obscure languages. For example, a used Prolog book on Amazon goes for $20. A used Smalltalk book goes for about $10. A used SNOBOL book goes for about $1. For comparison, used books on functional programming usually seem to go for about $80. I have a ton of those books, and I haven’t learned as much from any of them as I have from Real World Haskell or ML for the Working Programmer, which are both still in print.

It’s hard to sing the virtues of languages like SNOBOL, but a bit of context helps I think. For one thing, SNOBOL stands at a certain evolutionary dead end. They don’t make languages like that anymore. It’s shamelessly GOTO-laden, has a laughable approach to error handling, and its handling of strings is quite strange by modern standards. But I think it really exemplifies the fact that nobody knew what the hell was going on thirty to forty years ago.

I was reading an old issue of BYTE magazine that I downloaded: the August 1981 issue, from my birth month, which happened to be the Smalltalk spectacular issue, with four or five articles about Smalltalk. When you invest in an unpopular language with a long history, like Smalltalk, Prolog or Lisp, there’s always this sense looming in the air that you’re wasting your time or that the language is defunct. This notion is absurd; the internet “ensures” that no language can die. The worst a language can do is lag behind, and lagging will be treated like a major problem whether or not it actually is one.

In fact, what struck me about this issue of the magazine is the gulf between the content of the magazine and the ads alongside that content. The Smalltalk authors are essentially trying to explain what a graphical user interface is to people who have never seen one before, and right next to the screenshots of windows and menus with Copy and Paste in them are ads declaring that this $1700 graphics card will enable you to have character art in 16 colors on your Sinclair. There’s an ad for a text editor for $300. There are ads for 16K expansion cards, tape drives, build-it-yourself computers from companies you’ve never heard of. It’s chaos and confusion. Smalltalk must have looked like an absurd dream to most of the readers: something that couldn’t be, adjacent to an ad for a cassette drive or business software with runes in the logo.

Against this backdrop, the sins of SNOBOL are a bit easier to explain. Issues of this and other computer magazines (remember, we’re talking about just 30 years ago, not a hundred) would often contain “source code” for a program, in some kind of assembly language. There were easily half a dozen competing computer architectures running around, all at similar prices. In this same magazine there are ads for compilers for BASIC, Pascal, Fortran, Forth, and C, for all different combinations of platform. There are ads for third-party operating systems, including one I’ve never heard of that implemented a very basic Unix on the Z-80. In this context, the sales pitch for SNOBOL looks like this: it’s portable, it’s for string processing (which you’re probably not enjoying in one of the mainstream languages of the day), and it resembles assembler. For that matter, it’s also probably quite fast compared to BASIC, which is probably what you were using before.

It’s a little disingenuous to look at the situation today and say C’s dominance is based on some kind of obvious superiority when Pascal and BASIC were so incredibly prevalent during this time period on personal computers. The honest answer is probably happenstance. So if the world we live in today, computing-wise, is mostly a result of happenstance, then the past ought to be brimming with interesting stuff long forgotten. Unfortunately, I’ve steeped myself in this mythology so thoroughly it’s hard for me to tell what’s appreciated and what’s been thrown away.

One thing I think all programmers should put more effort into is having some patience for learning older stuff. The internet naturally predisposes us to be excited about exciting new technology, not dusty old technology. But there’s a tremendous amount of group-think going on in the world today. Things which don’t have to go together are always seen together, and interesting alternatives require a celebrity programmer or a heavily marketed implementation to get traction. For example, the next version of JavaScript is going to have classes, for no compelling reason. NoSQL databases are taking off, despite the fact that there are no credible non-SQL relational databases. That whole fiasco is very much 1960s-style network and hierarchical databases recast with web jargon; all the old relational-database arguments work against them, yet neither side has the historical knowledge with which to lead the debate.

I see the industry as having entered a sort of river delta in this era. Very little software technology truly dies these days. Mostly, weaker offerings (mSQL, APL, Joy) are absorbed by similar offerings that, for whatever reason, achieve popularity (MySQL, J, Factor). Since so much of this stuff is open source and unsupported commercially, when one maintainer leaves another steps up. You can program in MUMPS today mostly because a handful of huge medical apps are written in it, so it is supported. Ada hasn’t had a new book published in several years, but it will continue to be used in aviation, and learned for that reason, until something stronger comes along.

Perhaps providing enough context for Smalltalk, SNOBOL and so forth to be appreciated by modern readers isn’t what’s needed; it may even be an unhelpful distraction. After all, people have shit to do; that’s why they need to program in the first place. Perhaps it would be more helpful to skip the plea for lenience and dive right in, pretending that the inconveniences are unnoticeable, because in practice they usually are. After all, there are a lot of great ideas in Clojure, but I pinch my nose at the Java-isms. Today, these are convenient and helpful; in another decade, will that still be true, or will they be undesirable reminders of Clojure’s origins? Common Lisp, for example, includes a lot of crap in a “file path” that makes no sense on any modern platform, like hostnames and version numbers; a Lisp path is really much closer to a primitive URL. This doesn’t stop Lisp from being useful, or even its file paths from being useful; it just adds to the general funk. Similarly, Smalltalk doesn’t specify where the stuff in the image is sourced from. Modern users are put off by the inability to conveniently use Git or Mercurial to manage their versions, but this was once considered a forward-thinking improvement in usability over file-based programming. At the same time, Ruby is demonstrating that it’s hard to make an open-class world work reliably and safely, something Smalltalk has had figured out for some time.

It’s an interesting world.

November 7, 2011






In my experience, there are two extreme visions of reality embodied in two philosophies of programming language design. The first is Dijkstra’s view, the view embodied by Milner’s guarantee in ML that once the program is compiled it cannot “go wrong” at runtime. This perspective treasures correctness and asserts that all wonderful things flow from it: reliability, readability, and so forth. The languages supporting this view of reality are primarily Haskell and ML. I’ll call this the welded-shut view.

The second vision is the biological vision, the notion that the system is like an organism that can change and evolve over time. This view is embodied by Lisp, Smalltalk and Erlang. I’ll call this the organic view.

C is not a language I would consider a hallmark of either camp. But C manages, in fact, to do a great job of permitting reliable software development. Why is that? I argue that it has a lot to do with POSIX and the nature of the C/Unix standard library. In short, any function that can fail returns a status code. If it must also return values, you pass pointers into the function and it records the results through them.

This is decidedly unsexy. But it’s a great reminder that anything can fail: memory allocation, file opening, file closing, anything really.

In the organic languages, there is a strong preference for introducing exception mechanisms and handling errors with a separate control flow. This keeps the simple code looking simple. You can still do simple things simply in C by ignoring the result codes of functions, though it gets a bit ugly with passing result arguments around. You can do this, but your program won’t abort at the right time; it will probably stumble on past a malfunction.

There is a strong argument against exceptions, one I rarely see promulgated: they make life hard on you if you want to actually implement robust software. There is nothing in the signature of a function to indicate that it may, instead of returning a value, throw some horrible exception, nor what the types of those exceptions are. In the C/POSIX world, by contrast, it is very much part of a function’s signature that certain error codes may be returned.

Java almost went this route with checked exceptions, but it retained unchecked, runtime exceptions, so it’s sort of a lame duck. In the few places where checked exceptions are used, they feel like a hassle, because so many other things that do throw exceptions don’t force you to deal with them, so why should this part?

I find it surprising, then, that the welded-shut languages usually also provide an exception mechanism. If I want to make a Haskell program do the right thing when a file cannot be opened, I have to catch a runtime IO exception that may be thrown. This, when Haskell already has a perfectly worthy way of handling situations where you may get one of two results (Either). If one wants to write truly reliable code in Haskell or even Standard ML, one must discover, by reading the documentation, which exceptions might be thrown in which circumstances, and handle them.
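
For instance, here is a minimal sketch of the alternative I mean, reflecting the exception back into Either (readConfig is a hypothetical name; try really does live in Control.Exception):

import Control.Exception (try, IOException)

-- turn the runtime exception back into an ordinary value
readConfig :: FilePath -> IO (Either IOException String)
readConfig path = try (readFile path)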

If you go asking around about how one is to achieve reliability in an organic language, you’ll get an earful about automated testing. Over the last weekend I wrote a smallish program in Smalltalk using TDD with the Autotest facility. I have to say, I can see that TDD would obviate much of the need for type checking, and catch some other categories of problem that it would be hard to coax a type checker into maintaining. The nice thing about Haskell’s type checker, though, is that it insulates me from having to write those tests at all.

I find I gravitate more towards the welded-shut languages, but I find it interesting that neither they nor the organic languages have really found a place in industry. Most of the world uses languages like C and Java, which furnish you with only the most rudimentary type checking, which do not really allow things defined at compile time to change, but which do allow dynamic loading of third-party code. I wonder why this is. Traditionally, systems like Smalltalk and Lisp have had, alongside their “liveness,” a tacit expectation that if anything goes wrong, the user is right there and can handle a condition or an exception manually. I suspect this has something to do with their unpopularity. Erlang doesn’t seem to have this weakness.

Another thing you probably wouldn’t expect: while I’m unaware of a way to dynamically load code into a running Haskell or ML process, there are also frequently few good ways to interface organic languages with the outside world. Lisp and Smalltalk are about the only image-based languages around; they try very hard to be more than just a language, and to be a whole environment. Smalltalk really takes this to the extreme, but both have evolved pretty strong walls to keep the world out. Haskell and ML are definitely not image-based; programs written in these languages interface with the OS like everything else, without erecting walls of their own.

It seems like there are more than a few oddities here. I would expect organic systems to be better at interfacing with the world, and I would expect ML and Haskell to avoid exception systems. Instead, we find that generally isn’t the case. Also interestingly, Erlang manages to achieve high reliability by accepting that processes fail, and allowing them to. This seems antithetical to the welded-shut approach of proving that nothing will go wrong. And indeed, highly reliable systems are written in Erlang.

  • Would a language like ML or Haskell, but without exception handling, be interesting? Would it improve reliability?

  • Why is reliability not correlated with strong, static typing?

  • Is there a reason organic systems are not written in organic languages? Or is this an artifact of rarity of use?

October 31, 2011






To assist with job interviews at the NRAO, we recently wrote a small “contest” program. Without giving away the details, the crux of the problem is to read a file containing a list of scans, calculate the amount of time it takes to move the dishes and perform the scans, and report it. Candidates are required to write this in Java, but that restriction does not apply to me, so of course I had to write it in three languages that are not Java: Haskell, Common Lisp and Smalltalk.

I expect to learn something, or at least reinforce an existing fact, by doing these sorts of language games, but this time I really wasn’t expecting what I found. I discovered that my enjoyment of Haskell is pretty much independent of Haskell’s appropriateness or usability. In fact, it is more attributable to the way I feel when using it, in contrast to everything else. Let me show you a piece of code:

-- | Perform a scan and produce the scan log for it.
performScan ∷ Scan → AntennaSimulator ScanLog
performScan scan = wrapDuration $ do
  slewTime     ← moveTo  $ scanPosition scan  -- move to the scan's position
  onSourceTime ← waitFor $ scanLength   scan  -- wait for the scan to finish
  return $ Log scan onSourceTime slewTime 

Somehow, the other languages I’ve used to implement this problem never arrived at a piece of code quite this beautiful. Compare to the Smalltalk:

Scan>>runWith: anAntenna
  "Performs this scan on the supplied antenna."
  | onSourceTime slewTime |
  slewTime     := anAntenna moveTo: self position; lastActionDuration.
  onSourceTime := anAntenna wait: self duration; lastActionDuration.
  ^ ScanResult withScan: self slewTime: slewTime onSourceTime: onSourceTime

And the Lisp:

(defun run-scan (antenna scan)
  "Perform a scan on this antenna. Return a scan log."
  (let* ((slew-time      (move-to antenna (scan-position scan)))
         (on-source-time (delay antenna (scan-length scan)))
         (total-time     (+ on-source-time slew-time)))
    (with-slots (last-duration) antenna
      (setf last-duration total-time)
      (make-scan-log scan slew-time on-source-time))))

Strangely, despite the similarity of the Smalltalk, I still find the Haskell shorter and clearer. It’s also more modular, though it doesn’t look like it. The wrapDuration command arranges for the next set of actions to count as one as far as lastActionDuration is concerned, but does not interfere with usages of lastActionDuration within the wrapped action. I found this notion essentially impossible to port to the other languages. The Haskell version really feels like a composable set of actions for driving the antenna, whereas the Smalltalk version never stopped feeling like an arithmetic fabrication, and the abstraction leaked a lot in the Lisp version.
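
Since I haven’t shown the simulator itself, here is a rough sketch of the shape of wrapDuration, simplified down to a State monad carrying a clock and a last-action duration (the names here are stand-ins, not the real code):

import Control.Monad.State

data Sim = Sim { clock :: Double, lastDuration :: Double }

type AntennaSimulator = State Sim

-- every primitive action advances the clock and records its own duration
waitFor :: Double -> AntennaSimulator Double
waitFor dt = do
  modify (\s -> Sim (clock s + dt) dt)
  gets lastDuration

-- afterwards the wrapped actions count as one action, while reads of
-- lastDuration inside the wrapped block are undisturbed
wrapDuration :: AntennaSimulator a -> AntennaSimulator a
wrapDuration action = do
  before <- gets clock
  result <- action
  after  <- gets clock
  modify (\s -> s { lastDuration = after - before })
  return result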

The Lisp code is just atrocious. I was never really able to put my finger on what it is I dislike about Lisp before. It’s this: while Lisp code is often clever and interesting, it also always feels like I’m tricking Lisp into doing what I want, rather than simply expressing what I want.

Neither Lisp nor Smalltalk really lends itself to my style. I realize now that my style is to try to break everything down into the smallest pieces, pieces that don’t look like work. Haskell encourages this style with the where syntax, which encourages you to say: X is really Y + Z, with Y being this and Z being that. This kind of thing:

-- | To calculate the time to move between two positions, we take
-- whichever is larger of the time to turn and the time to change
-- elevation.
timeToMoveTo ∷ Position → Position → Time
timeToMoveTo (sourceAz, sourceEl) (destAz, destEl) = 
  max rotationTime ascensionTime
    where
      rotationTime, ascensionTime ∷ Time
      rotationTime  = timeToRotate sourceAz destAz
      ascensionTime = timeToAscend sourceEl destEl

This almost doesn’t look like code to me. I would expect any programmer to be able to get at least some sense of what’s going on here, even without knowing Haskell. I don’t think I could say the same about this piece of Lisp:

(defun time-to-move (from-pos to-pos)
  "Calculates the total time to move between two positions, assuming
rotation and ascension can occur simultaneously."
  (with-accessors ((from-az azimuth) (from-el elevation)) from-pos
    (with-accessors ((to-az azimuth) (to-el elevation)) to-pos
      (max (time-to-rotate from-az to-az) (time-to-ascend from-el to-el)))))

Lispers are fond of talking about the programmability of the syntax and how easy it is to learn. But the syntax is still there; it’s just encoded positionally rather than with auxiliary words and symbols. For example, the words azimuth and elevation in the Lisp code denote accessor functions. The outer with-accessors macro does something akin to this:

timeToMoveFrom: from_pos to: to_pos
  | from_az from_el |
  from_az := from_pos azimuth.
  from_el := from_pos elevation.
  ...

So the Lisp is wonderfully terse, but I still feel like I’m tricking something into doing what I want rather than expressing it. I think it’s possible to write clearer Lisp code, I just don’t know how, and I’ve relied on Haskell long enough now that Haskell’s version of clarity is becoming my own version of clarity.

I think this is part of what makes my love for Haskell irrational. I really can’t tell you with a straight face that Haskell is a better language than every other language I know. I can’t imagine anything more frustrating than trying to teach Haskell to pragmatic people who can already program. And my style grows ever more idiosyncratic as I use it more (I consider most uses of monad transformers a failure of functional decomposition). Yet I never lose sleep wondering whether my code is going to break in strange ways in Haskell, even though it could, albeit probably less often than in other languages.

Another thing which surprised me about this little experiment was discovering the degree to which I am dependent on the compiler or interpreter to program effectively. Despite idolizing Dijkstra and befriending John Shipman and Al Stavely, I’m lousy at analyzing my code before running it. This problem affects me in every language I use, but I think I am worse because I simply have no respect for languages other than Haskell. Haskell will catch me trying to do really absurd things much earlier in the process of my not understanding what I’m doing. With Lisp and Smalltalk, I often managed to write ten or twenty lines of code, code that parsed successfully, before realizing I was doing something completely stupid, designing myself into a corner or worse. I’m sure I could write elegant code with less waste in these languages, but Haskell really forces me to.

A good friend once said, you don’t learn a foreign language for the sake of knowing it, you learn it to read the poetry. And we live in a world in which people meticulously study Hebrew, Greek and Arabic just for the sake of reading some of the best poetry we have. It would be absurd for me to claim that Haskell is always shorter, always more readable, or whatever; there are certain things I just don’t like to go near when I’m using it (like web development). Is this not similar to preferring Biblical Hebrew for religious poetry over humor?

A lot of people use and encourage the teaching of languages like Java and C for their simplicity. My wife endured this in C, and I got to see first-hand that notions like function calling, sequential processing, and boolean logic are not intuitive. If I reach back far enough, they weren’t intuitive to me either; I just crossed those bridges so long ago that I seldom think about what it was like before. But I retain enough of a glimpse that when obscure languages like Smalltalk and Haskell are discarded from the realm of possibility for teaching beginners, I find it upsetting. I didn’t know anything about category theory before I learned Haskell, and I still know about that much, yet I am able to benefit from writing terse, correct programs in it, programs whose performance I can reason about. Granted, my understanding of Haskell’s execution model is mostly in terms of how it differs from C and every other language. But I still remember learning things about C’s execution model that were shocking and amazing, and I still managed to learn and be productive in C before I understood those things, so why would it be different with Haskell?

I want to have a concise, cogent, interesting conclusion to insert here, but there really isn’t one, because programming is not a destination and all of my conclusions are provisional. The more I learn, the more discomfort there is. For one thing, if Haskell is so wonderful, why is there no good dependency management? Why do languages like Lisp and Smalltalk depend so much on manual intervention when mistakes happen; why can’t they have a strong, inferring type system like Haskell’s? More importantly to me, why am I never really able to commit to an environment like Smalltalk, which embraces code evolution and failure and seems like something I would like? And why do I like a system that tries so hard to achieve compile-time correctness when I depend on rapid iteration to write code?

One promising option I am looking forward to playing with more is Autotest, which detects which tests should be run after each change of the code and runs them, used in conjunction with TDD. This could potentially go much further than strong typing, but what if I’m testing something that has to do I/O? A question for another day.

October 30, 2011






I have been using Smalltalk, specifically Pharo Smalltalk, a derivative of Squeak, for some light experimentation with Seaside and proper object-oriented programming.

I really do spend too much time being a philosopher of programming and not enough time actually programming. On the plus side, I view this with some derision as a fault.

I realized last year or so that all the functional programming in the world wasn’t going to make me adequate at OO design, and I really haven’t advanced at that skill since the middle of college. I got really into Ruby in about 2003, but by 2005 I was more-or-less in the FP camp. Well, now I’m in the delightful position, thanks to doing Java professionally and Haskell amateurly, of having not advanced at OO design or FP design.

I maintain that Smalltalk is the only OO language that can really teach OO design. This is not a new opinion to the world, but fairly new to me. I see the breeze with which Smalltalk APIs seem to become clear and easy, and I have to say there’s something sublime going on that I can appreciate but not yet perform. I would like to be able to perform it, though, because my attempts at designing nice OO interfaces at work are very poor. My boss does much better, although he has the benefit of having been an early adopter of OO and of Java, and he never was distracted by FP or any other development in the annals of CS, whereas I’m usually distracted.

None of this is getting me any closer to a reasonable point.

Illustrations of the Power of Smalltalk

Procedural Code in Seaside

All of Seaside is fairly impressive, but one thing that caught my attention is the ability to port scanf/printf-style C code over to Seaside without much change in flow control. My wife is taking CSE 113 at NMT, and for fun I have been doing my own versions of her lab assignments in my weird languages, like Smalltalk and Prolog. I scratched my head for a good several minutes before realizing that it would likely be fruitless to try to implement console I/O from Pharo; it would be far simpler to implement a simple Seaside web app instead. And indeed, this kind of code:

int guess;
printf("Enter a number between 1 and 100: ");
scanf("%d", &guess);
if (guess < generated)
  printf("Too low!\n");
else if (guess > generated)
  printf("Too high!\n");
else
  printf("You got it!\n");

ports quite easily to Seaside:

| guess |
guess := (self request: 'Enter a number between 1 and 100: ') asNumber.  "request: answers a String"
(guess < generated)
  ifTrue: [ self inform: 'Too low!' ]
  ifFalse: [
    (guess > generated)
      ifTrue: [ self inform: 'Too high!' ]
      ifFalse: [ self inform: 'You got it!' ] ]

Apart from my not knowing the right way to do a case/switch-type thing in Smalltalk, it’s a fairly straight-across port, except now it’s displaying form entries and data on a web page and reading from them.

Interactive Debugging Like No Other

Having a bug in your Seaside code is quite fun to debug, at least initially. If you see the traceback in the browser, you can hit “Debug” and find yourself in the Smalltalk debugger, sitting on the line of code that blew up. So far this is like most any other debugger: you can move around in the stack, inspecting objects. Unlike in other languages, you can also change the code in the debugger, hit save, then hit proceed or restart, and possibly get further in running your code without having to restart the servlet or even, necessarily, begin a new request.

Crazy as this sounds, it’s not the limit. I ran the Magma server from a workspace and accidentally closed the workspace, losing my variable holding the reference to the server controller. I ran the Process Inspector to see what threads were running, and found a bunch of Magma processes but no reference to the server controller. I learned at that point, though, that you can actually inspect the whole stack of any running thread, anytime you want.

Next I discovered that you can send allInstances to any Smalltalk class, and it gives you an OrderedCollection of the instances of that class. I inspected that and was able to inspect my way to the server controller I had instantiated, and I could click in the little message window and send it messages. This is crazy and hot, and I don’t know of any other system with this kind of power, apart from Ruby with ObjectSpace, and even that isn’t quite this cool.

March 17, 2011