The cat’s out of the bag: the reason there are fewer women in computer science has finally gotten out. And what a predictable reason: rampant, uncontrolled, pervasive sexism. The only thing surprising to me about this is the sheer number of episodes: something like four incidents leading to news stories on high-profile sites like Reddit and Hacker News.

Pleasantly, there have been a lot of people saying the right things. Predictably, there have been a few trolls. But one nagging element keeps coming up: what if this is just part of our corporate culture? If you’ve got a really free-speech, open-air environment in which dynamic people get up and speak their minds, won’t constraining them somehow destroy your beautiful and unique culture?

No.

And I’ll give you three reasons, because it takes me three pages to say what James Hague says in three sentences.

1. You are programmers

What makes you programmers? That you program. That’s it. Your biases have nothing to do with it. Since you’re a programmer, you probably feel like everything about you is rational and can be argued and defended. Not so. And because you’re a programmer, the reason you come to work is—to program, not to tell dick jokes, not to see which one of you can make the rest of the world the most uncomfortable. If you had to choose between actually programming and telling dick jokes, and you’d honestly choose telling dick jokes, or somehow argue that your ability to program would be compromised by not being able to tell dick jokes—newsflash, you’re not a programmer.

2. Your workplace isn’t a frat

Contrary to what you must have dreamt up with your buddies, a workplace is a workplace, startup or not. The appropriate time to discover if somebody fits with your “corporate culture” is the interview. Not six months later, in the middle of a big project, when you finally let your guard down and start flying your freak flag. Your workplace is governed by the same laws that govern every other workplace, be it an office, the DMV, the doctor’s office, whatever. There is no “bu-bu-but we’re a magical startup!” clause. If you’re making life hard on an employee of yours with your environment, that’s a hostile environment, period. So when someone says to you or your supervisor, hey, you need to back off on the dick jokes, that means you have to back off on the dick jokes, not whine about how you were misunderstood or that you meant it ironically or that someone should have brought it up with you directly.

3. Freedom of speech is not relevant to programming

I love America, and I love freedom of speech, and (obviously) I make wide use of it. You are absolutely permitted to make all the dick jokes you want. You just have to not make them in front of other employees. If you need to be able to do that to write your code, you are not a programmer. Opine and whine all you like—just not while your coworkers are trying to do their jobs.

An Upheaval of Sorts

I don’t know anyone outside a family-owned business or programming who actually gets away with the kinds of unprofessional bullshit we do. It’s completely absurd, and I’m happy to see it improve. Let’s stop being complacent. Let’s choose to be professionals in this field because we love what we do, rather than choosing to kill our own profession by keeping it uninhabitable.

March 29, 2012






James Hague wrote a wonderful, vaguely controversial essay, “Free your technical aesthetic from the 1970s.” I’m refusing to comply, for two reasons:

  1. Those who forget the past are doomed to repeat it
  2. I hate unnecessary optimism

I think James would agree with the notion that it is too easy to get complacent. Computing, after all, did not emerge from Babbage’s garage with an Aqua look n feel. The road leading to today had a lot of bends, twists, blind drops and dead alleys. James is more interested in where computing will be tomorrow. I respect that. For whatever reason, I am much more interested in where computing was yesterday.

Aside on textbook pipe usage

It’s easy enough to debunk his attitude toward Linux. There’s plenty of interesting stuff going on. Linux is hard to love as a Unix, which is one reason I promote BSD. If the goal is small commands that you glue together like plumbing, it helps if the commands are actually small. I find it tremendously useful to be able to pipe commands around, not a textbook exercise at all. For example, I’ve done all these in recent ~/.history:

ps auxw | grep firefox
awk -F, '{ print $2 }' /tmp/projects.txt  | sort -u > legacy-pcodes.txt
grep 'Hibernate: ' precaching.txt | sed 's/.*\(from [^ ]*\).*/\1/g' | sort | uniq -c
xzcat mcsql-widarmon.sql.xz  | pv > /dev/null

Some of these are mundane (finding Firefox’s pid so I can kill it); others are a bit more interesting. The second one there is a bit of very light CSV parsing, the third one is showing me what Hibernate is spending its time doing, the fourth one there is letting me see the uncompressed size of an xz file. Probably there are better ways to do some of these, but after so many years of Unix, it’s just faster and easier for me to do a one-off like this than spend 30 seconds consulting the documentation. Heck, even when I do consult the documentation, it sometimes looks like this:

curl --help | grep -i header

Yup, I didn’t want to have to read through all those options. Do these look like textbook usages of the pipe? I have no idea, because I’ve never seen a textbook of Unix pipe usage. All I know is what I’ve seen in various blogs and what I’ve seen my friends do. It came naturally, I guess.
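Incidentally, the xz trick — streaming the decompressed bytes into pv just to watch the size — has a tidy Python analogue using the standard lzma module. This is only a sketch; uncompressed_size is my own name for it, not an established API:

```python
import lzma

def uncompressed_size(path, chunk_size=1 << 20):
    """Stream-decompress an .xz file and count the bytes without
    keeping the output: the same idea as `xzcat file.xz | pv > /dev/null`."""
    total = 0
    with lzma.open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Because it reads in fixed-size chunks, it handles files far larger than memory, just like the pipeline does.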

Obscure alternatives

Yet even BSD is not so small anymore. And using Perl to damn Linux is a bit of the pot calling the kettle black. Pike, after all, was one of the greats behind Plan 9, current record holder for world’s weirdest and most obscure operating system you can run on a modern computer. The dream of Unix is alive in Plan 9, so to speak.

Plan 9 is very much the answer to the question, “What would Unix be like if it could be rewritten with no regard for backwards compatibility?” Everything is a file, all the old stupid file formats are made a little more sane, and so forth. You can’t do this anywhere else:

  • cat /dev/keyboard: it’s what you’re typing right now
  • cat /dev/mouse: it’s screen coordinates for your mouse cursor
  • the display is a flat file, the window manager arranges for each window to think it owns it
  • cat a WAV file to /dev/audio to play it

These may not seem like much, but in Plan 9:

  • any file can be transparently mounted across the network to any other location: if I want your mouse to control my screen, we can mount your mouse over mine in the filesystem; similarly for all my other hardware
  • to use the network gateway, mount its network card over yours
  • want to write a network server? use the 9p library and instantly get remote access to your app’s internals

Obviously it isn’t all roses, and I think James and lots of other people would blame the “aesthetics” of Plan 9 for a chunk of that, maybe even the main chunk of that, but we must not let advances be forgotten just because they don’t pan out.

Forgotten futures

Between 1968 and 1970, there was a fantastic AI demo at MIT, called SHRDLU. When this program ran it displayed some shapes in a small 3D world. It then allowed the user to move the blocks around and create shapes as well as make queries.

When I write, “allowed the user to move the blocks around,” it’s easy to imagine that I mean the user clicked on a block and moved it somewhere else. That’s not what happened, though. Instead, the user wrote complete English sentences, which resulted in SHRDLU creating a plan of action that it would then execute by moving things! On top of that, the queries were also complete English sentences! This was over 40 years ago!!

Person: Is there anything which is bigger than every pyramid but is not as wide as the thing that supports it?
Computer: Yes, the blue block.
Person: Does a steeple
Computer interrupting: Sorry, I don't know the word "steeple".
Person: A "steeple" is a stack which contains two green cubes and a pyramid.
Computer: I understand.
Person: Are there any steeples now?
Computer: No.
Person: Build one.
Computer: OK. (does it)

This is not a fantasy, this is an actual program with actual source code you can download today. Look at everything that happened there:

  • Someone gave a very complex query and got an exact answer
  • They then defined a new term declaratively
  • They asked the computer to figure out how to do something based on that definition—which it then actually did.

I have a hard time understanding why people are not shocked and amazed constantly at stuff that we used to do. I have an app on my iPhone called Shazam, which is easily the most amazing app I have ever had on my phone and the handiest. What it does is, I click a button, it listens for 15 seconds and then tells me what song is playing in the background. This was easily the most convincing piece of artificial intelligence anybody had on their phone for about four years.

SHRDLU took a declarative definition from plain English and converted it into a plan of action, which it then executed. Hmm… sound familiar? Sound like, I dunno, Siri?

Amazingly, SHRDLU comes in at under 20K lines of code. No corpus, no statistics.

Prolog versus the world

I imagine that by virtue of having taken a compilers class I’m substantially ahead of the curve. In that class, we started out talking about regular languages, then context free languages, then briefly we talked about context sensitive languages. The general attitude was, these things are hard to do, so we’re not going to do them, and we retreated back to context free languages, one in particular (Tiger) which we implemented for the rest of the semester, and that was that.

The general attitude this instilled is, well, you can do amazing things with context free languages, and that’s all the power you’ll ever really want, so stick with that and go about your life.

A classic example of something which is impossible to do with a context free language is something like agreement in natural language. If you want to be cheesy, it’s not too hard to write a piece of Prolog that will parse a sentence:

sentence --> noun_phrase, verb_phrase.

noun_phrase --> article, noun, !.

verb_phrase --> verb, noun_phrase, !.
verb_phrase --> verb.

article --> [the].      article --> [a].

noun --> [cat].         noun --> [cats].
noun --> [mouse].       noun --> [mice].

verb --> [eats].        verb --> [sleeps].

?- phrase(sentence, [the,cat,sleeps]).
true.

?- phrase(sentence, [the,cat,eats,the,mouse]).
true.

Of course, this permits insanity as well:

?- phrase(sentence, [the,cats,eats,the,mouse]).
true.

The cats eats the mouse? Bullshit! But it turns out, this can easily be rectified by passing along the number:

sentence --> noun_phrase(Num), verb_phrase(Num).

noun_phrase(Num) --> article, noun(Num), !.

verb_phrase(Num) --> verb(Num), noun_phrase(_), !.
verb_phrase(Num) --> verb(Num).

noun(s) --> [cat].         noun(pl) --> [cats].
noun(s) --> [mouse].       noun(pl) --> [mice].

verb(s) --> [eats].        verb(pl) --> [eat].
verb(s) --> [sleeps].      verb(pl) --> [sleep].

?- phrase(sentence, [the,cat,eats,the,mouse]).
true.

?- phrase(sentence, [the,cat,eat,the,mouse]).
false.

?- phrase(sentence, [the,cats,eat,the,mouse]).
true.

Now we can keep on taking this further. What about this nonsense?

?- phrase(sentence, [the,cats,sleep,the,mouse]).
true.

What? Sleeps doesn’t take a direct object!

verb_phrase(Num) --> verb(trans, Num), noun_phrase(_), !.
verb_phrase(Num) --> verb(intrans, Num).

verb(trans, s)   --> [eats].        verb(trans, pl)   --> [eat].
verb(intrans, s) --> [sleeps].      verb(intrans, pl) --> [sleep].

?- phrase(sentence, [the,cats,sleep,the,mouse]).
false.

?- phrase(sentence, [the,cats,sleep]).
true.

?- phrase(sentence, [the,cat,sleeps]).
true.

?- phrase(sentence, [the,cat,eats]).
false.

?- phrase(sentence, [the,cat,eats,the,mouse]).
true.

Look at this: we have a four-rule grammar that handles transitive and intransitive verbs separately and ensures number agreement. This is considered a very hard problem, and yet here is this tidy little Prolog program that just does it. The dream of the ’70s is alive in Prolog.

In the real world, this kind of program would never fly because we expect natural language processors to be fuzzy and handle the kinds of mistakes that humans make when talking and writing, and something like this is very rigid. As a result we’ve married ourselves to statistical NLP, which means huge corpuses, which means lots of memory, which means heavy programs with a lot of math behind them.

It’s strange that we live in such a bifurcated world, that on the one hand we have astronomical demands for a certain class of program, but for the kinds of programs we use day-to-day, they’re shockingly procedural. The user is expected to learn all these widgets and how to use them and what they mean (intuitive my ass) and then whenever they need to do something they just do it repeatedly. At my work one of the last things to be produced before an observation is a script that runs the telescope. The script is written in Python, but it’s completely generated by a set of objects that live in a database, which were composed, painstakingly, by a user using a point-and-click web GUI. Wouldn’t it have been more efficient to just learn to program? Or, perhaps, would it have been more efficient to build a large, complex manual NLP system like this that could parse a plain-text English specification and produce the script? We’ll never know.

Melt-your-brain debugging

The standard debugging procedure for a Smalltalk app is:

  1. Something goes awry: a message is not understood, or some other exception is raised
  2. The app pops up a dialog with the debugger window
  3. You enter some code and hit proceed
  4. The app continues like nothing happened

How does this pan out in a web app? Similarly, only this time between 2 and 3 there is a 2.5: you VNC to the Smalltalk image running your web app on the remote server.

This sounds awesome enough, right? But apparently it gets better. You can replace that “pop up a dialog” machinery with “serialize the entire current context to a log.” If you’ve ever debugged a Java web app, you are used to rooting around in the logs reading exception traces. This is like that—except, instead of reading an exception trace, you get to load the actual moment the exception was triggered into your local debugger and pick up from there as if it just happened locally! Have you ever had a problem you couldn’t debug because you just didn’t know how to reproduce the bug locally? That problem isn’t even meaningful in Smalltalk!
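Smalltalk is the real thing here, but a crude sketch of the serialize-and-inspect-later idea can be had even in Python. Everything below — the function names, the picklable-locals restriction — is my own approximation; a genuine Smalltalk context capture preserves vastly more, and lets you resume:

```python
import pickle
import sys
import traceback

def capture_context(path):
    """Call from an except block: snapshot the failing frame's picklable
    locals plus the formatted traceback, so the failure can be inspected
    later, elsewhere (a pale shadow of a serialized Smalltalk context)."""
    etype, value, tb = sys.exc_info()
    while tb.tb_next is not None:   # walk down to the innermost frame
        tb = tb.tb_next
    frame = tb.tb_frame
    snapshot = {
        'error': repr(value),
        'trace': traceback.format_exc(),
        'locals': {name: val for name, val in frame.f_locals.items()
                   if isinstance(val, (int, float, str, bytes,
                                       list, tuple, dict))},
    }
    with open(path, 'wb') as f:
        pickle.dump(snapshot, f)

def load_context(path):
    """Reload a snapshot for post-mortem inspection in a local session."""
    with open(path, 'rb') as f:
        return pickle.load(f)
```

Loading the snapshot locally gets you the variable state at the moment of failure; what it cannot do, and Smalltalk can, is let you fix the problem and hit proceed.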

Cocoa doesn’t have this. C# doesn’t have this. Nothing else has this.

Unnecessary Optimism

If you’re like me, you read the foregoing commentary and thought, wow, how awesome are Plan 9, Prolog and Smalltalk! Ninety-five percent of what I see is the same old fucking around with different clothes. Now you get whooshy effects when you drag ’n’ drop, but it’s still drag ’n’ drop. Now you can put multiple fingers on your browser window, but you’re still clicking on drop-down boxes to put in your birthday.

An even bigger letdown for me is the whole NoSQL situation. The proponents of NoSQL aren’t even aware of how they are repeating history by recreating the hierarchical databases of yore. E.F. Codd was explicitly trying to rectify these mistakes when he formulated relational database theory 40 years ago, and now we’ve come so far along that we’ve wrapped around and apparently must re-learn these mistakes a second time. The morons promoting this technology have no clue; they see themselves as the new sheriffs in town, putting scalability above all else.

It’s easy to forget that your forebears were once wild young bucks as well. The difference, of course, is that Codd understood math and was not blown around by the winds of popular opinion, and much time passed between the refinement of his idea and the first relational database that could perform. Since then everyone seems to have forgotten why they were a good idea in the first place: no duplication, true data integrity, and proper handling of concurrency. Buckle up, it’s going to be a rough ride.

Conclusion

Whether or not we like it, we are products of what came before. I would like to look at the recent stuff coming out of Apple with the same optimism as James Hague. There is some interesting stuff, but how hard it is for me to forget all the things I’ve seen. And I may have rose-tinted glasses; after all, I wasn’t there when it happened. Let me say this, though. If you want to blow your mind, go to your CS department’s library and look through the books printed before 1980. You’ll be surprised how much diversity, hope and optimism there was, back when our technical aesthetics were completely enslaved to the 1970s.

February 22, 2012






If you took calculus in school, you probably remember it as a blurry mishmash of baffling equations, difficult algebra, many seemingly irrelevant “real-world” examples and overall confusion. The blame for this is usually laid squarely on the academic establishment, who are certainly at fault, but there is more to it than that.

What is calculus?

The word “calculus” is actually a generic word for a mathematical system. There are lots of other calculi; computer science has the lambda, pi and join calculi. So why do we call Newton’s calculus “calculus” and not something more specific? Looking at history, you find that calculus used to be referred to as “infinitesimal calculus.” This is an accurate and appropriate name, because calculus is the foundation for studying rates of change and summations over time. These are things you need in Newtonian physics, because you’re interested in saying: if I have this function of distance over time, what is its rate of change, in other words, the velocity? Alternately, if I have a function of velocity, what is the total distance?

Newton conceived this infinitesimal calculus as merely the tool he needed to solve certain general physics problems like the above. Velocity is defined as change in position over a length of time, plus a direction, but there could be such a thing as an instantaneous velocity; everybody with a car is familiar with this notion, it’s your speedometer. The total distance, likewise, is the sum of all of the individual distances moved, instant to instant, along this function.

The tools provided, the derivative and the integral, happen to be incredibly general tools. This leads to the first problem of calculus instruction: lack of clarity about what calculus is. Frankly, this is a lot like the confusion in Haskell with monads. They’re not a difficult concept, they’re actually quite simple. So simple, the applications are unlimited. So then you get unlimited examples, which wind up creating more confusion. Instead of teaching math, metaphor gets taught, and we pretend that distilling the commonality between six metaphors is easier than just understanding the math.

The second problem with calculus instruction is that we no longer use infinitesimals! No wonder the word has fallen out of use; how hard would it be to understand infinitesimal calculus without infinitesimals in it!

Now, Newton and Leibniz managed to make great use of calculus, but they never improved on their own informal definition of what an infinitesimal was. This kind of thing leads to great discomfort in mathematicians. Other mathematicians (Bolzano, Cauchy) found a way to skirt the issue by defining the limit formally without using infinitesimals. From there, all the rest of calculus can be rebuilt. Thanks to Karl Weierstrass, this is the formulation used by textbooks today: a formulation without actual infinitesimals, but retaining Leibniz’s notation that uses them.

It would probably be impossible to understand or use calculus without an intuitive grasp of infinitesimals, yet all the textbooks for calculus instead use the epsilon-delta definition of the limit as their basis. This leads to absurdities like learning the formal definition of the limit after two weeks of computing limits with various formulas and heuristics, which my wife is enduring now. Furthermore, the restatements of the derivative and the integral in terms of limits are less intuitive. This should be obvious, since Newton and Leibniz started with derivatives and integrals respectively, not limits, which are completely tangential to all of the things they actually wanted to calculate. The end result is that students are expected to perform derivatives and integrals using a formalization that bears no resemblance to the intuition they’re being taught—which they need to understand what they’re doing. And the book must also include tons of “real world examples” in order to get into print and into schools.
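For reference, this is the Weierstrass-style definition of the limit that students are handed:

```latex
\lim_{x \to a} f(x) = L
\quad\Longleftrightarrow\quad
\forall \varepsilon > 0 \;\, \exists \delta > 0 \;\, \forall x :\;
0 < |x - a| < \delta \;\Rightarrow\; |f(x) - L| < \varepsilon
```

Three nested quantifiers before you ever touch a derivative.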

Infinitesimal calculus with infinitesimals

The one fundamental problem was that Newton and Leibniz depended on an informal tool to build their formal system. The solution Bolzano found was to eliminate the use of that tool. There is another solution, found in the 20th century by Abraham Robinson, which is to formalize the concept of the infinitesimal.

Robinson defines a new set of numbers, the hyperreals: the real numbers extended with infinitesimals. An infinitesimal is a number greater than zero but less than every positive real. He then provides algebraic laws for manipulating infinitesimals: an infinitesimal plus an infinitesimal is another infinitesimal, an infinitesimal times an infinitesimal is a smaller infinitesimal, and so on. He then defines a relationship, “infinite closeness,” which can essentially be used to discard infinitesimals.

After learning this fairly simple and somewhat intuitive notion, you can proceed directly to taking derivatives, just like Newton did. You can then proceed to taking integrals, the inverse operation, by summing an infinite number of infinitesimally-thin strips to calculate area under a curve. In both cases, you’ve really just learned a small extension to algebra and are doing algebraic expansions to do calculus. You’ve built up the two fundamental operations of calculus without introducing the limit or the clumsy reformulation of those operations in terms of it.
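As a concrete instance of that algebra, here is the derivative of f(x) = x² done Robinson-style, written with the standard-part function st, which formalizes discarding the infinitely close leftovers:

```latex
f'(x)
= \mathrm{st}\!\left(\frac{f(x+\varepsilon) - f(x)}{\varepsilon}\right)
= \mathrm{st}\!\left(\frac{x^2 + 2x\varepsilon + \varepsilon^2 - x^2}{\varepsilon}\right)
= \mathrm{st}(2x + \varepsilon)
= 2x
```

Expand, cancel, discard the leftover infinitesimal: it is ordinary algebra the whole way through, with no limit in sight.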

How does this help me?

If you’re interested in knowing calculus, rather than just passing the class, I suggest you read the book Elementary Calculus: An Infinitesimal Approach (or buy the new edition). This book follows exactly this approach. I’m going through it now myself, and I’m finding the process really enlightening. There is also research showing that the infinitesimal approach leads to greater comprehension of the material.

This method is called “non-standard calculus” and is considered heretical by some mathematicians, notably constructivists and intuitionists. The details of their interpretation of mathematics are interesting, but not particularly relevant apart from the one detail that constructivism is related to computability. In any case, I see Juha Ruokolainen has developed a constructivist version of nonstandard analysis (even without infinity) that addresses the complaints, and besides, constructivists are themselves considered heretical by other mathematicians.

Moreover, learning the infinitesimal approach gives you a general problem-solving tool that can extend beyond calculus. I have a hard time imagining what such an extension might look like, but I find the idea very appealing, in part because nobody has given me seventy-five “real world” examples, but you get the gist.

February 1, 2012






The very first item in Dan Ingalls’s esteemed Smalltalk introduction, “Design Principles Behind Smalltalk,” is this:

Personal Mastery: If a system is to serve the creative spirit, it must be entirely comprehensible to a single individual.

Most of the other points Dan makes in this essay are well-served by modern Smalltalks, and apart from a few glaring outliers, even by other modern systems that are not remotely connected to Smalltalk, but this one item in particular stands out to me as the most criminally absent.

I would imagine Dan placed this item first because he felt it was crucial. He didn’t want it lost in the shuffle in the middle of the essay. And yet even small, modern Smalltalk implementations contain thousands of classes. That pasty abomination Morphic is by itself a great demonstration of how not to build software: one class with several hundred methods. (I think Smalltalk suffers from not showing all the methods in one file; at least in Java, when a class exceeds a couple thousand lines, you know there is a problem, but there’s no way to know that about a class in Smalltalk.)

Earlier today I was looking at the source code for J and generally thinking about J and how odd it is that I like two languages so radically different that they have almost no similarities at all. In Smalltalk, readability is paramount; in J, readability is something that “will come in time.” In Smalltalk, the goal of the code is to communicate intent; in J, it is simply to process the data. In Smalltalk, one talks about objects and their responsibilities. In J, one talks about arrays and the functions that manipulate them.

As an aside, J is object-oriented, but I have trouble imagining this faculty is heavily used, if for no other reason than to use advanced features of J one must first use J, and there isn’t a lot of that going on either.

In any case, the source code for J is famous for being written in a C that has been heavily contorted to look and act like J itself. For example, drawing a paragraph of code from a random file in the distribution, you see this kind of thing:

static A jtmemoput(J jt,I x,I y,A self,A z){A*cv,h,*hv,q;I
c,*jv,k,m,*mv,*v;
RZ(z);
c=AC(self); h=VAV(self)->h; hv=AAV(h);
q=hv[0]; mv= AV(q);
q=hv[1]; jv= AV(q);
q=hv[2]; cv=AAV(q); m=AN(q);
if(m<=2**mv){A cc,*cu=cv,jj;I i,*ju=jv,n=m,*u;
v=ptab; while(m>=*v)++v; m=*v;
RZ(jj=reshape(v2(m,2L),sc(IMIN))); jv= AV(jj);
GA(cc,BOX,m,1,0);                  cv=AAV(cc);
for(i=0,u=ju;i<n;++i,u+=2)if(IMIN!=*u){
k=HIC(x,y)%m; v=jv+2*k; while(IMIN!=*v){v+=2; if(v==jv+2*m)v=jv;}
cv[(v-jv)/2]=cu[i]; cu[i]=0; v[0]=u[0]; v[1]=u[1];
}
q=hv[1]; AC(q)=1; fa(q); AC(jj)+=c; hv[1]=jj;
q=hv[2]; AC(q)=1; fa(q); AC(cc)+=c; hv[2]=cc;
}
++*mv;
k=HIC(x,y)%m; v=jv+2*k; while(IMIN!=*v){v+=2; if(v==jv+2*m)v=jv;}
cv[(v-jv)/2]=raa(c,z); v[0]=y; v[1]=x; 
R z;
}

I saw this and I thought to myself, damn, that’s totally unreadable, but then upon reflection a couple other things occurred to me.

Users who Program, and Other Yetis

Proposition 1 of “Design Principles Behind Smalltalk” is predicated on the notion that if we make programming easy enough, users will take it up. And yet, as we have seen, users do not, and Smalltalk remains obscure, heavily used in a handful of industries but mostly kept alive by a medium-sized group of passionate, die-hard fans. Interestingly, APL and J are known for having a clutch of equally die-hard fans, many of whom know no other languages:

If APL were a specialist, complex language, it would only attract the “Boy Wonders” of IT, those with A-Grades in everything, whose horizons are limited by bits and bytes.

So it is paradoxical that the great majority of IT people have never really understood APL. Those who have used it successfully have very often not been computer-literate, or have only a slight knowledge … and they have frequently learned APL in isolation. That is to say, the language serves anyone prepared to explore it without prejudice.

This got me thinking. In addition to not knowing non-programmers who are interested in programming in any capacity, I also know of no users who use Unix with proficiency and have no interest in programming. And the crux of Plan 9 was to return Unix to its pure roots, where everything is a file and everything is done in the classic Unix way by stitching together small programs and leaving the heavy lifting to C programs. So it seems interesting that these systems with an emphasis on simplicity—in the name of hypothetical casual users—seem to enter obscurity without having been wildly adopted by any of these casual users.

Programmers Like To Program

It’s widely bemoaned that programmers like writing programs, even when doing so is unnecessary busywork. When Tom Lord discovered distributed version control, we didn’t just use arch; we all had to write our own DVCS, and today we live in a world with six or seven that have essentially the same feature sets. Back in the late 1990s, everybody had to write their own window manager. Now, in the early 2010s, it seems like everyone is writing their own “NoSQL” database.

Since programmers like to program and there weren’t really any non-programmer users of Smalltalk, it makes sense that Smalltalk got complex and forgot its roots. Programmers are not all that intimidated by being presented with a huge system of astonishing complexity if they can just dive in and use it. If you had to understand everything first, you’d never write any code. So keeping things comprehensible never occurred to anyone; we were too busy writing abstractions to keep things simple for ourselves!

The Questionable Importance of Readability

Let’s stick with J and Smalltalk even though Perl and Python would probably be better examples. Ignoring the rounding errors, say J and Smalltalk have similarly-sized user communities. Obviously there must exist a category of people for whom the initial readability is not a key concern. Indeed, looking at Haskell and Ruby web frameworks, I find that even if you have a system of aesthetics based on simplicity, your advanced users will bend them to increase power. This is why non-Yesod users balk at Yesod’s use of Template Haskell. The fact that Template Haskell is well-understood and well-supported doesn’t make it aesthetically pleasing. Snoyman can put all the work he wants into making it simple and easy and pleasant, but until Template Haskell is core-curriculum stuff it’s going to be off-putting to new users.

So looking at the J source code, it’s easy for me to hold my nose and say, that’s totally unreadable garbage; how can that be maintained? But at the same time, it’s not my place to maintain it. Imagine if it were written in the most clean, beautiful C code possible. I might be able to dupe myself into thinking I could maintain it, but it would be a lie! Is it so bad that complex projects like J have complex code? If it were a complex Java program instead, I’d still need substantial time to learn it before I would stand a chance at modifying it. Making it J-like means I am required to understand J to change the source code. Wouldn’t I have to understand J to change it anyway?

Does readability give us a false sense of maintainability?

A Thought about Power

If I were to tell you you should learn J because it is powerful, mind-bending, and high-performance, most programmers would say, yeah, maybe. But what if instead I told you, I have this language which is extremely terse, in which the output in some ways resembles and in some ways completely contradicts the input, but which is of high utility in a particular class of problems. At first it sounds like I’m trying to trick you into liking J, but I’m not—I’m describing regular expressions.

Think about it. Without batting an eye, I can write a regular expression to (say) parse an auth log line from this computer:

Nov 16 12:46:07 7gf sshd[20009]: Invalid user oracle from 200.216.162.113

A regex might look something like this:

^... \d\d \d\d:\d\d:\d\d 7gf sshd\[\d+\]: Invalid user ([^ ]+) from (.*)$

If you use regular expressions, it’s probably quite clear to you what that does, despite:

  • control sequences mixed in with literal text
  • the same character doing radically different things depending on escaping and context

What’s the difference between ^ at the beginning and ^ in brackets? What’s the difference between bracket and backslash bracket? Why do some things have to be escaped and not others? What’s the difference between + and *? These are all hurdles we have to get over to learn regular expressions, but once we know them, it’s not hard to remember how they work.
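And once over those hurdles, the pattern really does the job. As a quick illustration, here is the log line and regex from above checked with Python’s re module:

```python
import re

# The auth-log line and the pattern from above, checked with Python's re:
line = "Nov 16 12:46:07 7gf sshd[20009]: Invalid user oracle from 200.216.162.113"
pattern = r"^... \d\d \d\d:\d\d:\d\d 7gf sshd\[\d+\]: Invalid user ([^ ]+) from (.*)$"

match = re.match(pattern, line)
print(match.groups())  # → ('oracle', '200.216.162.113')
```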

On top of all that, regular expressions do nothing to aid readability: they look nothing like English, nor do they look like instructions for how to match text. Despite this, nearly all of us programmers learned them, whether on the job, in school, or on the side, and we all use them with great frequency. We defend them, to the point that Larry Wall is possibly the first person in 30 or 40 years to suggest we could actually have a better syntax with which to do this. I know of just one program that matches text in a radically different fashion (I’m talking about txr) and even it lets you fall back on them.

Let’s consider a Smalltalk-like API for defining that same regular expression. Would it look like this?

| captures |
captures := TextPattern
    atStart;
    skip: 3;
    space;
    digits: 2;
    space;
    digits: 2;
    special: ':';
    digits: 2;
    special: ':';
    digits: 2;
    space;
    literal: '7gf sshd';
    special: '[';
    oneOrMore: (TextPattern digits);
    special: ']';
    special: ':';
    literal: ' Invalid user ';
    capture: (TextPattern oneOrMore: (TextPattern space not));
    literal: ' from ';
    capture: (TextPattern zeroOrMore: (TextPattern any));
    endOfLine.

Does that seem like an improvement? Maybe if you don’t know regular expressions. But you do. This is the crux of the APL/J argument. Where’s the flaw?

Afternote

I just ran into this document about the proceedings of the APL 2010 conference discussing OOP. In it you see some real gems, like:

I tried to learn SmallTalk once, but it was not a success.

and:

J does not have instances [in the OOP sense] properly speaking. They are locales and locales are not assignable (like in APL) so you cannot have arrays of instances or classes (which are also locales).

Interestingly, there seems to be some truck with Haskell users. I have seen mention of Haskell both on this page and on the J Programming Forum, but not really of other FP languages.

Tentative Conclusions

  1. It is more honest to see programming paradigms, development tools and language diversity as an effect caused by programmers’ need to program, rather than by technical or market needs.
  2. Technical decisions promoted in the name of “programming users” are lies.
  3. Readability as a concept may be a lie.

Addendum: Lack of Visi.ion

I just saw this Visi.io project linked from Reddit. I want to say it’s admirable what this guy is doing, and he has quite a pile of code already written. But even his version of history is quite damning: HyperCard is dead and Excel is rarely understood. Taken with the preponderance of other evidence, this project will likely go the same way as all the other well-intentioned user-liberation schemes. We’ve been trying to make programming accessible to non-programmers for the last fifty years. Has it worked? Are the missing ingredients really type inference, JSON and monoids? It wasn’t that long ago that HTML was invented specifically to be readable to humans and machines, and here he is dismissing it because it sucks compared to native apps. Well, spreadsheets also suck compared to native apps, and so did HyperCard stacks. Is the problem really that those tools lacked a sufficiently expressive language, and not that users would rather have higher-quality tools that do not require programming?

November 17, 2011






Juxtaposition is an interesting operator. I’m aware of no language in which this operator can be redefined by the user. I’m referring to simple placement of two tokens next to each other, such as “x y.”

In the vast majority of languages, this operation has no meaning at all. For example, in Python:

>>> 2 + 3
5
>>> 2 3
  File "<stdin>", line 1
    2 3
      ^
SyntaxError: invalid syntax

In a few languages, strings juxtaposed are conjoined. In C this is limited to string constants (a kind of constant folding optimization).
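Python happens to be one of those few languages: adjacent string literals are conjoined at compile time, exactly as in C.

```python
# Adjacent string literals are conjoined at compile time, as in C.
s = "Hello, " "world"
print(s)  # → Hello, world
```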

Juxtaposition has a great deal of importance in functional programming languages. One could even argue that the manner in which it is interpreted is the principal differentiator between the three primary families of FP language. Perhaps use of this operator adds significantly to the general sense of FP being difficult. If so, it would be an example of something completely invisible to the FP community blocking adoption.

For the sake of discussion, I’m going to define the three principal orders of functional programming as applicative, concatenative, and array.

Applicative FP

The applicative order is the best known of FP. It contains the Lisp and ML families. In these languages, juxtaposed values are applied to functions on their left. For example, in Scheme or Lisp:

(+ 1 2 3)
6

(* 2 2 (+ 0 3 4))
28

What’s going on here is that the leftmost juxtaposed value at the same precedence level (meaning inside parentheses in the Lisp family) must necessarily name a function, and the remaining values at that level are arguments to that function. So + and * above name functions and the digits are arguments. There’s no restriction that subsequent arguments must not be functions, nor that the leftmost item not be, for example, the result of an application. The same basic rules apply in the ML family (assuming the functions named below worked as above):

sum 1 2 3
6

prod 2 2 (sum 0 3 4)
28

In applicative FP, functions are taken to exist which take at least one argument, and may take an arbitrary number, and juxtaposition is applying arguments to functions.
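That rule is simple enough to sketch in a few lines of Python. The plus/times helpers and the nested-list encoding here are stand-ins of my own, not any real Lisp reader:

```python
import functools
import operator

# A minimal sketch of the applicative rule: item 0 of a group names a
# function and items 1..N are its arguments (groups may nest, as in Lisp).
def plus(*xs):
    return sum(xs)

def times(*xs):
    return functools.reduce(operator.mul, xs, 1)

def apply_group(group):
    head, *args = group
    # Arguments may themselves be applications, as in (* 2 2 (+ 0 3 4)).
    evaluated = [apply_group(a) if isinstance(a, list) else a for a in args]
    return head(*evaluated)

print(apply_group([plus, 1, 2, 3]))                 # → 6
print(apply_group([times, 2, 2, [plus, 0, 3, 4]]))  # → 28
```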

Concatenative FP

In concatenative FP, every function has the same API: take a stack and return a stack. Juxtaposition explicitly is thought of as concatenating two complete programs. The main family under concatenative FP is the Forth-like languages. So, to compute the sum of squares, one could do something like this:

2 3 dup * swap dup * +

This would leave 13 on the stack. What happened here, in terms of stack effects is as follows:

2
2 3
2 3 3
2 9
9 2
9 2 2
9 4
13

An interesting thing about this is that literals are not merely literals, they also push themselves onto the stack.

The winning property of concatenative languages (and the source of the name “Factor”) is that any sequence of juxtaposed words can be extracted (“factored”) into its own word. To wit:

: sum-of-squares dup * swap dup * + ;
4 5 sum-of-squares .
41  ok

: square dup * ;
: sum-of-squares square swap square + ;
3 5 sum-of-squares . 
34  ok

: times-swap * swap ;
: times-plus * + ;
: sum-of-squares dup times-swap dup times-plus ;
4 4 sum-of-squares .
32  ok

Any run of words can be extracted in this fashion, because any run of words truly refers to a complete program which is performed on an input stack to produce an output stack. So every Forth program can be thought of as a pipeline acting on the same root data structure.
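That pipeline view is easy to model. Here is a toy Python sketch, with word names and the list encoding of a program invented for illustration, in which every word maps a stack to a stack:

```python
# A toy concatenative evaluator: every word maps a stack to a stack,
# and juxtaposition just runs the words in sequence.
def dup(s):  return s + [s[-1]]
def swap(s): return s[:-2] + [s[-1], s[-2]]
def mul(s):  return s[:-2] + [s[-2] * s[-1]]
def add(s):  return s[:-2] + [s[-2] + s[-1]]

def run(program, stack=None):
    stack = list(stack or [])
    for word in program:
        # Literals are words too: they push themselves onto the stack.
        stack = word(stack) if callable(word) else stack + [word]
    return stack

# 2 3 dup * swap dup * +  from above:
print(run([2, 3, dup, mul, swap, dup, mul, add]))  # → [13]
```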

Array FP

Array languages provide functions with exactly two arities: 1 or 2 arguments. In general, these are referred to as monads and dyads respectively, but I have been referring to them as unary and binary operators, because this is the conventional name for such a thing.

Interestingly, juxtaposition in APL is how one builds an array of values. Prior to J, juxtaposition of operators had no meaning. In J, value juxtaposition means the same thing as before, but operator juxtaposition means something quite novel, which they call “verb trains,” which are interpreted as either “forks” or “hooks” depending on whether the “train” is even or odd in length. This is reminiscent of the phenomenon of verb serialization in natural language, in which speakers of certain languages can emit a list of verbs to imply that all took place one-by-one in some order.

A hook is a train of two verbs (the base case for even-length trains) and produces the following structure:

(f g) x becomes x f (g x)

A fork is a train of three verbs (the base case for odd-length trains; longer trains decompose into these) and produces the following structure:

(f g h) x becomes (f x) g (h x)

It isn’t immediately apparent, but what’s happening here is juxtaposition is producing a binary tree of operators. I’d like to go into more detail and produce better examples, but this functionality is a bit over my head still. I’m sure with practice this becomes easy, but I’m quite new.
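The two shapes themselves are easy to transcribe into Python, though. The hook and fork helpers below are names of my own invention; J builds these trees syntactically:

```python
import operator

# Hook: (f g) x becomes x f (g x).  Fork: (f g h) x becomes (f x) g (h x).
def hook(f, g):
    return lambda x: f(x, g(x))

def fork(f, g, h):
    return lambda x: g(f(x), h(x))

# The canonical J fork (+/ % #) -- "sum divided by count" -- is a mean:
mean = fork(sum, operator.truediv, len)
print(mean([1, 4, 2, 4]))  # → 2.75

# A hook: x + (x // 2).
plus_half = hook(operator.add, lambda x: x // 2)
print(plus_half(100))  # → 150
```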

Juxtaposition and Arity

It’s interesting to me that these three families of FP languages have three different notions of “function” and also three different interpretations of juxtaposition. For example, in the applicative order, functions are generally regarded as taking any number of formal parameters. In the ML family, functions have a minimum arity of 1 (less than that and you have a value) and a maximum arity of 1 as well, with additional parameters faked either by tuples (more common in Standard ML) or by returning functions of decreasing arity. For example, in Haskell a function with type Int -> Int -> Int can be treated either as a function taking two integers or a function taking one integer that returns another function that takes one integer and returns an integer. Here’s a brief summary:

Order          Family  Min arity  Max arity  Juxtaposition
Applicative    Lisp    0          N          Apply items 1–N to the function at item 0
Applicative    ML      1          N          Apply items 1–N to the function at item 0
Concatenative  Forth   1          1          Evaluate items left to right on the same stack
Array          J       1          2          Build a binary tree of operators and evaluate it with arguments on the right, or on the left and right
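To make the ML rows concrete, currying can be imitated in Python with closures. This is a sketch of the idea, not how Python programmers normally write functions:

```python
# Currying in the ML style, sketched with closures: every function takes
# exactly one argument and returns a function awaiting the next one.
def add(x):
    return lambda y: x + y   # the shape of Int -> (Int -> Int)

add2 = add(2)       # partial application falls out for free
print(add2(3))      # → 5
print(add(2)(3))    # → 5
```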

Reflections on OOP

In all conventional procedural and OO languages, functions are defined and called by passing parameters in a comma separated list, as in:

def foo(a, b, c):
  return a + b * c

There is an interesting analogous question in languages that admit side effects: what to do when a function of arity 0 is invoked? Consider a function to return a random number which requires no parameters, say it’s called random. Do you call random by saying random or random() or something else?

The decision comes down to whether or not you consider the function call to be an implementation detail. Bertrand Meyer formulated the uniform access principle, which states that the client of an interface should not be able to distinguish between a simple property access and a function call, because the implementation should be able to choose the strategy that works best for it without breaking compatibility. In other words, whether a property of some object is computed on demand or stored in a variable is an implementation detail. As a result, “getters” in Eiffel and Ruby do not require parentheses on their invocation.
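Python defaults to the opposite convention, but offers an opt-in version of uniform access via the @property decorator. The Rectangle class below is a made-up example:

```python
# A made-up class showing uniform access in Python via @property:
# callers cannot tell whether `area` is stored or computed on demand.
class Rectangle:
    def __init__(self, w, h):
        self.w, self.h = w, h

    @property
    def area(self):
        return self.w * self.h  # computed, but accessed like a plain attribute

r = Rectangle(3, 4)
print(r.area)  # → 12, no parentheses -- as with an Eiffel or Ruby getter
```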

Python and most C-derived languages have adopted the opposite strategy, which lacks a sexy name. In Python, the difference between foo and foo() is that the former refers to the function foo as a value and foo() is really a request to evaluate the function named foo and produce the result. The distinction is not meaningless. The Python library makes much less use of anonymous functions than Ruby’s, for example, but coding a higher-order function involves fewer moving parts.
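A small illustration of that distinction, using the random function mentioned earlier via Python’s random module:

```python
import random

# `random.random` is the function as a value; `random.random()` requests a result.
f = random.random     # no call: f is just another name for the function
x = random.random()   # call: x is now a float in [0.0, 1.0)

print(f is random.random)  # → True
print(0.0 <= x < 1.0)      # → True
```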

Smalltalk and Ruby still permit passing functions as first-class values through a sort of reified closure called a block. Neither language permits direct variable access to objects, so all getters are really function invocations, but object foo in Smalltalk or object.foo in Ruby cannot be used to refer to the method named foo on object. Both get around this in part with a method Object#send, which takes a method name (a “selector”) as an argument, for the specific case of wanting to invoke a method indirectly, and in part with blocks. A block in Smalltalk looks like [ :param1 :param2 | code goes here ] and furnishes an object which, when value is called on it, returns the result of evaluating the block. Additional methods on the block permit passing of formal parameters.

This looks like a very similar tradeoff, but it makes somewhat more sense in that it is a property of the language with two alternatives rather than three.

November 17, 2011






Inspired by James Hague, and waiting for a new release of Cuis, I thought I might sit down and noodle with J. See if I get any further than I did last time.

So, I recently wrote a trivial Markov chain algorithm for Smalltalk. The code is pretty short, so I’ve included it at the bottom of the post. Rub your eyes on it after we talk about the J code.

I want to implement this algorithm in J, but before we can do that let’s learn how to do something a bit simpler. Let’s just count up letter occurrences. I’m inclined to see that as being related, but let’s face it, even that intuition is probably wrong with J.

First, let me show you my solution.

letter_frequency =: =((~. @: ;/) ,. (;/ @ (+/ @ |: @ =))) &: sort

Yeah… alright…

NB.
   letter_frequency 'Mississippi'
┌─┬─┐
│M│1│
├─┼─┤
│i│4│
├─┼─┤
│p│2│
├─┼─┤
│s│4│
└─┴─┘

So it seems Perl has a competitor!

Verbs

I’ll try to spare you the J tutorial because there are a few really, truly weird things about this piece of code, but let me give you this brief interactive tutorial to show you how I built it up.

NB.
   w =: 'Mississippi'
   sort w
Miiiippssss
   ;/ sort w
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
│M│i│i│i│i│p│p│s│s│s│s│
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
   ~. ;/ sort w
┌─┬─┬─┬─┐
│M│i│p│s│
└─┴─┴─┴─┘

So, sort does what you’d expect. The syntax here is reminiscent of Polish notation: we’re adding additional stuff to do to the front of the list, and it takes effect last. So on line one I sort the letters, on line two I box the letters, and on line three I am nub’ing (removing duplicates from) the boxed letters.

The functions are called verbs here, and they’re all fundamental operators or functions except for /, which is called an “adverb” and takes a second function as an argument and performs what a Haskell user would call foldr1. So the ;/ is saying, apply ; between each element, and ; boxes its arguments.
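For readers who think in Haskell or Python rather than J, the / adverb is doing the same job as a fold, e.g. functools.reduce:

```python
import functools
import operator

# J's  +/ 1 4 2 4  inserts + between the items -- the same shape as a fold.
print(functools.reduce(operator.add, [1, 4, 2, 4]))  # → 11
```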

NB.
   sort w
Miiiippssss
   = sort w
1 0 0 0 0 0 0 0 0 0 0
0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 1 1 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1
   |: = sort w
1 0 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 0 1 0
0 0 1 0
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
   +/ |: = sort w
1 4 2 4

It would be much easier to appeal to what is visually going on here than to explain what’s really happening, so I think I’ll rest my case on that. The part here I had the hardest time figuring out was the = verb. It is defined as “self-classify” and a bunch of examples in the documentation were enough to convince me it was close to what I need. The |: verb pivots the table, and we again use the / adverb to produce a fold, in this case summation. It is visually apparent that what’s happening there is we are summing down the columns to count occurrences. I am quite sure there is a less verbose way to do this without the pivot, but I don’t know what it is yet.
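The whole trick is perhaps easier to see transliterated into Python. This is a sketch, with list comprehensions standing in for J’s array operations:

```python
# A sketch of the J pipeline in Python: sort, nub (~.), self-classify (=),
# then sum each boolean row to count occurrences.
w = sorted("Mississippi")
nub = list(dict.fromkeys(w))                  # ~. : distinct letters, in order
matrix = [[c == u for c in w] for u in nub]   # =  : one boolean row per letter
counts = [sum(row) for row in matrix]         # +/ : fold with + over each row

print(nub)     # → ['M', 'i', 'p', 's']
print(counts)  # → [1, 4, 2, 4]
```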

In any case, it should now be clear that we have both of the constituents one would need to show letter frequencies: an array of letters and another array of their occurrences:

NB. 
┌─┬─┬─┬─┐
│M│i│p│s│
└─┴─┴─┴─┘
 1 4 2 4

One can combine these in a variety of ways, but the colorfully-named “stitch” is the easiest I’ve found so far.

NB.
   (;/ 2 3 4) ,. (;/ 8 9 0)
┌─┬─┐
│2│8│
├─┼─┤
│3│9│
├─┼─┤
│4│0│
└─┴─┘

At this point, things get a bit weird. Defining functions in J is done either implicitly or explicitly. Implicit definition is basically the same as point-free form in Haskell, building up and composing functions. Explicit function definition relies on a piece of rather grody syntax, and didn’t work for me for one reason or another:

NB.
   letter_frequencies =: 3 : 0
letters =:  ~. ;/ sort y
occurs =: ;/ +/ |: = sort y
letters ,. occurs
)

I’m sure a better J user than me (say, someone who has used it for more than 24 hours) would be able to point out what I did wrong, but I must admit a dislike of the “magic numbers” that seem to appear in J. File input/output is similar; for example:

s =. 1!:1 <'system\packages\misc\jforc.ijs'

Yes, 1!:1 is a single “verb” which means, give me the content of the file named on the right. Not quite as “readable” as +/, eh? Anyway, we can instead define this function implicitly, as I did at the beginning of this article. There’s just one question: what’s this @: and why is it in all of my functions? For example, the following is fairly confusing:

NB.
  |: = sort w
1 0 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 0 1 0
0 0 1 0
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
   (|: = sort) w
1 1 0 0 1 0 0 0 0 0 0
   |: (= sort) w
1 1 0 0 1 0 0 0 0 0 0
   (|: =) sort w
|domain error
|       (|:=)sort w

Odd. What about @?

NB. Correct way
   (|: @ = @ sort) w
1 0 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 0 1 0
0 0 1 0
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1

@ is basically unary function composition. So J isn’t like Forth, no problem with that. The really surprising thing isn’t that omitting the @ didn’t accomplish something, it’s that it did accomplish something: something weird and unexpected!

Trains, Forks and Hooks

The surprise is that “juxtaposition” of verbs is much more complex in J than in other languages. It leads to something called a verb train, but the Cliff’s Notes version is that:

(f g) y
means y f (g y)
(f g h) y
means (f y) g (h y)

So above, when I wrote (|: = sort) w, J interpreted this as: (|: w) = (sort w), and when I had |: (= sort) w, J interpreted it as |: (w = (sort w)), neither of which is close to being what we want! In fact, what is being calculated is an item-by-item equality comparison between the string w and the sorted version of w, so you can see why we get the result we get:

NB. Ignore the code
   3 11 $ (;/ w) , (;/ sort w) , (;/ (= sort) w)
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
│M│i│s│s│i│s│s│i│p│p│i│
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
│M│i│i│i│i│p│p│s│s│s│s│
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
│1│1│0│0│1│0│0│0│0│0│0│
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘

One of the tutorials has a good example, which I like, of a place where this kind of thing is actually handy:

NB. Sales tax in NM is 6.875%.
   sales_tax =: *&0.06875
   sales_tax 100
6.875
   (+ sales_tax) 100
106.875

The & conjunction binds a parameter to a verb, making a binary operator (or a dyad, in J jargon) into a unary operator (or a monad, in J jargon). So now it should be clear why the solution has all those @s in it.
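The bonding trick itself has a direct Python analogue in functools.partial, sketched here:

```python
import functools
import operator

# J's *&0.06875 bonds the noun 0.06875 to the dyad *; functools.partial
# is the closest Python analogue.
sales_tax = functools.partial(operator.mul, 0.06875)
print(round(sales_tax(100), 3))  # → 6.875
```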

Ultimately, this took me about an hour to write, with heavy documentation consultation. For comparison, the following Smalltalk version took me about two minutes:

letterOccurrences: aString
  | dict occurrences |
  dict ← Dictionary new.
  aString do: [ :char |
    occurrences ← dict at: char ifAbsentPut: [ 0 ].
    dict at: char put: occurrences + 1 ].
  ↑ dict

Unsurprising and unsexy, but workable.

Epilogue

In reading APL - A Glimpse of Heaven it’s not hard to see the affection the author is showing for APL, an ancestor of J. In fact I was able to translate most of his examples to J while reading it.

It would be hard to find two languages that are more different than APL/J and Smalltalk. Surprisingly, similarities are not hard to find. Both languages are highly interactive and a high premium is put on playing with the language on your own with private experiments. On the other hand, there seems to be a lot more in-depth documentation on the APL/J side, in part (I’m guessing) because it is so much weirder and in part because once one reaches a certain level of proficiency and comfort with Smalltalk, one learns from the image rather than a book.

I do have to wonder, though. All of the APL and J fanatics talk about the language as a “notation of thought.” It strikes me as a terrible idea to write software in a language so intrinsically unreadable, but maybe it’s true that after time it becomes quite clear. It looks like it has that odd benefit that SQL has, in that code once written probably lives a long life. I didn’t touch on this much, but all of the built-in verbs in J go to great lengths to be meaningful in both monadic and dyadic contexts and when operating on different kinds of data; perhaps it works in practice like SQL in that, once a statement is written, it rarely needs to be revisited. It certainly could have the benefit of inexpensive functional purity.

My friend Allan Poindexter has tried to explain to me on several occasions how Common Lisp through its macros is more powerful than Smalltalk. There’s a smidgeon of truth to that, though I believe modern Smalltalks have sufficiently expressive metaclass systems that you could achieve anything except raw syntactic extension. But at the same time, my solution to the above problem does not require three pages of English prose to explain it, and it seems obvious that in J this drawback would only get worse.

So it flies in the face of one’s expectations to hear allegations like this one from the Heaven article:

If APL were a specialist, complex language, it would only attract the “Boy Wonders” of IT, those with A-Grades in everything, whose horizons are limited by bits and bytes.

So it is paradoxical that the great majority of IT people have never really understood APL. Those who have used it successfully have very often not been computer-literate, or have only a slight knowledge … and they have frequently learned APL in isolation. That is to say, the language serves anyone prepared to explore it without prejudice.

It’s hard to evaluate this claim on the surface, but it is worth noting that industries in which APL seems to have penetrated are highly mathematical (actuarial work, finance) but not known for having strong, proud traditions of software engineering.

I said fairly recently that when I program Lisp I feel like I’m tricking the machine into doing what I want. J amplifies this feeling. Smalltalk does not. In fact, when I use Smalltalk I frequently feel like I’m stating the obvious. What I don’t like about Smalltalk, really, is the lack of books. I feel frustrated at the size of the library, and unsurprised that I spend more time reading the library than writing code. What’s funny is, I used to make the same complaint about Haskell. J, as terse, ugly, and frustrating as it is, is also very fun, and seeing a two-page reference to the 70-odd fundamental verbs of the language definitely gives one the feeling of dealing with some kind of alchemy of programming rather than a standard library.

So I will echo James Hague’s endorsement. Try it. You may find you like it quite a bit.

Postscript: Smalltalk Markov Chain Code

Note: I suck at Smalltalk. If you have suggestions, please email me.

Object subclass: #MarkovData
  instanceVariableNames: 'chain'
  classVariableNames: ''
  poolDictionaries: ''
  category: 'Markov'.

MarkovData>>addFrom: aStream
  "Add all the words in some data source to this Markov chain"
  | current previous |
  previous := #start.
  [ aStream atEnd ]
    whileFalse: [ 
      current := aStream next.
      self addWord: previous precedes: current.
      previous := current ].
  self addWord: previous precedes: #end

MarkovData>>addWord: previousWord precedes: anotherWord
  "Record that anotherWord follows previousWord in the chain."
  | words |
  words := chain at: previousWord ifAbsentPut: [ Bag new ].
  words add: anotherWord

MarkovData>>addWordsIn: aString
  "Add all the words in the passed string."
  self addFrom: (WordStream on: aString)

MarkovData>>generateSentence
  "Generates a String sentence from the data we’ve collected so far."
  | wordStream |
  wordStream := String new writeStream.
  self generateSentenceOn: wordStream.
  ^ wordStream contents

MarkovData>>generateSentenceOn: aStream
  "Generates a random utterance on aStream."
  | word |
  word := self generateNextWordFor: #start.
  [ word = #end ]
    whileFalse: [ 
      aStream nextPutAll: word; space.
      word := self generateNextWordFor: word ].
  aStream skip: -1.  "back up a character, to not end on a space"

MarkovData>>generateNextWordFor: aWord
  "Generate the next word based on the given word."
  ^ (chain at: aWord) atRandom.

MarkovData>>at: aWord
  "Return the set of words that follow from this word."
  ^ chain at: aWord ifAbsent: [ Bag new ]

MarkovData>>initialize
  super initialize.
  chain := Dictionary new

MarkovData class>>buildFrom: aStringOrStream
  ^ self new addFrom: (WordStream on: aStringOrStream)
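For comparison with the J discussion, here is a rough Python transliteration of the Smalltalk class above. The names and structure are my own, and a plain list stands in for the Bag so that duplicate successors weight the random draw the same way:

```python
import random
from collections import defaultdict

# A rough Python transliteration of the Smalltalk MarkovData class above.
class MarkovData:
    def __init__(self):
        self.chain = defaultdict(list)   # word -> list of observed successors

    def add_words(self, text):
        previous = "#start"
        for word in text.split():
            self.chain[previous].append(word)
            previous = word
        self.chain[previous].append("#end")

    def generate_sentence(self):
        words = []
        word = random.choice(self.chain["#start"])
        while word != "#end":
            words.append(word)
            word = random.choice(self.chain[word])
        return " ".join(words)

m = MarkovData()
m.add_words("the cat sat")
print(m.generate_sentence())  # with a single sample sentence this is deterministic
```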

November 9, 2011 j