In the previous article, the discussion of lambda calculus is a bit misleading. We’re not using lambda calculus terms as the intermediate representation. We’re being inspired by the lambda calculus and aping it as a mechanism for propagating information around in the intermediate representation.

In Python, using SQLAlchemy, you can construct conditions with expressions like Users.t.name == 'bob'. The object in Users.t.name is some kind of column type, and it overloads __eq__ in such a way that when you evaluate it against 'bob', rather than returning a boolean truth value, it returns some kind of filter expression object. This is a way of delaying binding: nothing is really computed at this moment; it’s all just bound together in preparation for a later evaluation step. In the case of SQLAlchemy, that evaluation is actually going to generate part of an SQL query. This gives the appearance of a bridge between Python and SQL.
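The same delayed-binding move is natural in Prolog, where a term is just inert data until something chooses to interpret it. A tiny sketch of the idea (holds/2 and the eq(Column, Value) term are made-up names for illustration; they aren’t SQLAlchemy and they aren’t from the article):

holds(eq(Column, Value), Row) :- member(Column-Value, Row).

| ?- holds(eq(name, bob), [id-1, name-bob]).

true

Nothing about eq(name, bob) computes anything on its own; it only means something once holds/2 gets around to interpreting it.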

What’s really happening in a DCG rule like sentence(S) --> np(NP), vp(NP^S) is a beautiful confluence of Prolog ideas. Prolog’s unification is used not just for pattern matching against the structure of the term produced by the verb phrase, but also to propagate the value from the noun phrase into the verb phrase. Because unification is more like forming a link than procedurally assigning variables, we can unify things in any order. This is how we get a noun from before the verb and a noun from after the verb to live nicely side-by-side inside a term built by the verb.
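For instance (a quick top-level illustration, not from the article), the binding can arrive before or after the term is built and the result is the same:

| ?- S = likes(X, cheese), X = i.

S = likes(i,cheese)
X = i

| ?- X = i, S = likes(X, cheese).

S = likes(i,cheese)
X = i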

We didn’t really wind up having to build anything like the lambda calculus, because binding in LC is just like binding in Prolog. In fact, we got a “rename” step “for free” because Prolog variable names are not global.

So where was the lambda calculus? It wasn’t in the AST we built, and it wasn’t really there in the building of it. The idea of variable abstraction from terms is all we used, and we only used it during the formation of the AST.

December 15, 2017






Parsing simple sentences in Prolog, you get grammars that look kind of like this:

sentence --> np, vp.
np --> pronoun.
np --> noun.
vp --> iv.
vp --> tv.

pronoun --> [i].
noun --> [cheese].
iv --> [sit].
tv --> [like], np.

Now with this, it’s possible to parse sentences as complex and interesting as “i like cheese” or “i sit”. Wow! Except, just accepting them is not very interesting:

| ?- phrase(sentence, [i, like, cheese]).

true 

The computer scientist will want to actually obtain the parse tree, which is going to have the same structure as the grammar:

sentence(s(NP,VP)) --> np(NP), vp(VP).
np(pronoun(P)) --> pronoun(P).
np(noun(N)) --> noun(N).
vp(iv(IV)) --> iv(IV).
vp(tv(TV, NP)) --> tv(TV), np(NP).

pronoun(i) --> [i].
noun(cheese) --> [cheese].
iv(sit) --> [sit].
tv(like) --> [like].

This gives us a parse tree:

| ?- phrase(sentence(S), [i, like, cheese]).

S = s(pronoun(i),tv(like,noun(cheese))) 
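The intransitive sentence should come out the same way (a query not shown in the original, but it follows from the grammar above):

| ?- phrase(sentence(S), [i, sit]).

S = s(pronoun(i),iv(sit))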

However, the result is hostile to evaluation. Why should the rest of my program have to know everything about my grammar? We should insulate it with a nice intermediate representation—and we’re using Prolog, so let’s make that intermediate representation look more like a Prolog query! likes(i, cheese)!

Wait a second. For this to work, I need to return a functor likes/2, but I don’t have likes/2 until after I parse “i”. And I have to put cheese in there after parsing “like”! This tree has a totally different shape than the input!

The solution turns out to be:

  • Use a lambda calculus-styled function as the verb phrase
  • Apply the noun phrases to that verb phrase to produce the result

In other words, the thing produced by our vp//1 isn’t going to be likes/2; it’s going to be a lambda expression which, when applied to the subject, produces that likes/2 functor. It’s going to be λx.likes(x,cheese). In Prolog, we’re going to encode this as X^likes(X, cheese), and then we can have a reduction step that looks like reduce(X^Term, X, Term):

| ?- reduce(X^likes(X,cheese),i,T).

T = likes(i,cheese)
X = i

So round three:

reduce(X^Term, X, Term).

sentence(S) --> np(NP), vp(VP), { reduce(VP, NP, S) }.
np(P) --> pronoun(P).
np(N) --> noun(N).
vp(IV) --> iv(IV).
vp(VP) --> tv(TV), np(NP), { reduce(TV, NP, VP) }.

pronoun(i) --> [i].
noun(cheese) --> [cheese].
iv(X^sits(X)) --> [sit].
tv(X^Y^likes(Y,X)) --> [like].

Note that we flipped the variables around in likes/2 because the next thing in the parse tree is the direct object, so that will be the first reduction, and the subject (which we obtained before) is going to get applied last, when the whole parse is complete. And the result?

| ?- phrase(sentence(S), [i,like,cheese]).

S = likes(i,cheese) 
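To make that order visible, here are the two reductions run by hand (a worked illustration, not part of the original post): the direct object goes in first, the subject last.

| ?- reduce(X^Y^likes(Y,X), cheese, VP).

VP = Y^likes(Y,cheese)
X = cheese

| ?- reduce(Y^likes(Y,cheese), i, S).

S = likes(i,cheese)
Y = i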

Groovy. But… it turns out that reduce operator is so simple, we can do what it does without calling it explicitly. Ready?

sentence(S) --> np(NP), vp(NP^S).
np(P) --> pronoun(P).
np(N) --> noun(N).
vp(IV) --> iv(IV).
vp(TV) --> tv(NP^TV), np(NP).

pronoun(i) --> [i].
noun(cheese) --> [cheese].
iv(X^sits(X)) --> [sit].
tv(X^Y^likes(Y,X)) --> [like].

Kind of spooky looking, seeing np(NP), vp(NP^S), but it still works: the structure that vp//1 produces is going to look like X^foo(...,X,...) for some term foo, and saying NP^S forces NP and X to be equal, which is evaluation the way the lambda calculus does it. Proof it works:

| ?- phrase(sentence(S), [i,like,cheese]).

S = likes(i,cheese) 
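If the NP^S trick still looks like magic, the same unification can be tried directly at the top level (a small illustration, not from the original post):

| ?- X^likes(X,cheese) = NP^S, NP = i.

NP = i
S = likes(i,cheese)
X = i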

This is cool. But it turns out most noun phrases are not atoms like i; they actually work out to be variables with quantifiers. The most obvious is a sentence like “a student wrote a program”, which should not parse into wrote(student, program) but instead something more complex…

December 14, 2017 prolog






Earlier tonight I was accosted by a stranger for failing to finish the excellent book Software Foundations:

I think whoever recommended it to you had a common, bad idea… burnout with nothing to show

Hear that Tyler? That time you alighted on a branch with cloudstuff in your beak and illuminated us, tugging on the golden thread separating meaning and method, brought down the lambda calculus, gave us all the Formal Languages curriculum our professor wouldn’t dare and couldn’t dream—apparently you had a bad idea and that’s why all your students burned out with nothing to show for it.

What a lack of imagination—yet, there I was on HN and [I got to see it again](https://news.ycombinator.com/item?id=15784012):

I’m not sure I agree with the rationale for the dynamic typing. The designer says it means the language supports an arbitrary degree of polymorphism, but obviously it means it accepts invalid programs with undefined behavior

Fun fact you may not know: every character I am typing right now, flowing through my computer to my blog, over HTTP to be rendered on your screen, is being handled at every step by an unverified algorithm in a language that accepts invalid programs with undefined behavior. EVERY FUCKING CHARACTER.

Do not be so preoccupied with your interests that you cannot tell when you’re following a fad. Is Haskell really awesome? Yes, definitely. But there are a ton of really cool things out there that do not have an HM type system, and yet, we remain alive. Safety is important, but it’s not so important that we should rush to burn down every other language that doesn’t prioritize it. Prolog is also weakly typed, and if you run a nuclear reactor on it, Prolog’s weak typing is not your first problem. A lot of software just isn’t that important—yours, for instance.

I could be learning J right now, but it’s hard, so instead, I’m checking my mail and reading Hacker News. Programming is hard. I think if you look at Control.Unification you’ll see a totally amazing thing, something that can do things Prolog will never dream of. But… try to write a 10 line Prolog program with it, and you will suffer. It’s just not the same thing. Programming will always be hard. I’m absolutely not convinced that strong typing makes it easier. It’s just another thing in the eternal technology treadmill, something for you to worry about other than “am I writing a useful program?” or “do I have a problem worth solving with code?”, which are impossible questions to answer by applying fads.

November 27, 2017






All other things being equal (they’re not, but let’s pretend), APL will always have a cult of appreciation because it represents really the only time programming has said “we need our own alphabet.” But I don’t think APL symbols are really that evocative. Where does ⍳ come from? It looks kind of like a little i—it’s the index function, after all. i. seems just as legitimate and I can type that.

Some of them have some nice cachet. ⌈ and ⌊ are max and min from mathematics, and J doesn’t have those. On the other hand, every other programming language on earth uses something besides × and ÷ for multiplication and division, and while learning APL there were plenty of times I used * when I meant ×, so I’m not sure this is especially significant. In actual math, I would probably just nestle two things next to each other for multiplication or use a dot, and division would be a whole huge thing. It doesn’t seem that hard to use <. and >. for min and max.

⋆ and ⍟ for exponent and logarithm come from where exactly? Nowhere. Now, ⊥ has an actual meaning in math that has nothing to do with decoding. ⊤ has a nice symmetry with ⊥ because it is visually inverted and is the inverse function. But J has many operators whose inverse is the same operator with a . or : suffix; this pair is #. and #:, and head/tail, behead/curtail, and exponent/log all follow the same pattern.

What does the symbol ⍕ have to do with formatting? There’s nothing it’s standing for here, it’s just meant to look a bit like encode and be typeable with an IBM Selectric using overstrike. Really. There were more APL symbols than spots on the typeball, so you’d back up and hit another key. That’s how they came up with ⌹ for matrix inverse. It’s a box around a division sign!

I’m not saying there are not nice symmetries in APL’s symbolry. There definitely are. But J also has nice symmetries. APL’s symbolry is closer to mathematics really only for multiplication and division, min and max, which you’re already accustomed to writing another way on a computer. The rest of APL’s symbolry is a total fabrication—just like J!

Let’s also note that in A Programming Language, Iverson proposes lots of syntax that you can’t do in APL or any other language. The inner product syntax, which is amazing, isn’t available anywhere but that book!

November 22, 2017 apl






I realized recently that the way I have been handling answers to Prolog questions on Stack Overflow contravenes the rules. I don’t mind ignoring rules if it serves a higher purpose, but I have also come to realize that it actually is counterproductive.

There is a constant flow of pre-beginner questions about Prolog on Stack Overflow. The questions I’m referring to typically sound like this: “my professor said a few incoherent things about Prolog and here’s my assignment and I don’t know where to start.” The pre-beginners are not at Stack Overflow to look at existing solutions (which is what Stack Overflow wants you to do). They just want someone to do their homework for them. They don’t come back later or develop a deeper appreciation for Prolog. Some of this is unavoidable, but the flow is very high.

As much as I would like to (and have tried to) take the opportunity to foster an interest in Prolog, their questions are actually a symptom of a totally different problem that simply cannot be addressed within SO. I think the underlying problem is that Prolog is part of a mandatory curriculum, but the professor doesn’t understand it. How do you teach something you don’t understand?

I would like to live in a world where professors consider Prolog worth knowing and put in the effort. That probably isn’t realistic. If I can’t have that, I would at least like to live in a world where students’ first or only exposure to Prolog is something more constructive than “Look how weird and confusing this thing is!”

I’m not in higher education, so I don’t know how to approach these problems. I would love to hear some ideas. But my first thought is a two-pronged approach:

  • A smallish solution database (fewer than ten solutions), where each solution is somehow annotated so that you can figure out both really low-level things (variables are uppercase) and really high-level things (why an accumulator is used). The idea is that it should be possible to answer any question, no matter how obvious, by mousing over things in the solution database.

  • A set of slides for use by professors to teach Prolog without really knowing it. There would be a few different versions (one class session, three class sessions) and a moratorium on providing solutions on Stack Overflow, instead pointing people who ask to the solution database.

Have other ideas? Please send me an email, let’s talk.

November 21, 2017 prolog






Every time I’ve spoken with someone about programming in the last month or so, machine learning or artificial intelligence has come up. Let me tell you the reasons why I don’t care a whole lot about it.

1. It’s not going to put me out of a job

At least 75% of the actual job of programming is figuring out what a person means when they ask for something. This is not a problem that a machine can solve for you, for the same reason that it’s harder to build a robot that builds other robots than to just build a robot. Have you had words with Siri lately? Do you think Siri is in a good position to understand your intent, or it’s maybe just searching for command words in a limited lexicon? Do you really think it could find Henry’s stool? Or does it seem more likely that it would get confused and require a human… trained to translate… high-level human-language… into language suitable for… the machine?

2. Machine learning is not creative

Machine learning is great at solving problems where you start with a thousand test cases and have no idea how to write the function to distinguish them. A great deal of the work of “software 2.0” will be a sort of data docent work involving curating data sets of different character. This isn’t going to be an effective or efficient way of solving general-purpose problems any more than it was when Lisp and Prolog claimed they could do it in the 1980s.

By the way, Lisp and Prolog are still fun and cool, and playing with them is a great way to wash the bullshit out of your eyes about AI.

3. More random failure is not a feature

When I ask Siri to set a timer for my tea, it works about 90% of the time. For reference, this means that about once a week, it fails to understand me, and I always say exactly the same thing. This failure rate is completely acceptable for toys like Siri. It’s really not acceptable for most applications we like to use computers for. Is a transfer of money involved? It seems unlikely to me that anyone who works for a living will accept a 10% chance that the money either doesn’t get sent or doesn’t arrive. It seems more likely to me you want the most straightforward code path you can imagine. A big detour through an opaque neural network for grins feels like a bad idea.

The majority of software is actually in this situation. You don’t want a 10% chance your web service does not get called. You don’t want a 10% chance that your log message does not get written or reported. You don’t want a 10% chance that the email to the user does not get sent.

When are you content with a high probability of nothing happening? When you’re making something that is essentially a toy, or perhaps, doing a task where no one can really tell if you’re doing anything at all, which brings us to…

4. ML is painstakingly squeezing the last drop of profit out of a failing business model

The nice thing about advertising is that someone can pay you $500 to do it, and then if nothing happens, you can just shrug and say “try again.” Google wants you to believe that you’re accomplishing something, but how many people out there actually know that the money they spend on Google is giving them a positive return?

Let’s play a game: if people who believe they need your product stop buying your product, and you’re left only with the customers who actually do need your product, how does your future look? I think if your primary product is advertising, you’re probably a lot more fucked than someone else who does something more tangible. To be blunt, if all fads were cancelled tomorrow, Facebook and Google would be done for, but Wolfram would get maybe a little blip. You want to work for the blip companies.

5. ML squeezes the last drop from the lemon

ML is a great tool for handling that last fuzzy piece of an interesting application. As an ingredient, it’s more like truffles than flour, potatoes or onions. It’s an expensive, complex tool that solves very specific problems in very specific circumstances.

Unfortunately, there are a few large companies that stand to make significantly more money if either A) machine learning makes significant advances in the failing advertising market, or (more likely) B) machine learning specialists glut the market and their pay falls to something more like what normal programmers earn.

Does that change your understanding of why Google and Facebook might be trying to make sure this technology is as widely available and understood as possible? Bear in mind that unless you have massive data sets, machine learning is basically useless. A startup with a proprietary machine learning algorithm that beat the socks off anything Facebook or Google has would be greatly hampered by lack of data—and Facebook and Google are definitely able to compensate for any algorithmic shortcoming either by throwing a ton of compute at the problem or by throwing a ton of “computational manual labor” hacks at the problem.

6. Something else is going to end the profession of programming first

Of the 25% of my job that actually requires programming, probably about 90% of it is in the service of things users simply can’t detect. Code quality. Testing. Following the latest UI fads. Worry about the maintenance programmer. Design.

Here’s a frightening question for any programmer. How much of your job disappears if users don’t mind if the app is ugly and are able to do basic programming on their own? I’m not being facetious. If we were really honest, I think we would admit that A) most of what we do is done for reasons that ultimately amount to fashion, and B) a more computationally-literate, less faddish user base requires dramatically less effort to support.

Let’s recall Fred Brooks’s program-system-product grid:

Fred Brooks’s program-system-product grid

What you’re seeing there is the admission of our industry that solving a problem costs 1/10th of what you’re paying, because you want a nice UI.

All it would take to decimate the population of programmers would be for people to get computer literate and not be so picky about the colors. This sounds to me like something that happens when crafts get industrialized: the faddish handcraftedness is replaced by easier-to-scale, lower quality work.

I think the emphasis on machine learning is looking at Fabergé eggs and saying “all handcrafts are going to be like this” simply because it’s very lucrative for Fabergé.

Nope.

Your software is not going to be that cool. It’s going to come in beige and all the ports are going to be standardized. But, it will actually do real work that real people need done, and that is way better than making useless decorative eggs for the new Tsars.

Conclusion

Machine learning is boring.

November 14, 2017