Daniel's Blog

How would you chop a linked list in half? A trivial approach would be to just get the length of the list and then walk the list building a first-half copy by tracking the indexes until you get to half that length.

Thinking about it, I realized you could use the tortoise-and-hare approach from Floyd’s cycle detector to find the middle of the list: walk the list item-by-item and every-other-item at the same time; when the every-other-item list is exhausted, you’ve found the middle:

% bisect(+List, -First, -Last) is semidet.
% bisect(-List, -First, -Last) is multi.
%   First and Last are equal-length sublists of List such that
%   append(First, Last, List) is true and 
%   length(First, N), length(Last, N) is true.
bisect(L, First, Last) :- bisect(L, L, First, Last).

bisect(Xs, [], [], Xs).
bisect([X|Xs], [_,_|Ys], First, Last) :-
    First = [X|NextFirst],
    bisect(Xs, Ys, NextFirst, Last).

The other trick here is passing the tail of the list to the recursive call, sort of like an inexpensive difference list trick to give us O(N) appending to a linked list.

I’m pleased to note that this just fails for free on non-even length lists. No stupid solutions where one list is longer than the other.

?- bisect([1,2,3,4], X, Y).
X = [1, 2],
Y = [3, 4].

?- bisect([1,2,3,4,5], X, Y).
false.

?- bisect([1,2,3,4,5,6], X, Y).
X = [1, 2, 3],
Y = [4, 5, 6].

?- bisect(Z, X, Y).
Z = X, X = Y, Y = [] ;
Z = [_24123766, _24123772],
X = [_24123766],
Y = [_24123772] ;
Z = [_24123766, _24123772, _24123784, _24123790],
X = [_24123766, _24123772],
Y = [_24123784, _24123790] ;
...

June 14, 2018 prolog

Stack Overflow is going through some kind of asshole midlife crisis and this blog post is the corresponding spiritual Mustang GT.

Don’t be fatuous, Stack Overflow. Your culture is the way it is because of your rules. It’s not an accident that oh-my-stars we just peeked under the rock and discovered last week. Your rules created this monster.

What is the overriding principle of Stack Overflow? It’s that questions (and answers) have differing value. Some questions are just more useful than others. Same with answers. And the whole rest of the machine falls out of that concept. People who ask better questions are better. People who give better answers are better. The dialogue between a seeker and an answerer is garbage unless it leads to an answer. Commentary is almost worthless. Subjectivity is worthless! Only ask questions that can be answered! No, I will not recommend an off-site resource for you, noob!

Why are we assholes to people who ask questions that already have answers? Because your rules say to close questions that have already been asked.

I follow the Prolog tag. I can tell you about 1 out of every 20 or 50 questions is actually new. The rest are people wanting help with their homework. Most of their questions can be answered by reading the output from their terminal aloud to them. They need help. But the Stack Overflow goal, which is perfect answers to the best questions, is useless to them because they are essentially illiterate. If they were literate, they would read their console output, or their textbook, or their notes. Instead, they want you to do the work for them because Stack Overflow and the internet more generally has created a culture for them where there is simply no need for many programmers to read or think.

“All developers are free to participate”—but that participation is modulated by your rep. And with the vast majority of established engineers being white men, of course the bulk of women and minorities are not high-ranking Stack Overflow users. But the difference has nothing to do with institutional bias. It’s completely a function of the date you joined. Stack Overflow is an asshole to new users, period. Nobody knows or cares about your background, just that you’re a noob, what with your “hi” and your “thanks” and your not knowing Markdown and your asking a question that’s been asked a thousand times.

Stack Overflow seems determined to overthink their position and fuck it up somehow, first with Documentation and now this. Why did we all love Stack Overflow in the beginning? Because you got answers without all the commentary, the irrelevant massive signatures, the page 923 of this forum post, the “is this solution still good?” ten years after the fact. You beat the shit out of Experts Exchange, that’s what we loved about you. Who’s competing with you now? Nobody. So just shut up already. Making Stack Overflow a better partner in the evolution of my growth as a Jewish programmer is a meaningless and stupid goal for you. If I ask a question that already has a good answer, it’s a waste of everyone’s time to stop and answer it, and that’s really the only way to evolve from where you are now.

There’s a joke about contradictory requirements that goes something like, “I demand we build a new schoolhouse! I demand we use the bricks in the existing schoolhouse! I demand we keep the old school open while we build the new one!” Stack Overflow cannot be a welcoming place that caters to the wholeness of your being and still be the central resource for canonical answers to every question about programming. Those concepts cross each other exactly the same way “the bricks must be reused while the school stays open” crosses itself. You can’t be welcoming to new users and tell them to go fuck themselves for asking a duplicate question, and conversely you cannot be the home of the best comprehensive answers to the best questions while allowing your content to degrade into a billion answers of the same stupid questions because someone couldn’t be bothered to open their textbook before failing to implement their homework.

This is really disingenuous anyway, because you have, what, 50 developers against literally the rest of the world’s developers? Even if you had a great idea, you don’t have the manpower to police anything. Your police are, unsurprisingly, users with a lot of rep—in other words, the people most invested in the way the system currently works.

In short: you have no concrete plan, you have no resources to implement a concrete plan if you had one, and you have no actual incentive to do it. So this is nothing. Posturing-as-a-product.

May 1, 2018

The architect at my work recently handed a prototype he build to me and I was instructed to maintain it. In so doing, I have found a nice little parable here about how things can go wrong and how it can spiral out of control. Also, let’s spit on Hibernate.

At the outset, he chose to build a REST server with Hibernate. He went with RESTEasy, which antedates Jersey (JAX-RS/JSR-339) by a few years. The first problem in our story happens because he is using an old version of RESTEasy and an older version of Hibernate. Namely, the JSON writing portion (Jackson) gets confused by the Hibernate proxies to the model objects he’s retrieving and he proceeds to annotate most of the Hibernate mapping with lazy="false".

We have seen in the past that this is a common source of pain because Hibernate actually has two settings here where it should have one. The setting lazy="false" tells Hibernate, you must immediately retrieve these properties. But, another setting (such as fetch="subselect") actually tells Hibernate how to do this, and without it, you get a nice combinatorial explosion of queries as Hibernate must iterate row by row running queries that lead to it iterating again running queries… a single web request can easily balloon into several hundred thousand queries. Also, there were no indexes, so these queries were running full table scans for essentially every foreign-key based retrieval.

The immediate problem of JSON not being generated is solved, but the “root cause” problem of many other issues is born. Now many of the requests are slow. The next stop is apparently Angular. The best way to avoid expensive work is to not do it, followed by caching the results. Now he introduces a significant caching layer in Angular, in the browser. He does this before implementing pagination.

New problem. Rendering time appears bloated. (I think this is conflated with fetch time.) Solution: angular link-time string concatenation rather than using ng-repeat which generates watches recursively. Rendering time improves. Code is much larger and more brittle.

New problem. Cached data is out-of-date. Solution: create a web socket and tie into Hibernate’s interceptor system to broadcast notifications whenever entities are persisted. Bunch of new code for handling notifications and updating the relevant caches.

The Hibernate-friendly solution to these problems is as follows:

It’s possible to annotate the model classes to hide Hibernate proxy properties, so that Hibernate proxies can be converted to JSON, even with very old versions of Jackson and RESTeasy
Remove the lazy="false" or add fetch="subselect", depending on whether it is actually a performance improvement to fetch immediately
Replace generic fetches of objects with JQL queries using INNER JOIN FETCH that obtain exactly the entities needed at the time they are needed
Add foreign key indexes to the database, and others as-needed
Delete the cache layer, since it may be hiding performance issues
Delete the web socket notification layer, or make it induce a reload of the page rather than tinkering with cached values that may not even be displayed

However, I have a more holistic approach I would like to recommend, because the current scheme looks something like this:

stupid model with lots of MVC

This is stupid. You’re doing a lot of changing the shape of data, but it’s you in the front-end, it’s you in the back-end and it’s you in the database. So just admit that the person wearing the front-end hat is the same person wearing the back-end hat and the person writing the database, and let each of these components do the part of their job they are good at:

less stupid model with only one MVC

There. Now your back-end can be honest and have the SQL, yes SQL, it needs to build exactly the views that your front-end provides. Hibernate doesn’t need to be involved here because it isn’t buying you anything to take your fat objects out of the database at great cost, transmit them in all their bulk to the front-end, have them taking up space in the user’s browser (which is probably the worst at managing the memory) only to have your Angular front-end iterate through them and render four properties in a table view. You’re spending in the database, fetching stuff you don’t need, spending in the back end to convert it from one format to another, and then doing the same conversion again in the front-end. Caching is not the solution here! Doing less work is the solution! Build a holistic solution rather than three incomplete pieces of a solution!

So think holistically. Use mybatis and actually learn some SQL. Make the database do the work of changing the shape of the data—that’s what it’s for! Make the back-end do things to the data: this is what brings action and processing to your data. And then, make the front-end just show the damn data! That’s all it needs to do: display the information and collect the user’s desires for processing and pass it back. THIS IS AN MVC SYSTEM ALREADY—it doesn’t help to make the model MVC, the view MVC and the controller MVC!

In conclusion, Hibernate is garbage.

Addendum

I should make it clear I have the utmost respect for the architect. The high-level ideas in this program are great. Their execution is essentially befitting of a prototype. I don’t think I would have come up with as powerful, general or interesting of a design if left to my own devices.

March 9, 2018

This is adapted from a comment I left on Hacker News.

Why don’t we replace SQL with some other, better query language, perhaps something “equally powerful” like get name, age -> (foos.id X bars.foo_id) |> get first(name) |> head?

Part of the point of SQL is that the database may choose to satisfy your query in a very different manner than you might expect given the procedural reading. The example query is a nice illustration; a modern relational database may move project/select steps earlier in the processing as long as it doesn’t change the declarative meaning. Doing this kind of rearranging with a pipeline-looking query language is going to surprise people because you’re giving the majority of people who think procedurally a false affordance.

Here’s a nice example of a false affordance from a blog I can’t read:

false-affordance

The handles on the left doors give you a clue how they are to be used. ON the right, they’re mounted on the wrong side of the door so they will be awkward and surprising to use.

By the way, the false affordance is the fundamental cause of the object-relational impedance mismatch: the ORM is trying to put a low-level wrapper around a higher-level idea. OO languages are still row-by-row procedural systems where relational databases are set-oriented.

If the target language is Haskell or Ruby or another “sufficiently DSL-capable language” it will be possible to make an internal DSL that encapsulates your query. However, in that case I think users will be surprised when they have either non-obvious limits to how they can intermix query with in-language code, or else bad database performance compared to Postgres. You can see a little of both in the story of RethinkDB. If you are not using an internal DSL, you’ll be stuck in the same situation as SQL where you are embedding your query language in the code somehow.

Relational databases are not just storage technology with a complicated query language. They are also integration technology. SQL’s surface syntax may be unfortunate, but I’m increasingly doubtful that there is going to be a serious general-purpose database competitor that manages to hit the same notes only better. The main contender from the 90s was OODBs; they managed to have both shitty performance and lacked the ecosystem that makes SQL so useful as an integration technology: your application may be the main actor, but there are usually a large number of supporting cast members like cron jobs, reports, little scripts, ad-hoc queries, backup and restore etc, and having to do all of that with an OODB is very painful.

There are now and will continue to be niches where relational databases suffer, the prominent one today being distributed data storage. But the problem you think is a problem really isn’t a problem. Industrially speaking, “I don’t like SQL” is a non-problem. For people who hate it, there are ORMs that make it possible to ignore it most of the time for better or worse (mostly worse, but whatever). Syntactically, the main problem with it is that different dialects that behave slightly differently. This turns into a major engineering annoyance, but one that is mostly swallowed by our library dependencies, who have by now already basically taken care of it.

The benefit of using a modern relational database (I’m talking about Postgres but it applies to SQL Server and probably others as well) is they already have hundreds or thousands of man-years of effort put into optimizing them. I really thought RethinkDB had a brilliant design and it was going to be the next generation database. But it performed worse than Postgres and that makes it basically a non-starter. This is part of why niche databases are not based on SQL: if you handle a niche well, I will tolerate your oddities, but if you want to build a new general-purpose database today, in 2018, you can’t just have a better design. Your better design has to come with better performance and the ecosystem, the tool support, maintainability, etc., or it’s a non-starter. Databases are one of the hardest technical markets to break into. For most companies, the data is actually the most important thing.

March 9, 2018 database

I got one of these things for Christmas from my wife’s family secret santa:

Smokey Mountain Cooker 14”

The way this thing works is pretty interesting. At the bottom are some coals. You put wood on those coals, and they smoke. At the top is your food. In between is a bowl of water. The water acts as a moderator for temperature and as a humidifier.

So I’ve now smoked meat on this thing twice, in the dead of winter. The first time, I did a whole chicken, and it came out looking like this:

smoked chicken

I was a little intimidated at first, but I watched a bunch of videos and everything turned out great! I broke all the rules, looked at the food too many times (every half-hour the first time, every hour the second time), fussed with the coals too many times (well, once, the first day), bought and used the wrong charcoal (lump, should have just got the regular stuff) and the wrong wood (chips, should have got chunks), and it was still moist and delicious. Frankly, smoking is a lot easier than grilling!

If you have space, you should get one of these things. They’re easy to use. They take forever, but you really don’t have to screw around with them that much once you set them up.

Here’s my rub recipe: equal parts salt, sugar, cumin, garlic, oregano and New Mexican red chile powder. I admit I did use a dash or two of MSG (it’s fine, really) and some ground anise, but I don’t have the amount right. I think another equal part of it would be too sweet, but I did about a half portion of it and that wasn’t quite enough.

Anyway, it’s been very fun and I recommend you try it if you like smoked meat.

December 31, 2017

Reification is a concept that gets realized in programming in a handful of different ways.

1. Reification in RDF

In RDF, reification takes claims like “John is a tool” and adds in the contextual information about who is making the claim, turning them into claims like “Dave says John is a tool.” This is useful because RDF documents don’t have to have authority statements within them; if you obtain one from me, you may want the effect of reading it to be something like “Daniel says contents of daniel.rdf.” rather than just the contents of my file.

2. Reification in object-oriented terms

Often during the evolution of a program, a method winds up sort of taking over a class and acquiring a large number of parameters. Something like this:

class Plotter:
  def plot_schedule(schedule): ...

evolves into something a little worse like this:

class Plotter:
  def plot_schedule(schedule, weather, program, authors, start, stop, ...): 
    ...

At some point you have to package up that whole method into an object, which will usually have the word “request” in the name, and then the method will take one of these requests instead of a billion parameters:

class Plotter:
  def plot_schedule(schedule_request):
    ...

class ScheduleRequest:
  def set_schedule(schedule): ...
  def set_weather(weather): ...

What’s happened here is that the arguments to plot_schedule have been reified into an object called ScheduleRequest. The process could be taken as far as moving the actual plot method onto the schedule request. Maybe the plotter should just have lower-level methods anyway, and then ScheduleRequest becomes something more specific like SchedulePlot.

In general, converting verbs into nouns is a kind of reification.

December 19, 2017

← Newer Page 5 of 13 Older →