MySQL vs. PostgreSQL

Posted by Daniel Lyons Tue, 19 Dec 2006 06:33:15 GMT

A lot has been made of MySQL and PostgreSQL lately. Now, just to be bitchy, I thought I might chip in my two cents about the deeper problems.

One thing which consistently annoys me about MySQL is that it is a very leaky abstraction. PostgreSQL has three integers, defined in the SQL spec. MySQL has six, and the UNSIGNED modifier for 12, plus a ZEROFILL option. To me, and most database people, the whole point of databases is to abstract away complexity. MySQL encourages you to worry about low-level details.

Other features of MySQL seem to be added for convenience to C programmers or UNIX people. For example, it handles bash-style # comments, for no particular reason. UNSIGNED also strikes me as the C level bubbling on up to interfere with my life at the DBA level.

But it also reveals a general MySQL propensity to modify the grammar. For example, in SQL, there is no NULL modifier. But the MySQL guys added it for parity with NOT NULL. NULL is the default, so PostgreSQL doesn’t provide it. There are other bizarre additions to the syntax ostensibly for convenience, such as REPLACE INTO. Other aspects of the grammar feel very haphazard in MySQL. Compare MySQL’s INSERT syntax:

INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE]
    [INTO] tbl_name [(col_name,...)]
    VALUES ({expr | DEFAULT},...),(...),...
    [ ON DUPLICATE KEY UPDATE col_name=expr, ... ]

or

INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE]
    [INTO] tbl_name
    SET col_name={expr | DEFAULT}, ...
    [ ON DUPLICATE KEY UPDATE col_name=expr, ... ]

or

INSERT [LOW_PRIORITY | HIGH_PRIORITY] [IGNORE]
    [INTO] tbl_name [(col_name,...)]
    SELECT ...
    [ ON DUPLICATE KEY UPDATE col_name=expr, ... ]

To PostgreSQL’s:

INSERT INTO table [ ( column [, ...] ) ]
    { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
    [ RETURNING * | output_expression [ AS output_name ] [, ...] ]

It may amuse you to know that I picked INSERT arbitrarily, because I didn’t want to show all the weird crap going on in the CREATE statement. Note that in the PostgreSQL syntax above, the ”| query” has all the same expressive power as the SELECT portion of th e MySQL syntax. But also look at what isn’t there in the PostgreSQL syntax: ON DUPLICATE KEY, and the whole wacky priority thing. MySQL is full of strange things like that. All of MySQL seems to suffer from this, to me, outrageous level of complexity.

Another thing that gets me is the whole MyISAM / InnoDB difference. Supposing the difference weren’t being used to cheat at benchmarks, it’d still be annoying to have to decide whether one wants data integrity or speed for a whole collection of data. But the fact is, it’s also being used to cheat in comparisons to PostgreSQL, by comparing PostgreSQL’s speed to MyISAM and feature set to InnoDB. Either way, it’s the abstraction leaking up to the surface leading me to ask the client questions like, “So… how much data do you think you’re going to have to store?”

Then there are other features of MySQL that are just plain weird. For example, the TIMESTAMP type changes behind your back to the last time a row is updated, even if you don’t modify the column containing the TIMESTAMP. They offer a DATETIME that doesn’t do this but contains the same amount of information. They say, if you don’t want this to happen on a particular UPDATE, you can set the column to … itself. This is all very fishy to someone accustomed to databases with triggers. Another example: if you use a function in the DEFAULT of a column definiton, MySQL executes it once, when you create the table. PostgreSQL executes it when you insert a row without that column defined. Again, fishy.

Anyway, I recommend people use PostgreSQL. Getting used to it is a little weird at first but, like most well-engineered things, it becomes clear over time how it all fits together. One never has that experience with MySQL.

Tags , ,  | 2 comments

FreeBSD and PostgreSQL

Posted by Daniel Lyons Fri, 29 Sep 2006 06:52:59 GMT

Bill and I had quite a bit of downtime today, due to two problems:

  1. Gentoo isn’t maintained as well as it used to be, and permitted Bill to update a package (OpenLDAP) which has no backwards compatibility regarding the database format.
  2. When you use LDAP, even if everything speaks PAM (which it does), it winds up linked against LDAP.

We’re thinking about migrating two things:

  1. The whole system away from LDAP, probably to PostgreSQL or (shudder) regular flat files.
  2. The OS from Gentoo to FreeBSD.

I personally would like to see both changes made, since I think FreeBSD is stabler and more well-maintained. However I think keeping ports up-to-date in FreeBSD is somewhat harder than keeping Gentoo up-to-date, but I’m not really sure why I feel that way; I guess going back to my early experiences with 4.6 and 5.0 which had a wonkier package system.

I also have learned something important about LDAP: it’s a total fucking sham. I remember this was my initial opinion of it, and I spent a couple years at Matterform thinking maybe it is a good idea because everything can utilize it, but, though that is true, I simply don’t need it, ever, and it is bulky, slow, inefficient with CPU, disk and RAM, and very fragile. Most of the benefits of it can be realized with PAM and PostgreSQL. In fact, the only benefits you don’t get are:

  1. Software browsing of the directory (Apple’s AddressBook and Mozilla Mail can do this)
  2. Simple replication via slurpd (though Slony-I can replicate PostgreSQL in a similar single master/many slave configuration)

The rest of it can be emulated via PAM and nsswitch. There’s a sysauth project for PostgreSQL which does both and they’re both post-1.0.

And if you still believe the horseshit about LDAP being better than a database, here are two little factoids to correct you:

  1. TinyLDAP is 7 times faster than indexed OpenLDAP directory, and more than 37 times faster than a default install of OpenLDAP, which has no indices.
  2. My 1.5 page SQL query from work, which PostgreSQL’s query optimizer says should cost 10,000 “points”, is delivered in a fraction of a second.

When a directory asshole tells you directories are optimized for reading and databases are not, you can mention that. I also find it incredibly annoying that the OpenLDAP people insist that you’re not comparing OpenLDAP to PostgreSQL fairly unless you put a lot of time and effort into tweaking OpenLDAP. Then and only then do you see similar performance to out-of-the-box PostgreSQL. Of course, PostgreSQL being a database, you can also tweak the hell out of its performance.

Screw the maintsream.

Tags , , ,  | 1 comment