I18n Puzzles, Day 2

This is the kind of problem where you could load the whole thing into Postgres and get the answer in about five seconds. In fact, let’s try it:

postgres=# create table i18n_day2 (input timestamp with time zone);
CREATE TABLE
postgres=# \copy i18n_day2 from /i18np/input.txt
COPY 1758
postgres=# select input at time zone 'UTC', count(*) from i18n_day2 group by input at time zone 'UTC' having count(*) >= 4;
      timezone       | count
---------------------+-------
 20XX-YY-ZZ HH:MM:SS |     4
(1 row)

I decided to censor the output a little bit and I didn’t handle the formatting properly there because that’s not really the point.

Suffice it to say, this is kind of a non-problem in languages with time zone support. Unfortunately, neither J nor APL has time zone support in the standard library, so we’ll have to figure it out on our own.

The first problem is that we have to parse these dates: 2019-06-05T08:15:00-04:00. These happen to be fixed-width. There are snappier ways of parsing but I decided to narrow in on this element of it.

My plan here is to handle the date+time part first, since there is a library built-in for this (todayno and todate, which seem like they should be inverses of each other but are not, for some reason). We can parse the time zone offset into a similar structure and expand it with #inv. Then we add them, or actually subtract them (I realized this a little late).
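As a small sketch of that last step on the sample timestamp above (08:15:00 at -04:00), with the fields typed in by hand rather than parsed:

```j
   0 0 0 1 1 0 #inv _4 0              NB. expand: the offset lands in the hour and minute slots
0 0 0 _4 0 0
   2019 6 5 8 15 0 - 0 0 0 _4 0 0     NB. local fields minus offset fields
2019 6 5 12 15 0
```

Subtracting a negative offset moves the clock forward, so 08:15 at -04:00 is 12:15 in UTC. Adding would have given 04:15, which is the wrong direction.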

I felt like I wanted to see a fork of the form f + g since the idea is to parse most of the timestamp and then the offset. The amount of work to handle a fixed-width format was not insubstantial, but I came up with these functions:

   dp =. (_ 2 1 $ 0 4 5 2 8 2 11 2 14 2 17 2) 0&".;.0 ]
   tp =. 0 0 0 1 1 0 #inv (_ 2 1 $ 19 3 23 2) 0&".;.0 ]

These two verbs parse the date-and-time part (dp) and the UTC offset (tp). Probably bad names. The key idea behind ;.0 is to take a substring of a given length at a given offset. Starting from 0 with length 4 gets us the year, which is the 0 4; the month comes from offset 5 with length 2, which is the 5 2 that comes next. All six chunks we need are thus specified by the 12 items in the list, which $ reshapes into an array of six 2×1 tables. This feeds ;.0 (subarray). We apply 0&". to each chunk to parse numbers; monadic ". runs J code, but we just want the values. In tp, the mask 0 0 0 1 1 0 with #inv spreads the two offset fields into the hour and minute positions, so the result lines up with the six fields from dp.
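To make the ;.0 step concrete, here is a single chunk pulled out at a time. This is only a sketch: t is a scratch name, and ,. is used to build the one-column start/length table that the reshaped list supplies in dp.

```j
   t =. '2019-06-05T08:15:00-04:00'
   (,. 0 4) ];.0 t        NB. start 0, length 4: the year, still text
2019
   (,. 5 2) 0&".;.0 t     NB. start 5, length 2, parsed as a number
6
   0 ". '-04'             NB. dyadic ". accepts - as a minus sign
_4
```

The last line is why the hour part of the offset can be handed to 0&". as-is, even though J itself writes negative numbers with an underscore.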

The ever-friendly and wise elcaro on the J channel of the APL Farm Discord suggested using these verbs instead:

Nats =: '1234567890'&(i. ".@:{ ' ',~ [)
Nums =: '1234567890._ '  ".@:{~ '1234567890.-' i. ]

Which was really tempting since you can then do all the parsing with this kind of expression:

   t =. '2024-08-01T08:15:03-03:15'
   19 (Nats@{. , Nums@}.) t
2024 8 1 8 15 3 _3 15

Which is really hot, but I insisted on doing it the hard way for some reason.

Now my plan is to “normalize” the timestamp, by converting this from a 6-item array to an internal date and back, and then throwing it into a printing function. First the printing function:

   require 'format/printf'
   
   dt =. '%04d-%02d-%02dT%02d:%02d:%02d+00:00' vsprintf

Nothing interesting here. Now my goal is to avoid boxing and pass lines through a function which does the work here. That function will do the “normalization” I mentioned above:

   norm =. [: dt 2 todate 2 todayno dp - tp

There’s the fork I was thinking about. I read another article (about ray tracing in J) which explained that the cap [: is about converting a dyadic function to a monadic one for forks.

The use of [: is a little weird to understand, but it is basically a no-operation left argument to ensure that the verb is evaluated as a function of one argument instead of two.

This seems like a decent explanation. So the idea here is to handle the time zone data with the date part, convert that to a day number, then convert that back to a date, then format it. The conversion handles the possibilities of negative times and whatnot.
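The train shapes that make up norm can be tried on their own. The sketch below uses throwaway arithmetic verbs (sumsq is a name made up for this example) rather than the date verbs, so the results are easy to check by hand:

```j
   (+/ - #) 3 4 5         NB. plain fork, like dp - tp: (+/ y) - (# y)
9
   (10 + *:) 3            NB. a noun as the left tine, like 2 todayno ...: 10 + (*: 3)
19
   sumsq =. [: +/ *:      NB. capped fork: +/ applied to the result of *:
   sumsq 3 4
25
```

Read right to left, norm is the same three shapes stacked: the fork dp - tp in the middle, a noun-headed fork for each of 2 todayno and 2 todate, and the cap in front of dt.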

Another approach would have been to instead convert the first number to a “day number” and then convert the hour and minute values to fractions of a day. In trying that, I saw odd behavior so I decided this might work alright.

OK, so now we have the verb that will parse, but we still need to actually do the puzzle. The first piece is to use norm;._2 fread <filename>. Using norm with ;._2 is how we’re going to avoid boxing; we’ll get an array of normalized timestamps instead of boxed strings or whatever. But the puzzle question is to find the times that appear most frequently. This is not all that different from the word frequencies problem. So I wound up using key /. with length # on the normalized timestamps, sorting by that, and applying that sort order to the nub ~. of the timestamps. Taking the first item of that list yields the timestamp we are interested in:

   {. (~. nm) \: #/.~ nm =. norm;._2 fread 'test-input.txt'

And this is our solution. The entire thing is:

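   require 'format/printf'
   
   NB. format six numeric fields as a timestamp string with a +00:00 offset
   dt =. '%04d-%02d-%02dT%02d:%02d:%02d+00:00' vsprintf
   NB. date and time fields: year, month, day, hour, minute, second
   dp =. (_ 2 1 $ 0 4 5 2 8 2 11 2 14 2 17 2) 0&".;.0 ]
   NB. offset hours and minutes, expanded to line up with the six fields of dp
   tp =. 0 0 0 1 1 0 #inv (_ 2 1 $ 19 3 23 2) 0&".;.0 ]
   NB. subtract the offset, round-trip through a day number, then format
   norm =. [: dt 2 todate 2 todayno dp - tp
   NB. normalize every line, then take the most frequent timestamp
   {. (~. nm) \: #/.~ nm =. norm;._2 fread 'input.txt'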
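The counting idiom in that last line can be watched on something small. Here it is on a plain string instead of timestamps (w is just a scratch name), followed by ;._2 applied to a two-line string with # standing in for norm:

```j
   w =. 'mississippi'
   #/.~ w                     NB. count of each distinct item, in order of first appearance
1 4 4 2
   ~. w                       NB. the nub, in the same order
misp
   (~. w) \: #/.~ w           NB. nub sorted by descending count
ispm
   {. (~. w) \: #/.~ w        NB. the most frequent item
i
   # ;._2 'ab',LF,'cde',LF    NB. cut on the trailing LF, apply # to each line
2 3
```

Ties keep their original order under \: so when two items share the top count, the one that appeared first wins.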

Tags
j

Date
June 2, 2025