Internationalization Puzzles in J, Day One

I just found out about these i18n puzzles and figured I’d take a crack at one in J. The first one is pretty easy. I’m also trying to apply what I have been learning about linear algebra to the problem.

The crux of this puzzle is the following observation about a string:

  • For SMS, the byte count matters, and must be at most 160 bytes
  • For Twitter, the character count matters, and must be at most 140 characters

The input format is just a list of strings, and your mission is to calculate the cost. I glanced over this at first and made a mistake in my zeal to apply some really rudimentary linear algebra. Basically, I thought, let’s convert the input into a big matrix, we’ll have a byte count \(b_1\) and a unicode length \(u_1\) for each, and this will become our input matrix:

\[
\begin{bmatrix}
b_1 \le 160 & u_1 \le 140 \\
b_2 \le 160 & u_2 \le 140 \\
b_3 \le 160 & u_3 \le 140
\end{bmatrix}
\]

This will become a matrix of 0s and 1s, which we can then multiply by the cost vector (SMS first, then Twitter, to match the columns):

\[
\begin{bmatrix} \$0.11 \\ \$0.07 \end{bmatrix}
\]

The entire problem should reduce to something like this:

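\[
\begin{bmatrix} 0 & 0 \\ 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} \$0.11 \\ \$0.07 \end{bmatrix}
=
\begin{bmatrix} \$0.00 \\ \$0.18 \\ \$0.07 \\ \$0.11 \end{bmatrix}
\]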

If I had done this step on paper or something, I would probably have figured out the mistake, but I didn’t until later.

Let’s translate that to J. First we need to read the file, which we can do with input =. cutLF fread filename. This gives us boxed strings, which is fine. Now we need # to get the length in bytes and ucpcount to count Unicode characters. We can combine these in a train with , to get both at once:

   (# , ucpcount) each input
┌───────┬───────┬───────┬───────┐
│162 143│138 136│253 140│147 141│
└───────┴───────┴───────┴───────┘

These are the very values in the problem page, so this appears to be on the right track.
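To see why the two numbers differ, here is a small sketch on a made-up string (not from the puzzle input). It assumes the session stores string literals as UTF-8 and that the accented letters are precomposed characters:

```j
   s =. 'naïve café'   NB. hypothetical sample: 10 characters, two of them non-ASCII
   # s                 NB. length in bytes of the UTF-8 encoding
12
   ucpcount s          NB. number of Unicode code points
10
```

Each accented letter takes two bytes in UTF-8, which is where the extra two come from.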

Then I hit a little snag with trains, because I wanted to write it like (160>:# , 140>:ucpcount) but this does not do what it feels like it should, because J groups strictly from right to left with no precedence between verbs. So I wrote it like so instead:

   ((160>:#) , 140>:ucpcount) each input
┌───┬───┬───┬───┐
│0 0│1 1│0 1│1 0│
└───┴───┴───┴───┘

Opening the boxes with > turns this into a matrix:

   >((160>:#) , 140>:ucpcount) each input
0 0
1 1
0 1
1 0
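For the curious, here is a sketch of what the unparenthesized train does instead. The test string is made up for illustration: 200 ASCII characters, so it fails both limits.

```j
   s =. 200 $ 'a'                   NB. hypothetical test string, 200 bytes and 200 characters
   (160>:# , 140>:ucpcount) s       NB. groups as 160 >: (# , (140 >: ucpcount))
0 1
   ((160>:#) , 140>:ucpcount) s     NB. the two comparisons side by side
0 0
```

In the first version the result of 140>:ucpcount (a 0) is appended to the byte count, and then the whole list 200 0 is compared against 160, which is how a spurious 1 sneaks in.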

Now we have exactly the matrix I expected to have, so let’s try the dot product:

   (11 7) +/ .* |: >((160>:#) , 140>:ucpcount) each input
0 18 7 11

This is supposed to be 0 13 7 11? Oh right, in my excitement I forgot that there is a discount when a message works for both SMS and Twitter. I thought about this for a second and thought, I would really like to be able to index an array by another array. I’m not sure how that would work. But I also remember reading about a trick where you convert the two-dimensional index into a scalar by using base #. to read each row as a binary number. So instead of having a 2x2 table, we just have an array of length 4. In other words, 0 0 = 0, 0 1 = 1, 1 0 = 2, 1 1 = 3. Then I can encode the prices as 0 7 11 13, the price for nothing, a Tweet, an SMS, and both.

   (0 7 11 13) {~ 2 #. > ((160>:#) , 140>:ucpcount) each input
0 13 7 11
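As a sketch of the intermediate step, and of the "index an array by another array" idea, here is the same lookup done both ways on the matrix typed in by hand. The names m and t are made up for this example, and the boxed-index form is an alternative technique, not what the solution above uses:

```j
   m =. 4 2 $ 0 0 1 1 0 1 1 0    NB. the 0/1 matrix from above, typed in by hand
   2 #. m                        NB. each row read as a two-digit base-2 number
0 3 1 2
   t =. 2 2 $ 0 7 11 13          NB. hypothetical 2x2 price table: row = fits SMS, column = fits tweet
   (<"1 m) { t                   NB. box each row of m and use it as a (row, column) index into t
0 13 7 11
```

Both routes land on the same prices; flattening with #. just avoids having to box the index pairs.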

Now we can just make the entire solution:

   +/ (0 7 11 13) {~ 2 #. > ((160>:#) , 140>:ucpcount) each (cutLF fread'~/Downloads/test-input.txt')
31

And this solves the puzzle.

Edit: the helpful people on The APL Farm provided some advice. For starters, Time Melon points out that #. has a default left argument of 2, so we can simply remove the 2 there, and the parentheses around the prices can be removed, yielding this improvement:

   +/ 0 7 11 13 {~ #. > ((160>:#) , 140>:ucpcount) each input
31
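A tiny sketch of the monadic form, using hand-typed digits rather than the puzzle input:

```j
   #. 1 0                        NB. monadic #. assumes base 2
2
   2 #. 1 0                      NB. the same, with the base spelled out
2
   #. 4 2 $ 0 0 1 1 0 1 1 0      NB. on a matrix it works row by row
0 3 1 2
```

The parentheses around 0 7 11 13 can go because a run of numbers separated by spaces is already a single list in J.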

Elcaro points out that each is creating boxes I am then removing, so we can simplify to this:

   +/ 0 7 11 13 {~ #. ((160>:#) , 140>:ucpcount) every input
31

or this; I’m undecided but leaning towards the shorter one because I didn’t realize every was a thing:

   +/ 0 7 11 13 {~ #. ((160>:#) , 140>:ucpcount) &> input
31
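The difference between each and every is easy to see on a small boxed list (the list b here is made up for illustration):

```j
   b =. 'ab' ; 'cde'    NB. hypothetical two-item boxed list
   # each b             NB. each is &.> : open, apply, then box the result again
┌─┬─┐
│2│3│
└─┴─┘
   # every b            NB. every is &> : open and apply, no reboxing
2 3
   # &> b               NB. the same thing spelled with primitives
2 3
```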

Elcaro also noticed that I’m missing out on the obvious fact that >: is repeated inside the major transformation, so we can simplify it further to this:

   +/ 0 7 11 13 {~ #. (160 140 >: #,ucpcount) &> input
31

   NB. Or directly
   +/ 0 7 11 13 {~ #. (160 140 >: #,ucpcount) &> cutLF fread '~/Downloads/test-input.txt'
31 
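The reason the shared >: works is that J compares two lists of the same length item by item. A quick sketch using the counts from the first and third test lines shown earlier:

```j
   160 140 >: 162 143     NB. too many bytes and too many characters
0 0
   160 140 >: 253 140     NB. too many bytes, but exactly 140 characters still fits
0 1
```

That is all the fork 160 140 >: #,ucpcount does for each line.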

And this appears to me to be the final form!

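Edit: Elcaro makes another suggestion, pointing out that cutLF is not that different from <;._2, and so we can actually remove the boxing altogether and simplify the solution slightly further. Note that input here has to be the raw text from fread, ending in a linefeed, rather than the boxed list from cutLF: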

   +/ 0 7 11 13 {~ #. (160 140 >: #,ucpcount);._2 input
31
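Here is a minimal sketch of what ;._2 does, on a made-up two-line string standing in for the raw file contents:

```j
   raw =. 'ab', LF, 'cde', LF     NB. hypothetical stand-in for the result of fread
   <;._2 raw                      NB. cut on the last character (LF) and box each piece
┌──┬───┐
│ab│cde│
└──┴───┘
   # ;._2 raw                     NB. apply a verb to each piece directly, no boxes
2 3
```

One difference to keep in mind: ;._2 uses the last character as the separator, so this relies on the text ending with a linefeed.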

Tags
j

Date
May 20, 2025