Englishy "Phonetic" Translation of Hebrew

Posted by Daniel Lyons Mon, 20 Nov 2006 06:19:22 GMT

It occurred to me today that Hebrew pronunciation is very regular and it would not be particularly difficult to implement a program that takes UTF-8 Hebrew text in and transliterates it to English. For example:

“שלום עליכם” -> “shalom aleiḥem” (the ḥ should look like an h with a dot underneath it, in case you have a weird browser).
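Here is a minimal sketch of what that could look like, in Python purely for brevity (it is not one of the languages weighed in this post). One loud assumption: unpointed Hebrew does not write most vowels, so the sketch expects pointed input (with nikud) and assumes the marks arrive in typing order. The names `RULES` and `transliterate` are made up for illustration, and the table covers only the letters and vowel points this one phrase needs.

```python
# Hypothetical rule table: (Hebrew prefix, Latin output).
# Multi-code-point rules come first so they win over single-letter fallbacks.
RULES = [
    ("\u05E9\u05C1", "sh"),  # shin + shin dot
    ("\u05D5\u05B9", "o"),   # vav + holam
    ("\u05B5\u05D9", "ei"),  # tsere + yod
    ("\u05B8", "a"),         # qamats
    ("\u05B2", "a"),         # hataf patah
    ("\u05B5", "e"),         # tsere
    ("\u05B6", "e"),         # segol
    ("\u05DB", "\u1E25"),    # kaf without dagesh, written as h with dot below
    ("\u05DC", "l"),         # lamed
    ("\u05DD", "m"),         # final mem
    ("\u05DE", "m"),         # mem
    ("\u05E2", ""),          # ayin, treated as silent here
]


def transliterate(text: str) -> str:
    """Recursively rewrite the head of `text`, then recurse on the rest."""
    if not text:
        return ""
    for prefix, latin in RULES:
        if text.startswith(prefix):
            return latin + transliterate(text[len(prefix):])
    # Anything without a rule (spaces, Latin text) passes through unchanged.
    return text[0] + transliterate(text[1:])


# The example phrase, pointed and spelled out as escapes so the exact
# code points are unambiguous.
POINTED = (
    "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD "
    "\u05E2\u05B2\u05DC\u05B5\u05D9\u05DB\u05B6\u05DD"
)
print(transliterate(POINTED))  # shalom aleiḥem
```

The recursion goes one level deep per character, so this is only suitable for short strings; a loop over the same table would do for longer text.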

Naturally, I already have a graceful recursive algorithm in mind for doing this, but I actually have no idea which of my permitted languages supports Unicode properly.

Here’s my guess:

  • Common Lisp: probably supported, but probably gross, like everything else in CL.
  • OCaml: should be supported (Europeans), but isn’t native.
  • Haskell: no clue, probably not native at least. Maybe Hugs98?
  • Erlang: not supported at all, due to underlying string implementation (lists of integers). I could mess around with the binary directly, using binary pattern matching to grep out the Hebrew and produce, say, atoms. Hmm.
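The Erlang idea of matching on the raw bytes rests on one fact: every code point in the Unicode Hebrew block (U+0590 to U+05FF) is a two-byte UTF-8 sequence of the form `110xxxxx 10yyyyyy`. A rough Python stand-in for that kind of binary pattern matching might look like this; `hebrew_code_points` is a hypothetical name, and the sketch does no real validation of malformed input.

```python
from typing import List


def hebrew_code_points(data: bytes) -> List[int]:
    """Walk raw UTF-8 bytes and collect code points in U+0590..U+05FF."""
    found = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # 0xxxxxxx: plain ASCII, one byte.
            i += 1
        elif (lead >> 5 == 0b110 and i + 1 < len(data)
              and data[i + 1] >> 6 == 0b10):
            # 110xxxxx 10yyyyyy: two-byte sequence, 11 payload bits.
            cp = ((lead & 0x1F) << 6) | (data[i + 1] & 0x3F)
            if 0x0590 <= cp <= 0x05FF:
                found.append(cp)
            i += 2
        elif lead >> 4 == 0b1110:
            # Three-byte sequence: never Hebrew, skip it.
            i += 3
        elif lead >> 3 == 0b11110:
            # Four-byte sequence: never Hebrew, skip it.
            i += 4
        else:
            # Stray continuation byte or invalid lead: step past it.
            i += 1
    return found


raw = "\u05E9\u05DC\u05D5\u05DD".encode("utf-8")  # the word shalom, unpointed
print([hex(cp) for cp in hebrew_code_points(raw)])
```

From there, each integer could be looked up in a table, which is roughly what producing atoms from the matched bytes would amount to in Erlang.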

I’ll have to look at this some more. I think it would be a useful thing to have in general.

Comments

  1. pedrokolari@arnet.com.ar said 5 months later:

    I have no technical knowledge, but I am looking for a site showing Hebrew and English texts (for example, and especially, newspapers published in both languages) with a cursor-prompted transliteration/translation for each Hebrew word. I know it can be done, and I would appreciate help in getting there. If it’s not available, I believe creating it would be a tremendous contribution to expanding Hebrew knowledge and access to Hebrew texts.

  2. Daniel said 11 months later:

    Hi Daniel (great name, BTW). I was thinking of implementing such an algorithm in Java and was wondering if you have made any progress in this field.

    Thanks in advance, Danny.

    danny@cephx.com

  3. Reggie Drake said about 1 year later:

    The CL standard says nothing about Unicode, but its character type is defined in a very general way, and most major implementations nowadays support Unicode in a completely transparent and pleasant way.

    Haskell (GHC, at least) internally uses Unicode, but reverts to the 8-bit-per-character model when doing I/O (easily fixed with a library).

  4. Jacob said about 1 year later:

    Hello Mr. Lyons,

    I am interested in having a phonetic translator for Hebrew-English translation on my laptop. Since it’s been 1-2 years since you first put this up, I was wondering whether you either A. made this yourself and didn’t mention it, B. found one that was good and reliable, C. forgot about it, or D. (__) < insert smart and cool reason here.

    P.S. It’s 3:20 in the morning, I don’t drink coffee, and I’m recovering from a nice long hiking trip last week… if any of what I said makes sense, please let me know either way.

    P.P.S. I liked reading your blog so far (first 3-5 pages or so); hope everything goes well for you.

    Thank you for your time, Jacob ASPL of the T-Birds

  5. http://www.drk.com.ar said about 1 year later:

    My knowledge of Hebrew is basic. Is there any way of getting the vowels from the text without the nikud (vowel points)?

  6. Erez said about 1 year later:

    Hi there. I could really, really(!) use such a tool. Any progress on that? Do you happen to have a translation table for Hebrew -> English (phonetic)? If you have either of those, please contact me at erez@non-stop.co.il Cheers :D

  7. hennieretief@hotmail.com said about 1 year later:

    כי אני מוכן ומזומן
