
Romaji converter test

Posted under General

It romanizes 葉月 as "haduki". I see storm clouds on the horizon... ;)

It seems to have some problems with conjugation and spacing - for example, させてしまえば becomes "sa se te shimae ba", whereas it should ostensibly be "sasete shimaeba". But since this will mainly be used for trying to romanize names (right?), it shouldn't really be a problem.
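A minimal sketch of the word-joining step, assuming the backend returns tokens already tagged with a part of speech. A morphological analyzer such as ChaSen does produce such tags, but the two simplified labels used here (`auxiliary`, `conjunctive_particle`) and the function name `join_words` are hypothetical. Inflectional pieces are glued onto the preceding word instead of being printed as separate words:

```python
from typing import List, Tuple

# Hypothetical, simplified part-of-speech labels. A real analyzer's tag set is
# much richer, so its actual tags would have to be mapped onto these.
ATTACHING_TAGS = {"auxiliary", "conjunctive_particle"}


def join_words(tokens: List[Tuple[str, str]]) -> str:
    """Join (romaji, part_of_speech) tokens into space-separated words."""
    words: List[str] = []
    for romaji, pos in tokens:
        if words and pos in ATTACHING_TAGS:
            words[-1] += romaji  # inflectional piece: glue onto the previous word
        else:
            words.append(romaji)  # anything else starts a new word
    return " ".join(words)


# させてしまえば split into さ / せ / て / しまえ / ば
tokens = [("sa", "verb"), ("se", "auxiliary"), ("te", "conjunctive_particle"),
          ("shimae", "verb"), ("ba", "conjunctive_particle")]
print(join_words(tokens))  # sasete shimaeba
```

Which tags count as "attaching" is the real design decision; this toy set only covers the example above.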

It handles リュ just fine, though. Chouon (ー) is also broken, but only in hiragana: さー comes out as "sa -", whereas サー comes out as "saa". Nonstandard morae such as ディ or トゥ are handled wrongly as well (dei and tou instead of di and tu). Well, assuming that we want ディ and トゥ romanized as "di" and "tu", I don't know. I'm fine with "dhi" and "twu", but "dei" and "tou" are definitely wrong.
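A sketch of how a converter could avoid both problems, assuming a simple longest-match table lookup. The tiny table and the name `to_romaji` are hypothetical, and the "di"/"tu" spellings just follow the suggestion above. Two-kana combinations are tried before single kana, so ディ never falls through to "de" + "i", and ー repeats the previous vowel whether the preceding kana is hiragana or katakana:

```python
# Hypothetical, deliberately tiny table; a real converter needs the full kana set.
KANA_TABLE = {
    # two-character combinations (tried first)
    "りゅ": "ryu", "リュ": "ryu",
    "ディ": "di", "トゥ": "tu",
    # single characters
    "さ": "sa", "サ": "sa",
    "り": "ri", "リ": "ri",
    "デ": "de", "ト": "to",
    "ィ": "i", "ゥ": "u",
}
VOWELS = "aeiou"


def to_romaji(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        # Long-vowel mark: repeat the last vowel emitted, in either script.
        if text[i] == "ー":
            if out and out[-1][-1] in VOWELS:
                out.append(out[-1][-1])
            i += 1
            continue
        # Longest match first, so ディ becomes "di" rather than "de" + "i".
        for length in (2, 1):
            chunk = text[i:i + length]
            if len(chunk) == length and chunk in KANA_TABLE:
                out.append(KANA_TABLE[chunk])
                i += length
                break
        else:
            out.append(text[i])  # unknown character: pass it through unchanged
            i += 1
    return "".join(out)
```

For example, `to_romaji("さー")` and `to_romaji("サー")` both give "saa".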

0xCCBA696 said: Nonstandard morae such as ディ or トゥ are handled wrongly as well (dei and tou instead of di and tu). Well, assuming that we want ディ and トゥ romanized as "di" and "tu", I don't know.

Generally speaking that kind of thing is used when approximating onomatopoeia/sound effects or foreign words, right? So I think it would depend entirely on context.

Almost exclusively in approximating foreign words, I should think. Sound effects are rather standardized and make use of the native Japanese syllabary. Of course, "approximating" can also mean "making up", in which case... well.

Is it really necessary for it to romanize numbers that are already written in Arabic numerals? I could see that being useful on occasion, but I can also imagine it causing a bit of clutter on others, especially since it reads each digit separately (e.g. "17" is romanized as "ichi nana" instead of "juu nana"). Could an "ignore numbers" option be coded?
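A minimal sketch of such an option, assuming the converter can be called as a plain function on a string (`romanize` below is a hypothetical stand-in for whatever the backend exposes). Runs of ASCII digits are split off and copied through untouched, and only the text between them is romanized:

```python
import re
from typing import Callable

# Runs of ASCII digits, captured so that re.split keeps them in its output.
DIGIT_RUN = re.compile(r"([0-9]+)")


def romanize_skipping_numbers(text: str, romanize: Callable[[str], str]) -> str:
    """Romanize text but copy digit runs such as "17" through unchanged."""
    parts = DIGIT_RUN.split(text)
    out = []
    for i, part in enumerate(parts):
        # With a capturing group, re.split puts the digit runs at odd indices.
        if i % 2 == 1 or not part:
            out.append(part)  # digit run (or empty gap): keep as-is
        else:
            out.append(romanize(part))
    return "".join(out)
```

With this, "17" in the input stays "17" instead of becoming "ichi nana"; whether full-width digits (０-９) should also be skipped is left open here.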

Overall, it's a failure. It runs afoul of our romanisation scheme, trips over the slightest irregularity, has terribly useless failure modes and can't handle names anyway, which'd be the primary use. Not to mention that people needing the assistance of the tool won't be able to judge the quality of the output. Let's just stick with opening forum threads.

I've switched the backend to use kakasi instead of chasen. It behaves a little more intelligently now.

And are you guys seriously offering to translate often inscrutable Japanese names on a potentially daily basis? Perhaps I should just make the converter send an email.

Hm, that's a bit more promising. The new engine gives out multiple possibilities in the form {a|b|c} for kanji it can't figure out. 多宇宇 comes out as {ta|oo}{u|taka}{u|taka}, for example. As LaC and 葉月 pointed out, names are impossible to guess by algorithm in general, since kanji often have nanori (名乗り) readings that are only used in names, and in fact kanji can be used to represent any pronunciation you like, as long as you tell people. Online, even telling people may be considered unnecessary.
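For reference, the {a|b|c} notation can be expanded mechanically into every full reading it allows. A small sketch (the function name `expand_candidates` is hypothetical) shows how quickly the list grows: three two-way groups already give 2 × 2 × 2 = 8 candidates for 多宇宇.

```python
import itertools
import re
from typing import List

# Either a braced group of alternatives, or a run of plain text outside braces.
TOKEN = re.compile(r"\{([^{}]*)\}|([^{}]+)")


def expand_candidates(pattern: str) -> List[str]:
    """Expand e.g. '{ta|oo}{u|taka}' into every combination of the alternatives."""
    choices = []
    for group, literal in TOKEN.findall(pattern):
        # A braced group offers several options; plain text is one fixed option.
        choices.append(group.split("|") if group else [literal])
    return ["".join(combo) for combo in itertools.product(*choices)]
```

`expand_candidates("{ta|oo}{u|taka}{u|taka}")` starts with "tauu" and ends with "ootakataka"; nothing in the output says which of the eight is right.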

Perhaps it's kind of similar to the problem of getting a computer to figure out how to pronounce arbitrary names from arbitrary languages. You can make a best guess at how the name would sound according to, say, English orthography, but there's no guarantee that it would be correct or even close.

Indeed, this is not something a program can do on its own: 葉月 is going to have to handle it. What you can do is write a program to assist him, but to do so, you're going to have to look at the entire process a human uses to figure out the intended reading. Looking up possible readings of the characters is just the first step. Other steps might be:

- ENAMDICT lookup
- looking up people with that name on Wikipedia (which always lists readings)
- multiple Google searches to compare the frequency of the various candidates.

These are all things that can be automated. Ideally, a single button would produce a page collecting all the data points 葉月 can use to determine the most likely reading.
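A sketch of that single-button report, assuming each data source is wrapped in a hypothetical lookup function that scores a (name, reading) pair: for instance 1 if ENAMDICT lists the reading, or a hit count from a search for the name plus the reading. The names `collect_evidence`, `Row`, and `print_report` are illustrative, not an existing API, and no real lookups are implemented. It only gathers and sorts the evidence; the final choice stays with a human:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Hypothetical source signature: (name, reading) -> evidence score, where 0
# means "no support" (e.g. 1/0 for a dictionary hit, or a search hit count).
Source = Callable[[str, str], int]


@dataclass
class Row:
    reading: str
    evidence: Dict[str, int] = field(default_factory=dict)

    def supporting_sources(self) -> int:
        return sum(1 for score in self.evidence.values() if score > 0)


def collect_evidence(name: str, readings: Iterable[str],
                     sources: Dict[str, Source]) -> List[Row]:
    """Query every source for every candidate reading and tabulate the results."""
    rows = []
    for reading in readings:
        row = Row(reading)
        for source_name, lookup in sources.items():
            row.evidence[source_name] = lookup(name, reading)
        rows.append(row)
    # Most broadly supported readings first; the total score breaks ties.
    rows.sort(key=lambda r: (r.supporting_sources(), sum(r.evidence.values())),
              reverse=True)
    return rows


def print_report(name: str, rows: List[Row]) -> None:
    print(f"Candidate readings for {name}:")
    for row in rows:
        details = ", ".join(f"{src}={score}" for src, score in row.evidence.items())
        print(f"  {row.reading:<12} {details}")
```

Ranking by the number of agreeing sources first, rather than by raw hit counts, is one arbitrary choice among many; the point is that every data point stays visible on the page.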

I don't see why 葉月 need be the gatekeeper. Yeah, automatic translation is highly unlikely to be 100% accurate, especially in this case. Yeah, building a translation aide for humans rather than an end-all-be-all translator is smarter. But why not make that aide available to anyone who is probably already using similar techniques to do translations? Filtering everything through one person seems like a bottleneck we could do without.

Shinjidude said:
I don't see why 葉月 need be the gatekeeper. Yeah, automatic translation is highly unlikely to be 100% accurate, especially in this case. Yeah, building a translation aide for humans rather than an end-all-be-all translator is smarter. But why not make that aide available to anyone who is probably already using similar techniques to do translations? Filtering everything through one person seems like a bottleneck we could do without.

This one's a nice alternative if you're looking for something to help you in romanizing.
http://www.kawa.net/works/perl/romanize/roman-demo-e.html

I'm not specifically looking for anything, since I have my own system. What I'm saying is that if Albert wants to write a translation aide, it's not harmful, and the system need not simply e-mail 葉月. The people using it just need to be aware of its limitations and know how to pick intelligently when it comes to deciding between options.

Thanks for the link though, I'll add it to my list of resources.

If I had to choose between handling help requests daily and fixing silent corruption introduced by people faced with the task of making something out of "{en|on|too}{saka|zaka|han|ban}" (hint: it's 遠坂), I think it's more time-effective to handle the requests. Seriously, introducing "aides" as confusing as this will only cause difficult-to-detect damage. I'm all for a streamlined UI for getting human help, because ultimately it's gonna take *fewer* man-hours than fixing the stupidities induced by automated tools' suggestions. Case in point:

Quess said:
This one's a nice alternative if you're looking for something to help you in romanizing.
http://www.kawa.net/works/perl/romanize/roman-demo-e.html

It offers "nippongo" as the first and only reading for 日本語. Seriously, WTF. And that's the goddamn example text they give. It's like those pics featuring engrish text where you can see someone ran all the words through a dictionary without being able to judge the quality of the suggestions, resulting in a sentence that uses both "ululate" and "cormorant".

葉月 said: It offers "nippongo" as the first and only reading for 日本語. Seriously, WTF. And that's the goddamn example text they give.

Getting off topic I guess, but it gives me both nippongo and nihongo:

nihongo
nippongo
日本語
