Hindi translation system begins full-scale operation

http://www.usc.edu/isinews/stories/98.html
http://www.wired.com/news/technology/0,1282,59093,00.html

While for most European languages, there are one or two predominant standardized ways of encoding them, e.g. "Latin-1" or Unicode, Hindi has a wildly mixed potpourri of encodings.

"It's ridiculous," said Germann, "almost every single Hindi language web site has its own encoding." Tools had to be made to convert all of these various systems to a single common one to present parallel texts to Och and other machine translation experts.
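The articles don't describe how those conversion tools worked. One common approach for font-based legacy encodings is a per-site lookup table from byte values to Unicode code points. The sketch below assumes such a table. The byte values in HYPOTHETICAL_SITE_A and the function name to_unicode are invented for illustration; only the Devanagari code points (U+0915 KA, U+0916 KHA, U+0917 GA, U+093E vowel sign AA) are real. Real legacy encodings also need reordering rules, for example for the pre-base vowel sign I, which this sketch ignores.

```python
from typing import Dict

# Hypothetical site-specific encoding: byte value -> Unicode Devanagari.
# These byte assignments are invented; each real site used its own table.
HYPOTHETICAL_SITE_A: Dict[int, str] = {
    0x61: "\u0915",  # DEVANAGARI LETTER KA
    0x62: "\u0916",  # DEVANAGARI LETTER KHA
    0x63: "\u0917",  # DEVANAGARI LETTER GA
    0x64: "\u093E",  # DEVANAGARI VOWEL SIGN AA
}


def to_unicode(data: bytes, table: Dict[int, str]) -> str:
    """Map each byte through a site-specific table to Unicode.

    Bytes missing from the table become U+FFFD (REPLACEMENT CHARACTER),
    so unmapped input is visible instead of silently dropped.
    """
    # Iterating over a bytes object yields ints in Python 3.
    return "".join(table.get(b, "\uFFFD") for b in data)


if __name__ == "__main__":
    text = to_unicode(b"ad", HYPOTHETICAL_SITE_A)
    print(text)                  # KA + vowel sign AA, i.e. "का"
    print(text.encode("utf-8"))  # the same text in one common encoding
```

With one table per site, every source can be normalized to the same Unicode text, which is the "single common" form the quote refers to.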

"Most of the conversion work was done by our partners at other participating sites, and it was absolutely critical to the success of the exercise," Germann said.

Every Hindi website uses its own encoding, so unifying them becomes a crucial step. But what exactly is that unified encoding? Could it be itrans?