Thursday, February 23, 2012

Moses: recaser issues

Recently, I have been trying to set up a Moses-based MT demo.
I found that moses/scripts/recaser/recase.perl does more than just run Moses to translate uncased text into cased text:
(1) by default, the moses.ini configuration file of the recasing MT system uses distortion-limit 6, which allows reordering; the recase.perl script overrides this and sets the distortion limit to 1 by passing the option "-dl 1" to the Moses decoder.
(2) the recase.perl script also uses some rules for recasing, e.g., for English, it always keeps certain words ("a","after","against","al-.+","and","any","as","at","be","because","between","by","during","el-.+","for","from","his","in","is","its","last","not","of","off","on","than","the","their","this","to","was","were","which","will","with") in lower case (as far as I can tell, these function words are held in the script's always-lowercase list, not an uppercase one);
(3) the script also uppercases the initial word of a sentence.
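
To make rules (2) and (3) concrete, here is a minimal Python sketch of this kind of rule-based post-processing. It is not the code from recase.perl (which is Perl, and whose exact conditions for applying each rule are not reproduced here); the names ALWAYS_LOWER and apply_rules and the shortened word list are assumptions made for illustration only.

```python
import re

# Hypothetical subset of the word list quoted above; entries are regex patterns,
# which is why something like "al-.+" can appear in the list.
ALWAYS_LOWER = ["a", "after", "against", "al-.+", "and", "of", "the", "to"]
_LOWER_RE = re.compile(r"^(?:%s)$" % "|".join(ALWAYS_LOWER), re.IGNORECASE)


def apply_rules(tokens):
    """Apply two simple casing rules to a list of already recased tokens."""
    out = []
    for i, tok in enumerate(tokens):
        if i == 0:
            # Rule (3): uppercase the first letter of the sentence-initial word.
            tok = tok[:1].upper() + tok[1:]
        elif _LOWER_RE.match(tok):
            # Rule (2): words on the list are forced to lower case.
            tok = tok.lower()
        out.append(tok)
    return out


if __name__ == "__main__":
    print(" ".join(apply_rules("the President Of The united states".split())))
```

In this sketch the sentence-initial rule takes priority over the word list, so a sentence that starts with "the" still comes out as "The".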
