Monday, September 26, 2011

Installing the latest Moses

Today I installed the latest Moses decoder (revision 4274) again.

Many volunteers have contributed to Moses and added new features, so Moses is becoming more and more complicated. As a result, it now has more bugs and incompatibility issues, which also makes it harder to install.

I am writing this post to record a successful Moses installation, which may be useful for newcomers to Moses:

The first step is to run the command ./regenerate-makefiles.sh, which prints:
Detected aclocal: aclocal (GNU automake) 1.11.1
Detected autoconf: autoconf (GNU Autoconf) 2.64
Detected automake: automake (GNU automake) 1.11.1
Detected libtoolize: libtoolize (GNU libtool) 2.2.6
Calling /home/w/wangpd/local/bin/aclocal...
Calling /home/w/wangpd/local/bin/autoconf...
Calling /home/w/wangpd/local/bin/automake...
Calling /home/w/wangpd/local/bin/libtoolize
Detected 16 cores

You should now be able to configure and build:
./configure [--with-srilm=/path/to/srilm] [--with-irstlm=/path/to/irstlm] [--with-randlm=/path/to/randlm] [--without-kenlm] [--with-synlm] [--with-xmlrpc-c=/path/to/xmlrpc-c-config]
make -j 16


The second step is to run the command:
./configure [--with-srilm=/path/to/srilm] [--with-irstlm=/path/to/irstlm] [--with-randlm=/path/to/randlm] [--without-kenlm] [--with-synlm] [--with-xmlrpc-c=/path/to/xmlrpc-c-config]
Note that you really need absolute paths for all of these options.
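
One way to make sure every option gets an absolute path is to resolve each directory before handing it to ./configure. The following is only a sketch: abs_dir is a hypothetical helper name (not part of Moses), and the ../srilm and ../irstlm locations in the comment are placeholders.

```shell
# Hypothetical helper (not part of Moses): print the absolute path of an
# existing directory, or fail if the directory does not exist.
abs_dir() {
    (cd "$1" 2>/dev/null && pwd)
}

# Example use; ../srilm and ../irstlm are placeholder locations:
# ./configure --with-srilm="$(abs_dir ../srilm)" --with-irstlm="$(abs_dir ../irstlm)"
```

Because the helper fails for a directory that does not exist, a mistyped path shows up as an empty option value instead of being passed on silently.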

The latest Moses requires IRSTLM version 1.70.01 or newer; with an older version the build fails (I tried 1.50, and it failed). Another important point is that you must complete the installation of IRSTLM, which means you need to run:
bash regenerate-makefiles.sh
# set parameter force to the value "--force" if you want to recreate all links to the autotools
./configure --prefix=$PWD
# run "configure --help" to get more details on the compilation options
make
make install
in the root directory of IRSTLM. Note that the last command, make install, is required: I tried skipping it, and the installation failed.
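
Before configuring Moses, it may help to check that the IRSTLM prefix really was populated by make install. This is a rough sketch under an assumption I have not verified for every IRSTLM release: that make install creates include and lib directories under the prefix. The function name irstlm_looks_installed is hypothetical.

```shell
# Hypothetical check: does an IRSTLM prefix look fully installed?
# Assumption: "make install" creates include/ and lib/ under the prefix.
irstlm_looks_installed() {
    prefix=$1
    for sub in include lib; do
        if [ ! -d "$prefix/$sub" ]; then
            echo "missing $prefix/$sub (did you run make install?)" >&2
            return 1
        fi
    done
    return 0
}

# Example use; /path/to/irstlm is a placeholder:
# irstlm_looks_installed /path/to/irstlm && echo "IRSTLM prefix looks complete"
```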

The last step is to run make -j 4.

Friday, September 16, 2011

An error in Moses when decoding large lattices

Once when I was using Moses to decode a large lattice, I got the following error:
ERROR: Jump length 32 in word lattice exceeds maximum phrase length 20.
ERROR: Increase max-phrase-length to process this lattice.

After looking at the input lattice, I found a node with an edge that jumps to the 32nd node after it.

Following the error message, I fixed the problem by passing the -max-phrase-length 35 option to the moses decoder.
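
To choose a value for -max-phrase-length without inspecting the lattice by hand, the largest jump can be extracted from the input. The sketch below assumes the lattice is in Moses' PLF text format, where each edge is written as ('word',score,jump), and it uses a simple pattern match that could be fooled by words containing similar text. The function name max_jump is hypothetical.

```shell
# Hypothetical helper: print the largest jump length found in a PLF lattice.
# Assumption: every edge looks like ('word',score,jump), so the jump is the
# integer directly before a closing parenthesis.
max_jump() {
    cat "$@" | grep -oE ',[0-9]+\)' | tr -d ',)' | sort -n | tail -n 1
}

# Example use; lattice.plf is a placeholder file name:
# max_jump lattice.plf
```

Any -max-phrase-length value at least as large as the printed number should avoid the error above.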

Wednesday, September 7, 2011

A bug in the Moses tokenizer

When you set the -l option to an unknown language, the Moses tokenizer says that it will fall back to English. However, the fallback is incomplete: it applies the English rules for periods (.), but it tokenizes single quotation marks (') differently from the English case.

For example, given the input "I'm a boy.", if you set -l en or omit the -l option, the output is "I 'm a boy ."; if you set -l abc, which is not a known language code, the output is "I ' m a boy .".

Sunday, September 4, 2011

Linux: compare text files at the word level

wdiff is a good choice; it can be found online:
http://www.gnu.org/s/wdiff/

What you need to do is download it and compile it.
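
A sketch of the build and of a basic comparison follows. The build steps in the comments are the usual GNU procedure and assume the wdiff tarball follows it; the tarball name and install prefix are placeholders. word_diff is a hypothetical wrapper name.

```shell
# Usual GNU build steps (placeholders: tarball name and install prefix):
#   tar xzf wdiff-X.Y.tar.gz && cd wdiff-X.Y
#   ./configure --prefix="$HOME/local" && make && make install

# Hypothetical wrapper: compare two text files word by word with wdiff,
# with a clearer message if wdiff is not installed yet.
word_diff() {
    if ! command -v wdiff >/dev/null 2>&1; then
        echo "wdiff not found on PATH; build and install it first" >&2
        return 127
    fi
    wdiff "$1" "$2"
}

# Example use; old.txt and new.txt are placeholder file names:
# word_diff old.txt new.txt
```

For two files containing "the quick fox" and "the slow fox", wdiff marks the removed word as [-quick-] and the added word as {+slow+}.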