| Readme for SMTGUI |
| Philipp Koehn, Evan Herbst |
| 7/31/06 |
| ----------------------------------- |
|
|
| SMTGUI is Philipp |
|
|
| newsmtgui.cgi is the main program. Corpus.pm is my module; Error.pm is a standard part of Perl but does not always appear to be distributed with it. The accompanying version is Error.pm v1.15. |
|
|
| The program requires file |
|
|
| For a corpus named CORPUS, the following files should be present: |
| - CORPUS.f, the foreign input |
| - CORPUS.e, the truth (aka reference translation) |
| - CORPUS.SYSTEM_TRANSLATION for each system to be analyzed |
| - CORPUS.pt_FACTORNAME for each factor that requires a phrase table (these are currently used only to count unknown source words) |
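The file list above can be checked before starting the GUI. The following is only a minimal sketch: check_corpus_files is a hypothetical helper, not part of SMTGUI, and it assumes that system names and pt_FACTORNAME suffixes are passed as plain arguments.

```shell
#!/bin/sh
# Hypothetical helper (not part of SMTGUI): report which expected files
# are missing for a corpus.
# Usage: check_corpus_files CORPUS SUFFIX...
# SUFFIX is a system name or a pt_FACTORNAME phrase-table suffix;
# .f and .e are always checked.
check_corpus_files() {
    corpus=$1
    shift
    missing=0
    for suffix in f e "$@"; do
        if [ ! -f "$corpus.$suffix" ]; then
            echo "missing: $corpus.$suffix"
            missing=1
        fi
    done
    # Exit status 0 if everything is present, 1 otherwise.
    return $missing
}
```

For example, `check_corpus_files devtest sysA pt_lemma` (all three names are made up) would look for devtest.f, devtest.e, devtest.sysA and devtest.pt_lemma.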
|
|
| The .f, .e and system-output files should have the usual pipe-delimited format, one sentence per line. Phrase tables should also have the standard three-pipe format. |
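As an illustration of the two formats, here is a small sketch. It assumes that "pipe-delimited" means tokens separated by spaces with factors inside a token separated by `|`, and that a phrase-table line looks like `source ||| target ||| scores`. Both function names are hypothetical and are not part of SMTGUI.

```shell
#!/bin/sh
# Print the Nth factor (1-based) of every token on each input line.
# Assumed token layout: factor1|factor2|...
extract_factor() {
    awk -v n="$1" '{
        out = ""
        for (i = 1; i <= NF; i++) {
            split($i, factors, "[|]")
            out = out (i > 1 ? " " : "") factors[n]
        }
        print out
    }'
}

# Print the source side of each phrase-table line.
# Assumed line layout: source ||| target ||| scores
phrase_table_sources() {
    awk -F ' [|][|][|] ' '{ print $1 }'
}
```

For instance, `echo 'the|DT|the cats|NNS|cat' | extract_factor 2` prints `DT NNS`, and `echo 'das haus ||| the house ||| 0.5 0.3' | phrase_table_sources` prints `das haus`.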
|
|
| A list of standard factor names is available in @Corpus::FACTORNAMES. Feel free to add, but woe betide you if you muck with |
|
|
| Currently the program assumes you |
|
|
| $ $BIN/tag-english < CORPUS.lc > CORPUS.pos-tmp (call Brill) |
| $ $BIN/morph < CORPUS.pos-tmp > CORPUS.morph |
| $ $DATA/test/factor-stem.en.perl < CORPUS.morph > CORPUS.lemma |
| $ cat CORPUS.pos-tmp | perl -n -e |
| $ $DATA/test/combine-features.perl CORPUS lc+pos lemma > CORPUS.lc+pos+lemma |
| $ rm CORPUS.pos-tmp (cleanup) |
|
|
| where $BIN=/export/ws06osmt/bin, $DATA=/export/ws06osmt/data. |
|
|
| To get German POS tags and lemmas from a words-only corpus (the first step must be run on Linux): |
|
|
| $ $BIN/recase.perl --in CORPUS.lc --model $MODELS/en-de/recaser/pharaoh.ini > CORPUS.recased (call pharaoh with a lowercase->uppercase model) |
| $ $BIN/run-lopar-tagger-lowercase.perl CORPUS.recased CORPUS.recased.lopar (call LOPAR) |
| $ $DATA/test/factor-stem.de.perl < CORPUS.recased.lopar > CORPUS.stem |
| $ $BIN/lowercase.latin1.perl < CORPUS.stem > CORPUS.lcstem (as you might guess, assumes latin-1 encoding) |
| $ $DATA/test/factor-pos.de.perl < CORPUS.recased.lopar > CORPUS.pos |
| $ $DATA/test/combine-features.perl CORPUS lc pos lcstem > CORPUS.lc+pos+lcstem |
|
|
| where $MODELS=/export/ws06osmt/models. |
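The German steps above could be wrapped in one function so that a failing step stops the rest. This is only a sketch: german_factors is a hypothetical name, the commands and default paths are copied from the listing above, and chaining with `&&` is an assumption about how you would want failures handled.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of SMTGUI): run the German factor
# pipeline for one corpus, stopping at the first step that fails.
# Usage: german_factors CORPUS    (expects CORPUS.lc to exist)
german_factors() {
    corpus=$1
    # Defaults are the paths given in this README; override via environment.
    BIN=${BIN:-/export/ws06osmt/bin}
    DATA=${DATA:-/export/ws06osmt/data}
    MODELS=${MODELS:-/export/ws06osmt/models}

    "$BIN/recase.perl" --in "$corpus.lc" \
        --model "$MODELS/en-de/recaser/pharaoh.ini" > "$corpus.recased" &&
    "$BIN/run-lopar-tagger-lowercase.perl" "$corpus.recased" "$corpus.recased.lopar" &&
    "$DATA/test/factor-stem.de.perl" < "$corpus.recased.lopar" > "$corpus.stem" &&
    "$BIN/lowercase.latin1.perl" < "$corpus.stem" > "$corpus.lcstem" &&
    "$DATA/test/factor-pos.de.perl" < "$corpus.recased.lopar" > "$corpus.pos" &&
    "$DATA/test/combine-features.perl" "$corpus" lc pos lcstem > "$corpus.lc+pos+lcstem"
}
```

Called as `german_factors CORPUS`, it returns a non-zero status as soon as any step fails, so later steps do not run on incomplete input.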
|
|