| # N-best List Re-Scorer | |
| Written by Michael Denkowski | |
| These scripts simplify running N-best re-ranking experiments with Moses. You | |
| can score N-best lists with external tools (such as models that would be very | |
| costly to integrate with Moses just for feasibility experiments), then use the | |
| extended feature set to select translations that may be of a higher quality than | |
| those preferred by the Moses features alone. In some cases, training a | |
| re-ranker even without any new features can yield improvement. | |
| ### Training | |
| * Use Moses to generate large N-best lists for a dev set. Use a config file | |
| (moses.ini) that has been optimized with MERT, MIRA, or similar: | |
| ``` | |
| cat dev-src.txt |moses -f moses.ini -n-best-list dev.best1000.out 1000 distinct | |
| ``` | |
| * (Optionally) add new feature scores to the N-best list using any external | |
| tools. Make sure the features are added to the correct field using the correct | |
| format. You don't need to update the final scores (right now your new features | |
| have zero weight): | |
| ``` | |
| 0 ||| some translation ||| Feature0= -1.75645 Feature1= -1.38629 -2.19722 -2.31428 -0.81093 AwesomeNewFeature= -1.38629 ||| -4.42063 | |
| ``` | |
| * Run the optimizer (currently K-best MIRA) to learn new re-ranking weights for | |
| all features in your N-best list. Supply the reference translation for the dev | |
| set: | |
| ``` | |
| python train.py --nbest dev.best1000.with-new-features --ref dev-ref.txt --working-dir rescore-work | |
| ``` | |
| * You now have a new config file that contains N-best re-scoring weights: | |
| ``` | |
| rescore-work/rescore.ini | |
| ``` | |
| ### Test | |
| * Use the **original** config file to generate N-best lists for the test set: | |
| ``` | |
| cat test-src.txt |moses -f moses.ini -n-best-list test.best1000.out 100 distinct | |
| ``` | |
| * Add any new features you added for training | |
| * Re-score the N-best list (update total scores) using the **re-scoring** | |
| weights file: | |
| ``` | |
| python rescore.py rescore-work/rescore.ini <test.best1000.with-new-features >test.best1000.rescored | |
| ``` | |
| * The N-best list is **not** re-sorted, so the entries will be out of order. | |
| Use the top-best script to extract the highest scoring entry for each sentence: | |
| ``` | |
| python topbest.py <test.best1000.rescored >test.topbest | |
| ``` | |
| ### Not implemented yet | |
| The following could be relatively easily implemented by replicating the | |
| behavior of mert-moses.pl: | |
| * Sparse features (sparse weight file) | |
| * Other optimizers (MERT, PRO, etc.) | |
| * Other objective functions (TER, Meteor, etc.) | |
| * Multiple reference translations | |