diff --git a/fairseq-0.10.2/.github/ISSUE_TEMPLATE.md b/fairseq-0.10.2/.github/ISSUE_TEMPLATE.md new file mode 100644 index 0000000000000000000000000000000000000000..5c4c4493e4a8e5386b927e4f4554df925955d129 --- /dev/null +++ b/fairseq-0.10.2/.github/ISSUE_TEMPLATE.md @@ -0,0 +1,3 @@ +## 👉 [Please follow one of these issue templates](https://github.com/pytorch/fairseq/issues/new/choose) 👈 + +Note: to keep the backlog clean and actionable, issues may be immediately closed if they do not follow one of the above issue templates. diff --git a/fairseq-0.10.2/.github/ISSUE_TEMPLATE/bug_report.md b/fairseq-0.10.2/.github/ISSUE_TEMPLATE/bug_report.md new file mode 100644 index 0000000000000000000000000000000000000000..a7f4f0a902e92a6b40e437ab496a50fdee4d6aae --- /dev/null +++ b/fairseq-0.10.2/.github/ISSUE_TEMPLATE/bug_report.md @@ -0,0 +1,43 @@ +--- +name: 🐛 Bug Report +about: Submit a bug report to help us improve +labels: 'bug, needs triage' +--- + +## 🐛 Bug + + + +### To Reproduce + +Steps to reproduce the behavior (**always include the command you ran**): + +1. Run cmd '....' +2. See error + + + + +#### Code sample + + +### Expected behavior + + + +### Environment + + - fairseq Version (e.g., 1.0 or master): + - PyTorch Version (e.g., 1.0) + - OS (e.g., Linux): + - How you installed fairseq (`pip`, source): + - Build command you used (if compiling from source): + - Python version: + - CUDA/cuDNN version: + - GPU models and configuration: + - Any other relevant information: + +### Additional context + + diff --git a/fairseq-0.10.2/.github/ISSUE_TEMPLATE/documentation.md b/fairseq-0.10.2/.github/ISSUE_TEMPLATE/documentation.md new file mode 100644 index 0000000000000000000000000000000000000000..3a6e2e9ea4bb71102122c17ff53051eb3770cb5e --- /dev/null +++ b/fairseq-0.10.2/.github/ISSUE_TEMPLATE/documentation.md @@ -0,0 +1,15 @@ +--- +name: 📚 Documentation/Typos +about: Report an issue related to documentation or a typo +labels: 'documentation, needs triage' +--- + +## 📚 Documentation + +For typos and doc fixes, please go ahead and: + +1. Create an issue. +2. Fix the typo. +3. Submit a PR. + +Thanks! diff --git a/fairseq-0.10.2/.github/ISSUE_TEMPLATE/feature_request.md b/fairseq-0.10.2/.github/ISSUE_TEMPLATE/feature_request.md new file mode 100644 index 0000000000000000000000000000000000000000..93c8668041f8a7af29e4c11e905d8b56b946dd51 --- /dev/null +++ b/fairseq-0.10.2/.github/ISSUE_TEMPLATE/feature_request.md @@ -0,0 +1,24 @@ +--- +name: 🚀 Feature Request +about: Submit a proposal/request for a new feature +labels: 'enhancement, help wanted, needs triage' +--- + +## 🚀 Feature Request + + +### Motivation + + + +### Pitch + + + +### Alternatives + + + +### Additional context + + diff --git a/fairseq-0.10.2/.github/ISSUE_TEMPLATE/how-to-question.md b/fairseq-0.10.2/.github/ISSUE_TEMPLATE/how-to-question.md new file mode 100644 index 0000000000000000000000000000000000000000..4beb180dbf6dd61651aabf4a1b0748f2cd834300 --- /dev/null +++ b/fairseq-0.10.2/.github/ISSUE_TEMPLATE/how-to-question.md @@ -0,0 +1,33 @@ +--- +name: ❓ Questions/Help +about: If you have questions, please first search existing issues and docs +labels: 'question, needs triage' +--- + +## ❓ Questions and Help + +### Before asking: +1. search the issues. +2. search the docs. + + + +#### What is your question? + +#### Code + + + +#### What have you tried? + +#### What's your environment? 
+ + - fairseq Version (e.g., 1.0 or master): + - PyTorch Version (e.g., 1.0) + - OS (e.g., Linux): + - How you installed fairseq (`pip`, source): + - Build command you used (if compiling from source): + - Python version: + - CUDA/cuDNN version: + - GPU models and configuration: + - Any other relevant information: diff --git a/fairseq-0.10.2/config/config.yaml b/fairseq-0.10.2/config/config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..66723e706cfe498e1fd04a2b759e092af0dad2f8 --- /dev/null +++ b/fairseq-0.10.2/config/config.yaml @@ -0,0 +1,7 @@ +defaults: + - params: training_params + - task: language_modeling + - model: transformer_lm + - criterion: cross_entropy + - optimizer: adam + - lr_scheduler: inverse_sqrt diff --git a/fairseq-0.10.2/config/config_eval_lm.yaml b/fairseq-0.10.2/config/config_eval_lm.yaml new file mode 100644 index 0000000000000000000000000000000000000000..5a93cb5d92216c483e5a2172bc7d62c69b165f29 --- /dev/null +++ b/fairseq-0.10.2/config/config_eval_lm.yaml @@ -0,0 +1,7 @@ +defaults: + - params: eval_lm_params + - task: language_modeling + - model: transformer_lm + - criterion: cross_entropy + - optimizer: adam + - lr_scheduler: inverse_sqrt diff --git a/fairseq-0.10.2/config/criterion/adaptive_loss.yaml b/fairseq-0.10.2/config/criterion/adaptive_loss.yaml new file mode 100644 index 0000000000000000000000000000000000000000..a85a7eed1c94cf81021e32e3dd3cf42fb5a525d8 --- /dev/null +++ b/fairseq-0.10.2/config/criterion/adaptive_loss.yaml @@ -0,0 +1,3 @@ +# @package _group_ +sentence_avg: ${params.optimization.sentence_avg} +ddp_backend: ${params.distributed_training.ddp_backend} diff --git a/fairseq-0.10.2/config/criterion/cross_entropy.yaml b/fairseq-0.10.2/config/criterion/cross_entropy.yaml new file mode 100644 index 0000000000000000000000000000000000000000..a85a7eed1c94cf81021e32e3dd3cf42fb5a525d8 --- /dev/null +++ b/fairseq-0.10.2/config/criterion/cross_entropy.yaml @@ -0,0 +1,3 @@ +# @package _group_ +sentence_avg: ${params.optimization.sentence_avg} +ddp_backend: ${params.distributed_training.ddp_backend} diff --git a/fairseq-0.10.2/config/lr_scheduler/cosine.yaml b/fairseq-0.10.2/config/lr_scheduler/cosine.yaml new file mode 100644 index 0000000000000000000000000000000000000000..0f91e0d24091ff41458c918821bad3b0103649f9 --- /dev/null +++ b/fairseq-0.10.2/config/lr_scheduler/cosine.yaml @@ -0,0 +1,7 @@ +# @package _group_ +warmup_updates: 0 +warmup_init_lr: -1 +max_lr: 1.0 +t_mult: 1.0 +lr_period_updates: -1 +lr_shrink: 0.1 diff --git a/fairseq-0.10.2/config/lr_scheduler/inverse_sqrt.yaml b/fairseq-0.10.2/config/lr_scheduler/inverse_sqrt.yaml new file mode 100644 index 0000000000000000000000000000000000000000..0eac7d88eb9ac6c5e6da9ab2f108b73b00f2b69e --- /dev/null +++ b/fairseq-0.10.2/config/lr_scheduler/inverse_sqrt.yaml @@ -0,0 +1,3 @@ +# @package _group_ +warmup_updates: 4000 +warmup_init_lr: -1 diff --git a/fairseq-0.10.2/config/model/transformer_lm.yaml b/fairseq-0.10.2/config/model/transformer_lm.yaml new file mode 100644 index 0000000000000000000000000000000000000000..3837ea54e165ab7b3387f26ee814fb015196ffd9 --- /dev/null +++ b/fairseq-0.10.2/config/model/transformer_lm.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "relu" +dropout: 0.1 +attention_dropout: 0.0 +activation_dropout: 0.0 +relu_dropout: 0.0 +decoder_embed_dim: 512 +decoder_output_dim: 512 +decoder_input_dim: 512 +decoder_ffn_embed_dim: 2048 +decoder_layers: 6 +decoder_attention_heads: 8 +decoder_normalize_before: true +no_decoder_final_norm: false 
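+# cutoffs below are comma-separated vocabulary band sizes for the adaptive
+# softmax/input (e.g. "20000,60000" as in the wiki103 configs in this directory);
+# null leaves the feature disabled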
+adaptive_softmax_cutoff: null +adaptive_softmax_dropout: 0 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: false +adaptive_input_factor: 4 +adaptive_input_cutoff: null +tie_adaptive_weights: false +tie_adaptive_proj: false +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/fairseq-0.10.2/config/model/transformer_lm_baevski_gbw.yaml b/fairseq-0.10.2/config/model/transformer_lm_baevski_gbw.yaml new file mode 100644 index 0000000000000000000000000000000000000000..30b1a4f1e0f5e7f7c2671ff8ec995cc32363f10f --- /dev/null +++ b/fairseq-0.10.2/config/model/transformer_lm_baevski_gbw.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "relu" +dropout: 0.1 +attention_dropout: 0.1 +activation_dropout: 0.0 +relu_dropout: 0.0 +decoder_embed_dim: 512 +decoder_output_dim: 512 +decoder_input_dim: 512 +decoder_ffn_embed_dim: 4096 +decoder_layers: 12 +decoder_attention_heads: 16 +decoder_normalize_before: true +no_decoder_final_norm: true +adaptive_softmax_cutoff: null +adaptive_softmax_dropout: 0 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: false +adaptive_input_factor: 4 +adaptive_input_cutoff: null +tie_adaptive_weights: false +tie_adaptive_proj: false +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/fairseq-0.10.2/config/model/transformer_lm_baevski_wiki103.yaml b/fairseq-0.10.2/config/model/transformer_lm_baevski_wiki103.yaml new file mode 100644 index 0000000000000000000000000000000000000000..1154cfa660ee5ce6a272cd1a0049eead1e92c117 --- /dev/null +++ b/fairseq-0.10.2/config/model/transformer_lm_baevski_wiki103.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "relu" +dropout: 0.3 +attention_dropout: 0.1 +activation_dropout: 0.1 +relu_dropout: 0.1 +decoder_embed_dim: 1024 +decoder_output_dim: 1024 +decoder_input_dim: 1024 +decoder_ffn_embed_dim: 4096 +decoder_layers: 16 +decoder_attention_heads: 8 +decoder_normalize_before: true +no_decoder_final_norm: true +adaptive_softmax_cutoff: "20000,60000" +adaptive_softmax_dropout: 0.2 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: true +adaptive_input_factor: 4 +adaptive_input_cutoff: "20000,60000" +tie_adaptive_weights: true +tie_adaptive_proj: true +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/fairseq-0.10.2/config/model/transformer_lm_big.yaml 
b/fairseq-0.10.2/config/model/transformer_lm_big.yaml new file mode 100644 index 0000000000000000000000000000000000000000..309575310bfc5d9c5cde31563073bef18abc646e --- /dev/null +++ b/fairseq-0.10.2/config/model/transformer_lm_big.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "relu" +dropout: 0.1 +attention_dropout: 0.0 +activation_dropout: 0.0 +relu_dropout: 0.0 +decoder_embed_dim: 1024 +decoder_output_dim: 1024 +decoder_input_dim: 1024 +decoder_ffn_embed_dim: 4096 +decoder_layers: 12 +decoder_attention_heads: 16 +decoder_normalize_before: true +no_decoder_final_norm: false +adaptive_softmax_cutoff: null +adaptive_softmax_dropout: 0 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: false +adaptive_input_factor: 4 +adaptive_input_cutoff: null +tie_adaptive_weights: false +tie_adaptive_proj: false +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/fairseq-0.10.2/config/model/transformer_lm_gbw.yaml b/fairseq-0.10.2/config/model/transformer_lm_gbw.yaml new file mode 100644 index 0000000000000000000000000000000000000000..30b1a4f1e0f5e7f7c2671ff8ec995cc32363f10f --- /dev/null +++ b/fairseq-0.10.2/config/model/transformer_lm_gbw.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "relu" +dropout: 0.1 +attention_dropout: 0.1 +activation_dropout: 0.0 +relu_dropout: 0.0 +decoder_embed_dim: 512 +decoder_output_dim: 512 +decoder_input_dim: 512 +decoder_ffn_embed_dim: 4096 +decoder_layers: 12 +decoder_attention_heads: 16 +decoder_normalize_before: true +no_decoder_final_norm: true +adaptive_softmax_cutoff: null +adaptive_softmax_dropout: 0 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: false +adaptive_input_factor: 4 +adaptive_input_cutoff: null +tie_adaptive_weights: false +tie_adaptive_proj: false +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/fairseq-0.10.2/config/model/transformer_lm_gpt.yaml b/fairseq-0.10.2/config/model/transformer_lm_gpt.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2c6cb7be3801115371566932ffc78651c9ac6c0f --- /dev/null +++ b/fairseq-0.10.2/config/model/transformer_lm_gpt.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "gelu" +dropout: 0.1 +attention_dropout: 0.1 +activation_dropout: 0.0 +relu_dropout: 0.0 +decoder_embed_dim: 768 +decoder_output_dim: 768 +decoder_input_dim: 768 +decoder_ffn_embed_dim: 3072 +decoder_layers: 12 +decoder_attention_heads: 12 +decoder_normalize_before: true +no_decoder_final_norm: false +adaptive_softmax_cutoff: null +adaptive_softmax_dropout: 0 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), 
(6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: false +adaptive_input_factor: 4 +adaptive_input_cutoff: null +tie_adaptive_weights: false +tie_adaptive_proj: false +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/fairseq-0.10.2/config/model/transformer_lm_gpt2_big.yaml b/fairseq-0.10.2/config/model/transformer_lm_gpt2_big.yaml new file mode 100644 index 0000000000000000000000000000000000000000..a08769a1781abdb13302bf57bf1338bcaf68a0ec --- /dev/null +++ b/fairseq-0.10.2/config/model/transformer_lm_gpt2_big.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "gelu" +dropout: 0.1 +attention_dropout: 0.1 +activation_dropout: 0.0 +relu_dropout: 0.0 +decoder_embed_dim: 1600 +decoder_output_dim: 1600 +decoder_input_dim: 1600 +decoder_ffn_embed_dim: 6400 +decoder_layers: 48 +decoder_attention_heads: 25 +decoder_normalize_before: true +no_decoder_final_norm: false +adaptive_softmax_cutoff: null +adaptive_softmax_dropout: 0 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: false +adaptive_input_factor: 4 +adaptive_input_cutoff: null +tie_adaptive_weights: false +tie_adaptive_proj: false +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/fairseq-0.10.2/config/model/transformer_lm_gpt2_medium.yaml b/fairseq-0.10.2/config/model/transformer_lm_gpt2_medium.yaml new file mode 100644 index 0000000000000000000000000000000000000000..64261d793c0f1ae091c9bf5c8c77093a07326137 --- /dev/null +++ b/fairseq-0.10.2/config/model/transformer_lm_gpt2_medium.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "gelu" +dropout: 0.1 +attention_dropout: 0.1 +activation_dropout: 0.0 +relu_dropout: 0.0 +decoder_embed_dim: 1280 +decoder_output_dim: 1280 +decoder_input_dim: 1280 +decoder_ffn_embed_dim: 5120 +decoder_layers: 36 +decoder_attention_heads: 20 +decoder_normalize_before: true +no_decoder_final_norm: false +adaptive_softmax_cutoff: null +adaptive_softmax_dropout: 0 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: false +adaptive_input_factor: 4 +adaptive_input_cutoff: null +tie_adaptive_weights: false +tie_adaptive_proj: false +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/fairseq-0.10.2/config/model/transformer_lm_gpt2_small.yaml b/fairseq-0.10.2/config/model/transformer_lm_gpt2_small.yaml new file mode 100644 index 0000000000000000000000000000000000000000..702e81f466c82edf40433589d389edbe0a7b96db --- /dev/null +++ b/fairseq-0.10.2/config/model/transformer_lm_gpt2_small.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "gelu" +dropout: 0.1 
+attention_dropout: 0.1 +activation_dropout: 0.0 +relu_dropout: 0.0 +decoder_embed_dim: 1024 +decoder_output_dim: 1024 +decoder_input_dim: 1024 +decoder_ffn_embed_dim: 4096 +decoder_layers: 24 +decoder_attention_heads: 16 +decoder_normalize_before: true +no_decoder_final_norm: false +adaptive_softmax_cutoff: null +adaptive_softmax_dropout: 0 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: false +adaptive_input_factor: 4 +adaptive_input_cutoff: null +tie_adaptive_weights: false +tie_adaptive_proj: false +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/fairseq-0.10.2/config/model/transformer_lm_wiki103.yaml b/fairseq-0.10.2/config/model/transformer_lm_wiki103.yaml new file mode 100644 index 0000000000000000000000000000000000000000..1154cfa660ee5ce6a272cd1a0049eead1e92c117 --- /dev/null +++ b/fairseq-0.10.2/config/model/transformer_lm_wiki103.yaml @@ -0,0 +1,36 @@ +# @package _group_ +activation_fn: "relu" +dropout: 0.3 +attention_dropout: 0.1 +activation_dropout: 0.1 +relu_dropout: 0.1 +decoder_embed_dim: 1024 +decoder_output_dim: 1024 +decoder_input_dim: 1024 +decoder_ffn_embed_dim: 4096 +decoder_layers: 16 +decoder_attention_heads: 8 +decoder_normalize_before: true +no_decoder_final_norm: true +adaptive_softmax_cutoff: "20000,60000" +adaptive_softmax_dropout: 0.2 +adaptive_softmax_factor: 4 +no_token_positional_embeddings: false +share_decoder_input_output_embed: false +character_embeddings: false +character_filters: "[(1, 64), (2, 128), (3, 192), (4, 256), (5, 256), (6, 256), (7, 256)]" +character_embedding_dim: 4 +char_embedder_highway_layers: 2 +adaptive_input: true +adaptive_input_factor: 4 +adaptive_input_cutoff: "20000,60000" +tie_adaptive_weights: true +tie_adaptive_proj: true +decoder_learned_pos: false +decoder_layerdrop: 0 +decoder_layers_to_keep: null +layernorm_embedding: false +no_scale_embedding: false +quant_noise_pq: 0 +quant_noise_pq_block_size: 8 +quant_noise_scalar: 0 diff --git a/fairseq-0.10.2/config/optimizer/adam.yaml b/fairseq-0.10.2/config/optimizer/adam.yaml new file mode 100644 index 0000000000000000000000000000000000000000..e5264f895e60901a3c68f3300a4c7a9070eeaeff --- /dev/null +++ b/fairseq-0.10.2/config/optimizer/adam.yaml @@ -0,0 +1,5 @@ +# @package _group_ +adam_betas: "(0.9, 0.999)" +adam_eps: 1.0e-8 +weight_decay: 0 +use_old_adam: false diff --git a/fairseq-0.10.2/config/optimizer/nag.yaml b/fairseq-0.10.2/config/optimizer/nag.yaml new file mode 100644 index 0000000000000000000000000000000000000000..4ab274568658d104bfca51f09046a1f27eb2fd28 --- /dev/null +++ b/fairseq-0.10.2/config/optimizer/nag.yaml @@ -0,0 +1,3 @@ +# @package _group_ +momentum: 0.99 +weight_decay: 0.0 diff --git a/fairseq-0.10.2/config/params/eval_lm_params.yaml b/fairseq-0.10.2/config/params/eval_lm_params.yaml new file mode 100644 index 0000000000000000000000000000000000000000..6f27055d643c055943add764ad79bbeed23e363d --- /dev/null +++ b/fairseq-0.10.2/config/params/eval_lm_params.yaml @@ -0,0 +1,105 @@ +# @package _group_ +common: + no_progress_bar: false + log_interval: 100 + log_format: null + tensorboard_logdir: null + seed: 1 + cpu: false + fp16: false + 
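+  # the fp16_* settings below control dynamic loss scaling and only take
+  # effect when fp16 (or memory_efficient_fp16) is enabled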
memory_efficient_fp16: false + fp16_no_flatten_grads: false + fp16_init_scale: 128 + fp16_scale_window: null + fp16_scale_tolerance: 0.0 + min_loss_scale: 1.0e-4 + threshold_loss_scale: null + user_dir: null + empty_cache_freq: 0 + all_gather_list_size: 16384 + model_parallel_size: 1 + checkpoint_suffix: "" + quantization_config_path: null +distributed_training: + distributed_rank: 0 + distributed_backend: "nccl" + distributed_init_method: null + distributed_port: -1 + device_id: 0 + local_rank: 0 + distributed_no_spawn: false + ddp_backend: "c10d" + bucket_cap_mb: 25 + fix_batches_to_gpus: false + find_unused_parameters: false + fast_stat_sync: false + broadcast_buffers: false + distributed_wrapper: "DDP" + slowmo_momentum: null + slowmo_algorithm: "LocalSGD" + localsgd_frequency: 3 +dataset: + num_workers: 1 + skip_invalid_size_inputs_valid_test: false + max_tokens: null + batch_size: ${params.dataset.batch_size} + required_batch_size_multiple: 8 + dataset_impl: null + data_buffer_size: 10 + train_subset: "train" + valid_subset: "valid" + validate_interval: 1 + fixed_validation_seed: null + disable_validation: false + curriculum: 0 + gen_subset: "test" + num_shards: 1 + shard_id: 0 + max_tokens_valid: ${params.dataset.max_tokens} + batch_size_valid: ${params.dataset.batch_size} +optimization: + max_epoch: 0 + max_update: 0 + clip_norm: 25.0 + sentence_avg: false + update_freq: [1] + lr: [0.25] + min_lr: -1.0 + use_bmuf: false +checkpoint: + save_dir: "checkpoints" + restore_file: "checkpoint_last.pt" + reset_dataloader: false + reset_lr_scheduler: false + reset_meters: false + reset_optimizer: false + optimizer_overrides: "{}" + save_interval: 1 + save_interval_updates: 0 + keep_interval_updates: -1 + keep_last_epochs: -1 + keep_best_checkpoints: -1 + no_save: false + no_epoch_checkpoints: false + no_last_checkpoints: false + no_save_optimizer_state: false + best_checkpoint_metric: "loss" + maximize_best_checkpoint_metric: false + patience: -1 +common_eval: + path: null + remove_bpe: null + quiet: false + model_overrides: "{}" + results_path: null +eval_lm: + output_word_probs: false + output_word_stats: false + context_window: 0 +bmuf: + block_lr: 1 + block_momentum: 0.875 + global_sync_iter: 50 + warmup_iterations: 500 + use_nbm: false + average_sync: false diff --git a/fairseq-0.10.2/config/params/training_params.yaml b/fairseq-0.10.2/config/params/training_params.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2ce94f929088427db52a40981d117ce5a6d3d8c0 --- /dev/null +++ b/fairseq-0.10.2/config/params/training_params.yaml @@ -0,0 +1,95 @@ +# @package _group_ +common: + no_progress_bar: false + log_interval: 100 + log_format: null + tensorboard_logdir: null + seed: 1 + cpu: false + fp16: false + memory_efficient_fp16: false + fp16_no_flatten_grads: false + fp16_init_scale: 128 + fp16_scale_window: null + fp16_scale_tolerance: 0.0 + min_loss_scale: 1.0e-4 + threshold_loss_scale: null + user_dir: null + empty_cache_freq: 0 + all_gather_list_size: 16384 + model_parallel_size: 1 + checkpoint_suffix: "" + quantization_config_path: null +distributed_training: + distributed_rank: 0 + distributed_backend: "nccl" + distributed_init_method: null + distributed_port: -1 + device_id: 0 + local_rank: 0 + distributed_no_spawn: false + ddp_backend: "c10d" + bucket_cap_mb: 25 + fix_batches_to_gpus: false + find_unused_parameters: false + fast_stat_sync: false + broadcast_buffers: false + distributed_wrapper: "DDP" + slowmo_momentum: null + slowmo_algorithm: "LocalSGD" + 
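+  # localsgd_frequency sets how many updates pass between parameter syncs when
+  # distributed_wrapper is SlowMo with the LocalSGD algorithm (reading based on
+  # the flag names above)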
localsgd_frequency: 3 +dataset: + num_workers: 1 + skip_invalid_size_inputs_valid_test: false + max_tokens: null + batch_size: ${params.dataset.batch_size} + required_batch_size_multiple: 8 + dataset_impl: null + data_buffer_size: 10 + train_subset: "train" + valid_subset: "valid" + validate_interval: 1 + fixed_validation_seed: null + disable_validation: false + curriculum: 0 + gen_subset: "test" + num_shards: 1 + shard_id: 0 + max_tokens_valid: ${params.dataset.max_tokens} + batch_size_valid: ${params.dataset.batch_size} +optimization: + max_epoch: 0 + max_update: 0 + clip_norm: 25.0 + sentence_avg: false + update_freq: [1] + lr: [0.25] + min_lr: -1.0 + use_bmuf: false +checkpoint: + save_dir: "checkpoints" + restore_file: "checkpoint_last.pt" + reset_dataloader: false + reset_lr_scheduler: false + reset_meters: false + reset_optimizer: false + optimizer_overrides: "{}" + save_interval: 1 + save_interval_updates: 0 + keep_interval_updates: -1 + keep_last_epochs: -1 + keep_best_checkpoints: -1 + no_save: false + no_epoch_checkpoints: false + no_last_checkpoints: false + no_save_optimizer_state: false + best_checkpoint_metric: "loss" + maximize_best_checkpoint_metric: false + patience: -1 +bmuf: + block_lr: 1 + block_momentum: 0.875 + global_sync_iter: 50 + warmup_iterations: 500 + use_nbm: false + average_sync: false diff --git a/fairseq-0.10.2/config/task/language_modeling.yaml b/fairseq-0.10.2/config/task/language_modeling.yaml new file mode 100644 index 0000000000000000000000000000000000000000..58a2ad1358e705b3fbc7f85e0520062837cf5f96 --- /dev/null +++ b/fairseq-0.10.2/config/task/language_modeling.yaml @@ -0,0 +1,10 @@ +# @package _group_ +data: ??? +sample_break_mode: "none" +tokens_per_sample: 1024 +output_dictionary_size: -1 +self_target: false +future_target: false +past_target: false +add_bos_token: false +max_target_positions: null diff --git a/fairseq-0.10.2/examples/noisychannel/README.md b/fairseq-0.10.2/examples/noisychannel/README.md new file mode 100644 index 0000000000000000000000000000000000000000..9d101aa874ec36ff3bb5c1166169a4c4f38ffe2b --- /dev/null +++ b/fairseq-0.10.2/examples/noisychannel/README.md @@ -0,0 +1,72 @@ +# Simple and Effective Noisy Channel Modeling for Neural Machine Translation (Yee et al., 2019) +This page contains pointers to pre-trained models as well as instructions on how to run the reranking scripts. 
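+
+At a glance, the scripts rescore each candidate translation T of a source S with a
+weighted combination of the direct model P(T|S), the channel model P(S|T) and a
+language model P(T), normalized by a length penalty. A minimal sketch of that
+combination (illustrative only; the full logic, including source-length
+normalization and backwards/right-to-left variants, lives in `rerank_utils.get_score`):
+
+```python
+def combined_score(direct, channel, lm, tgt_len,
+                   w1=1.0, w2=1.0, w3=1.0, lenpen=1.0):
+    """direct = log P(T|S), channel = log P(S|T), lm = log P(T)."""
+    # in the example commands below, --weight1 scales the channel model
+    # (score-model1, run backwards), --weight2 the direct model and --weight3 the LM
+    score = w1 * channel + w2 * direct + w3 * lm
+    return score / (tgt_len ** lenpen)
+```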
+
+## Citation:
+```bibtex
+@inproceedings{yee2019simple,
+  title = {Simple and Effective Noisy Channel Modeling for Neural Machine Translation},
+  author = {Kyra Yee and Yann Dauphin and Michael Auli},
+  booktitle = {Conference on Empirical Methods in Natural Language Processing},
+  year = {2019},
+}
+```
+
+## Pre-trained Models:
+
+Model | Description | Download
+---|---|---
+`transformer.noisychannel.de-en` | De->En Forward Model | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/noisychannel/forward_de2en.tar.bz2)
+`transformer.noisychannel.en-de` | En->De Channel Model | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/noisychannel/backward_en2de.tar.bz2)
+`transformer_lm.noisychannel.en` | En Language Model | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/noisychannel/reranking_en_lm.tar.bz2)
+
+Test Data: [newstest_wmt17](https://dl.fbaipublicfiles.com/fairseq/models/noisychannel/wmt17test.tar.bz2)
+
+## Example usage
+
+```bash
+mkdir rerank_example
+curl https://dl.fbaipublicfiles.com/fairseq/models/noisychannel/forward_de2en.tar.bz2 | tar xvjf - -C rerank_example
+curl https://dl.fbaipublicfiles.com/fairseq/models/noisychannel/backward_en2de.tar.bz2 | tar xvjf - -C rerank_example
+curl https://dl.fbaipublicfiles.com/fairseq/models/noisychannel/reranking_en_lm.tar.bz2 | tar xvjf - -C rerank_example
+curl https://dl.fbaipublicfiles.com/fairseq/models/noisychannel/wmt17test.tar.bz2 | tar xvjf - -C rerank_example
+
+beam=50
+num_trials=1000
+fw_name=fw_model_ex
+bw_name=bw_model_ex
+lm_name=lm_ex
+data_dir=rerank_example/hyphen-splitting-mixed-case-wmt17test-wmt14bpe
+data_dir_name=wmt17
+lm=rerank_example/lm/checkpoint_best.pt
+lm_bpe_code=rerank_example/lm/bpe32k.code
+lm_dict=rerank_example/lm/dict.txt
+batch_size=32
+bw=rerank_example/backward_en2de.pt
+fw=rerank_example/forward_de2en.pt
+
+# reranking with P(T|S), P(S|T) and P(T)
+python examples/noisychannel/rerank_tune.py $data_dir --tune-param lenpen weight1 weight3 \
+    --lower-bound 0 0 0 --upper-bound 3 3 3 --data-dir-name $data_dir_name \
+    --num-trials $num_trials --source-lang de --target-lang en --gen-model $fw \
+    -n $beam --batch-size $batch_size --score-model2 $fw --score-model1 $bw \
+    --backwards1 --weight2 1 \
+    -lm $lm --lm-dict $lm_dict --lm-name en_newscrawl --lm-bpe-code $lm_bpe_code \
+    --model2-name $fw_name --model1-name $bw_name --gen-model-name $fw_name
+
+# reranking with P(T|S) and P(T)
+python examples/noisychannel/rerank_tune.py $data_dir --tune-param lenpen weight3 \
+    --lower-bound 0 0 --upper-bound 3 3 --data-dir-name $data_dir_name \
+    --num-trials $num_trials --source-lang de --target-lang en --gen-model $fw \
+    -n $beam --batch-size $batch_size --score-model1 $fw \
+    -lm $lm --lm-dict $lm_dict --lm-name en_newscrawl --lm-bpe-code $lm_bpe_code \
+    --model1-name $fw_name --gen-model-name $fw_name
+
+# To run with a preconfigured set of hyperparameters for the lenpen and model weights, use rerank.py instead:
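+# (rerank.py skips the random search and applies the given lenpen/weights directly)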
+python examples/noisychannel/rerank.py $data_dir \ + --lenpen 0.269 --weight1 1 --weight2 0.929 --weight3 0.831 \ + --data-dir-name $data_dir_name --source-lang de --target-lang en --gen-model $fw \ + -n $beam --batch-size $batch_size --score-model2 $fw --score-model1 $bw --backwards1 \ + -lm $lm --lm-dict $lm_dict --lm-name en_newscrawl --lm-bpe-code $lm_bpe_code \ + --model2-name $fw_name --model1-name $bw_name --gen-model-name $fw_name +``` + diff --git a/fairseq-0.10.2/examples/noisychannel/__init__.py b/fairseq-0.10.2/examples/noisychannel/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..89f1aef4f6328d25425e0bcabb42dfffd2ed35f0 --- /dev/null +++ b/fairseq-0.10.2/examples/noisychannel/__init__.py @@ -0,0 +1,6 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from .rerank_options import * # noqa diff --git a/fairseq-0.10.2/examples/noisychannel/rerank.py b/fairseq-0.10.2/examples/noisychannel/rerank.py new file mode 100644 index 0000000000000000000000000000000000000000..4df424e6b5d8f517c210af839b1c2ec7c46fa3f8 --- /dev/null +++ b/fairseq-0.10.2/examples/noisychannel/rerank.py @@ -0,0 +1,422 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math +from multiprocessing import Pool + +import numpy as np +from fairseq import options +from fairseq.data import dictionary +from fairseq.scoring import bleu + +from . import ( + rerank_generate, + rerank_options, + rerank_score_bw, + rerank_score_lm, + rerank_utils, +) + + +def score_target_hypo( + args, a, b, c, lenpen, target_outfile, hypo_outfile, write_hypos, normalize +): + + print("lenpen", lenpen, "weight1", a, "weight2", b, "weight3", c) + gen_output_lst, bitext1_lst, bitext2_lst, lm_res_lst = load_score_files(args) + dict = dictionary.Dictionary() + scorer = bleu.Scorer(dict.pad(), dict.eos(), dict.unk()) + + ordered_hypos = {} + ordered_targets = {} + + for shard_id in range(len(bitext1_lst)): + bitext1 = bitext1_lst[shard_id] + bitext2 = bitext2_lst[shard_id] + gen_output = gen_output_lst[shard_id] + lm_res = lm_res_lst[shard_id] + + total = len(bitext1.rescore_source.keys()) + source_lst = [] + hypo_lst = [] + score_lst = [] + reference_lst = [] + j = 1 + best_score = -math.inf + + for i in range(total): + # length is measured in terms of words, not bpe tokens, since models may not share the same bpe + target_len = len(bitext1.rescore_hypo[i].split()) + + if lm_res is not None: + lm_score = lm_res.score[i] + else: + lm_score = 0 + + if bitext2 is not None: + bitext2_score = bitext2.rescore_score[i] + bitext2_backwards = bitext2.backwards + else: + bitext2_score = None + bitext2_backwards = None + + score = rerank_utils.get_score( + a, + b, + c, + target_len, + bitext1.rescore_score[i], + bitext2_score, + lm_score=lm_score, + lenpen=lenpen, + src_len=bitext1.source_lengths[i], + tgt_len=bitext1.target_lengths[i], + bitext1_backwards=bitext1.backwards, + bitext2_backwards=bitext2_backwards, + normalize=normalize, + ) + + if score > best_score: + best_score = score + best_hypo = bitext1.rescore_hypo[i] + + if j == gen_output.num_hypos[i] or j == args.num_rescore: + j = 1 + hypo_lst.append(best_hypo) + score_lst.append(best_score) + source_lst.append(bitext1.rescore_source[i]) + reference_lst.append(bitext1.rescore_target[i]) + + 
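+                # reset the running best before scanning the next sentence's hypotheses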
best_score = -math.inf
+                best_hypo = ""
+            else:
+                j += 1
+
+        gen_keys = list(sorted(gen_output.no_bpe_target.keys()))
+
+        for key in range(len(gen_keys)):
+            if args.prefix_len is None:
+                assert hypo_lst[key] in gen_output.no_bpe_hypo[gen_keys[key]], (
+                    "pred and rescore hypo mismatch: i: "
+                    + str(key)
+                    + ", "
+                    + str(hypo_lst[key])
+                    + str(gen_keys[key])
+                    + str(gen_output.no_bpe_hypo[key])
+                )
+                sys_tok = dict.encode_line(hypo_lst[key])
+                ref_tok = dict.encode_line(gen_output.no_bpe_target[gen_keys[key]])
+                scorer.add(ref_tok, sys_tok)
+
+            else:
+                full_hypo = rerank_utils.get_full_from_prefix(
+                    hypo_lst[key], gen_output.no_bpe_hypo[gen_keys[key]]
+                )
+                sys_tok = dict.encode_line(full_hypo)
+                ref_tok = dict.encode_line(gen_output.no_bpe_target[gen_keys[key]])
+                scorer.add(ref_tok, sys_tok)
+
+        # if only one set of hyperparameters is provided, write the predictions to a file
+        if write_hypos:
+            # recover the original ids from n best list generation
+            for key in range(len(gen_output.no_bpe_target)):
+                if args.prefix_len is None:
+                    assert hypo_lst[key] in gen_output.no_bpe_hypo[gen_keys[key]], (
+                        "pred and rescore hypo mismatch: i: "
+                        + str(key)
+                        + ", "
+                        + str(hypo_lst[key])
+                        + str(gen_output.no_bpe_hypo[key])
+                    )
+                    ordered_hypos[gen_keys[key]] = hypo_lst[key]
+                    ordered_targets[gen_keys[key]] = gen_output.no_bpe_target[
+                        gen_keys[key]
+                    ]
+
+                else:
+                    full_hypo = rerank_utils.get_full_from_prefix(
+                        hypo_lst[key], gen_output.no_bpe_hypo[gen_keys[key]]
+                    )
+                    ordered_hypos[gen_keys[key]] = full_hypo
+                    ordered_targets[gen_keys[key]] = gen_output.no_bpe_target[
+                        gen_keys[key]
+                    ]
+
+    # write the hypos in the original order from nbest list generation
+    if args.num_shards == (len(bitext1_lst)):
+        with open(target_outfile, "w") as t:
+            with open(hypo_outfile, "w") as h:
+                for key in range(len(ordered_hypos)):
+                    t.write(ordered_targets[key])
+                    h.write(ordered_hypos[key])
+
+    res = scorer.result_string(4)
+    if write_hypos:
+        print(res)
+    score = rerank_utils.parse_bleu_scoring(res)
+    return score
+
+
+def match_target_hypo(args, target_outfile, hypo_outfile):
+    """combine scores from the LM and bitext models, and write the top scoring hypothesis to a file"""
+    if len(args.weight1) == 1:
+        res = score_target_hypo(
+            args,
+            args.weight1[0],
+            args.weight2[0],
+            args.weight3[0],
+            args.lenpen[0],
+            target_outfile,
+            hypo_outfile,
+            True,
+            args.normalize,
+        )
+        rerank_scores = [res]
+    else:
+        print("launching pool")
+        with Pool(32) as p:
+            rerank_scores = p.starmap(
+                score_target_hypo,
+                [
+                    (
+                        args,
+                        args.weight1[i],
+                        args.weight2[i],
+                        args.weight3[i],
+                        args.lenpen[i],
+                        target_outfile,
+                        hypo_outfile,
+                        False,
+                        args.normalize,
+                    )
+                    for i in range(len(args.weight1))
+                ],
+            )
+
+    if len(rerank_scores) > 1:
+        best_index = np.argmax(rerank_scores)
+        best_score = rerank_scores[best_index]
+        print("best score", best_score)
+        print("best lenpen", args.lenpen[best_index])
+        print("best weight1", args.weight1[best_index])
+        print("best weight2", args.weight2[best_index])
+        print("best weight3", args.weight3[best_index])
+        return (
+            args.lenpen[best_index],
+            args.weight1[best_index],
+            args.weight2[best_index],
+            args.weight3[best_index],
+            best_score,
+        )
+
+    else:
+        return (
+            args.lenpen[0],
+            args.weight1[0],
+            args.weight2[0],
+            args.weight3[0],
+            rerank_scores[0],
+        )
+
+
+def load_score_files(args):
+    if args.all_shards:
+        shard_ids = list(range(args.num_shards))
+    else:
+        shard_ids = [args.shard_id]
+
+    gen_output_lst = []
+    bitext1_lst = []
+    bitext2_lst = []
+    lm_res1_lst = []
+
+    for 
shard_id in shard_ids: + using_nbest = args.nbest_list is not None + ( + pre_gen, + left_to_right_preprocessed_dir, + right_to_left_preprocessed_dir, + backwards_preprocessed_dir, + lm_preprocessed_dir, + ) = rerank_utils.get_directories( + args.data_dir_name, + args.num_rescore, + args.gen_subset, + args.gen_model_name, + shard_id, + args.num_shards, + args.sampling, + args.prefix_len, + args.target_prefix_frac, + args.source_prefix_frac, + ) + + rerank1_is_gen = ( + args.gen_model == args.score_model1 and args.source_prefix_frac is None + ) + rerank2_is_gen = ( + args.gen_model == args.score_model2 and args.source_prefix_frac is None + ) + + score1_file = rerank_utils.rescore_file_name( + pre_gen, + args.prefix_len, + args.model1_name, + target_prefix_frac=args.target_prefix_frac, + source_prefix_frac=args.source_prefix_frac, + backwards=args.backwards1, + ) + if args.score_model2 is not None: + score2_file = rerank_utils.rescore_file_name( + pre_gen, + args.prefix_len, + args.model2_name, + target_prefix_frac=args.target_prefix_frac, + source_prefix_frac=args.source_prefix_frac, + backwards=args.backwards2, + ) + if args.language_model is not None: + lm_score_file = rerank_utils.rescore_file_name( + pre_gen, args.prefix_len, args.lm_name, lm_file=True + ) + + # get gen output + predictions_bpe_file = pre_gen + "/generate_output_bpe.txt" + if using_nbest: + print("Using predefined n-best list from interactive.py") + predictions_bpe_file = args.nbest_list + gen_output = rerank_utils.BitextOutputFromGen( + predictions_bpe_file, + bpe_symbol=args.remove_bpe, + nbest=using_nbest, + prefix_len=args.prefix_len, + target_prefix_frac=args.target_prefix_frac, + ) + + if rerank1_is_gen: + bitext1 = gen_output + else: + bitext1 = rerank_utils.BitextOutput( + score1_file, + args.backwards1, + args.right_to_left1, + args.remove_bpe, + args.prefix_len, + args.target_prefix_frac, + args.source_prefix_frac, + ) + + if args.score_model2 is not None or args.nbest_list is not None: + if rerank2_is_gen: + bitext2 = gen_output + else: + bitext2 = rerank_utils.BitextOutput( + score2_file, + args.backwards2, + args.right_to_left2, + args.remove_bpe, + args.prefix_len, + args.target_prefix_frac, + args.source_prefix_frac, + ) + + assert ( + bitext2.source_lengths == bitext1.source_lengths + ), "source lengths for rescoring models do not match" + assert ( + bitext2.target_lengths == bitext1.target_lengths + ), "target lengths for rescoring models do not match" + else: + if args.diff_bpe: + assert args.score_model2 is None + bitext2 = gen_output + else: + bitext2 = None + + if args.language_model is not None: + lm_res1 = rerank_utils.LMOutput( + lm_score_file, + args.lm_dict, + args.prefix_len, + args.remove_bpe, + args.target_prefix_frac, + ) + else: + lm_res1 = None + + gen_output_lst.append(gen_output) + bitext1_lst.append(bitext1) + bitext2_lst.append(bitext2) + lm_res1_lst.append(lm_res1) + return gen_output_lst, bitext1_lst, bitext2_lst, lm_res1_lst + + +def rerank(args): + if type(args.lenpen) is not list: + args.lenpen = [args.lenpen] + if type(args.weight1) is not list: + args.weight1 = [args.weight1] + if type(args.weight2) is not list: + args.weight2 = [args.weight2] + if type(args.weight3) is not list: + args.weight3 = [args.weight3] + if args.all_shards: + shard_ids = list(range(args.num_shards)) + else: + shard_ids = [args.shard_id] + + for shard_id in shard_ids: + ( + pre_gen, + left_to_right_preprocessed_dir, + right_to_left_preprocessed_dir, + backwards_preprocessed_dir, + lm_preprocessed_dir, + 
) = rerank_utils.get_directories( + args.data_dir_name, + args.num_rescore, + args.gen_subset, + args.gen_model_name, + shard_id, + args.num_shards, + args.sampling, + args.prefix_len, + args.target_prefix_frac, + args.source_prefix_frac, + ) + rerank_generate.gen_and_reprocess_nbest(args) + rerank_score_bw.score_bw(args) + rerank_score_lm.score_lm(args) + + if args.write_hypos is None: + write_targets = pre_gen + "/matched_targets" + write_hypos = pre_gen + "/matched_hypos" + else: + write_targets = args.write_hypos + "_targets" + args.gen_subset + write_hypos = args.write_hypos + "_hypos" + args.gen_subset + + if args.all_shards: + write_targets += "_all_shards" + write_hypos += "_all_shards" + + ( + best_lenpen, + best_weight1, + best_weight2, + best_weight3, + best_score, + ) = match_target_hypo(args, write_targets, write_hypos) + + return best_lenpen, best_weight1, best_weight2, best_weight3, best_score + + +def cli_main(): + parser = rerank_options.get_reranking_parser() + args = options.parse_args_and_arch(parser) + rerank(args) + + +if __name__ == "__main__": + cli_main() diff --git a/fairseq-0.10.2/examples/noisychannel/rerank_generate.py b/fairseq-0.10.2/examples/noisychannel/rerank_generate.py new file mode 100644 index 0000000000000000000000000000000000000000..4356b3387ed585273d5f9f49a717c61c1e4849d3 --- /dev/null +++ b/fairseq-0.10.2/examples/noisychannel/rerank_generate.py @@ -0,0 +1,397 @@ +#!/usr/bin/env python3 -u +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +""" +Generate n-best translations using a trained model. +""" + +import os +import subprocess +from contextlib import redirect_stdout + +from fairseq import options +from fairseq_cli import generate, preprocess + +from . 
import rerank_options, rerank_utils
+
+
+def gen_and_reprocess_nbest(args):
+    if args.score_dict_dir is None:
+        args.score_dict_dir = args.data
+    if args.prefix_len is not None:
+        assert (
+            args.right_to_left1 is False
+        ), "prefix length not compatible with right to left models"
+        assert (
+            args.right_to_left2 is False
+        ), "prefix length not compatible with right to left models"
+
+    if args.nbest_list is not None:
+        assert args.score_model2 is None
+
+    if args.backwards1:
+        scorer1_src = args.target_lang
+        scorer1_tgt = args.source_lang
+    else:
+        scorer1_src = args.source_lang
+        scorer1_tgt = args.target_lang
+
+    store_data = (
+        os.path.join(os.path.dirname(__file__)) + "/rerank_data/" + args.data_dir_name
+    )
+    if not os.path.exists(store_data):
+        os.makedirs(store_data)
+
+    (
+        pre_gen,
+        left_to_right_preprocessed_dir,
+        right_to_left_preprocessed_dir,
+        backwards_preprocessed_dir,
+        lm_preprocessed_dir,
+    ) = rerank_utils.get_directories(
+        args.data_dir_name,
+        args.num_rescore,
+        args.gen_subset,
+        args.gen_model_name,
+        args.shard_id,
+        args.num_shards,
+        args.sampling,
+        args.prefix_len,
+        args.target_prefix_frac,
+        args.source_prefix_frac,
+    )
+    assert not (
+        args.right_to_left1 and args.backwards1
+    ), "backwards right to left not supported"
+    assert not (
+        args.right_to_left2 and args.backwards2
+    ), "backwards right to left not supported"
+    assert not (
+        args.prefix_len is not None and args.target_prefix_frac is not None
+    ), "target prefix frac and target prefix len incompatible"
+
+    # make directory to store generation results
+    if not os.path.exists(pre_gen):
+        os.makedirs(pre_gen)
+
+    rerank1_is_gen = (
+        args.gen_model == args.score_model1 and args.source_prefix_frac is None
+    )
+    rerank2_is_gen = (
+        args.gen_model == args.score_model2 and args.source_prefix_frac is None
+    )
+
+    if args.nbest_list is not None:
+        rerank2_is_gen = True
+
+    # make directories to store preprocessed nbest list for reranking
+    if not os.path.exists(left_to_right_preprocessed_dir):
+        os.makedirs(left_to_right_preprocessed_dir)
+    if not os.path.exists(right_to_left_preprocessed_dir):
+        os.makedirs(right_to_left_preprocessed_dir)
+    if not os.path.exists(lm_preprocessed_dir):
+        os.makedirs(lm_preprocessed_dir)
+    if not os.path.exists(backwards_preprocessed_dir):
+        os.makedirs(backwards_preprocessed_dir)
+
+    score1_file = rerank_utils.rescore_file_name(
+        pre_gen,
+        args.prefix_len,
+        args.model1_name,
+        target_prefix_frac=args.target_prefix_frac,
+        source_prefix_frac=args.source_prefix_frac,
+        backwards=args.backwards1,
+    )
+    if args.score_model2 is not None:
+        score2_file = rerank_utils.rescore_file_name(
+            pre_gen,
+            args.prefix_len,
+            args.model2_name,
+            target_prefix_frac=args.target_prefix_frac,
+            source_prefix_frac=args.source_prefix_frac,
+            backwards=args.backwards2,
+        )
+
+    predictions_bpe_file = pre_gen + "/generate_output_bpe.txt"
+
+    using_nbest = args.nbest_list is not None
+
+    if using_nbest:
+        print("Using predefined n-best list from interactive.py")
+        predictions_bpe_file = args.nbest_list
+
+    else:
+        if not os.path.isfile(predictions_bpe_file):
+            print("STEP 1: generate predictions using the p(T|S) model with bpe")
+            print(args.data)
+            param1 = [
+                args.data,
+                "--path",
+                args.gen_model,
+                "--shard-id",
+                str(args.shard_id),
+                "--num-shards",
+                str(args.num_shards),
+                "--nbest",
+                str(args.num_rescore),
+                "--batch-size",
+                str(args.batch_size),
+                "--beam",
+                str(args.num_rescore),
+                "--gen-subset",
+                args.gen_subset,
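+                # --nbest and --beam are both num_rescore, so every hypothesis in
+                # the beam is kept for rescoring; the flags below select the
+                # language pair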
"--source-lang", + args.source_lang, + "--target-lang", + args.target_lang, + ] + if args.sampling: + param1 += ["--sampling"] + + gen_parser = options.get_generation_parser() + input_args = options.parse_args_and_arch(gen_parser, param1) + + print(input_args) + with open(predictions_bpe_file, "w") as f: + with redirect_stdout(f): + generate.main(input_args) + + gen_output = rerank_utils.BitextOutputFromGen( + predictions_bpe_file, + bpe_symbol=args.remove_bpe, + nbest=using_nbest, + prefix_len=args.prefix_len, + target_prefix_frac=args.target_prefix_frac, + ) + + if args.diff_bpe: + rerank_utils.write_reprocessed( + gen_output.no_bpe_source, + gen_output.no_bpe_hypo, + gen_output.no_bpe_target, + pre_gen + "/source_gen_bpe." + args.source_lang, + pre_gen + "/target_gen_bpe." + args.target_lang, + pre_gen + "/reference_gen_bpe." + args.target_lang, + ) + bitext_bpe = args.rescore_bpe_code + bpe_src_param = [ + "-c", + bitext_bpe, + "--input", + pre_gen + "/source_gen_bpe." + args.source_lang, + "--output", + pre_gen + "/rescore_data." + args.source_lang, + ] + bpe_tgt_param = [ + "-c", + bitext_bpe, + "--input", + pre_gen + "/target_gen_bpe." + args.target_lang, + "--output", + pre_gen + "/rescore_data." + args.target_lang, + ] + + subprocess.call( + [ + "python", + os.path.join( + os.path.dirname(__file__), "subword-nmt/subword_nmt/apply_bpe.py" + ), + ] + + bpe_src_param, + shell=False, + ) + + subprocess.call( + [ + "python", + os.path.join( + os.path.dirname(__file__), "subword-nmt/subword_nmt/apply_bpe.py" + ), + ] + + bpe_tgt_param, + shell=False, + ) + + if (not os.path.isfile(score1_file) and not rerank1_is_gen) or ( + args.score_model2 is not None + and not os.path.isfile(score2_file) + and not rerank2_is_gen + ): + print( + "STEP 2: process the output of generate.py so we have clean text files with the translations" + ) + + rescore_file = "/rescore_data" + if args.prefix_len is not None: + prefix_len_rescore_file = rescore_file + "prefix" + str(args.prefix_len) + if args.target_prefix_frac is not None: + target_prefix_frac_rescore_file = ( + rescore_file + "target_prefix_frac" + str(args.target_prefix_frac) + ) + if args.source_prefix_frac is not None: + source_prefix_frac_rescore_file = ( + rescore_file + "source_prefix_frac" + str(args.source_prefix_frac) + ) + + if not args.right_to_left1 or not args.right_to_left2: + if not args.diff_bpe: + rerank_utils.write_reprocessed( + gen_output.source, + gen_output.hypo, + gen_output.target, + pre_gen + rescore_file + "." + args.source_lang, + pre_gen + rescore_file + "." + args.target_lang, + pre_gen + "/reference_file", + bpe_symbol=args.remove_bpe, + ) + if args.prefix_len is not None: + bw_rescore_file = prefix_len_rescore_file + rerank_utils.write_reprocessed( + gen_output.source, + gen_output.hypo, + gen_output.target, + pre_gen + prefix_len_rescore_file + "." + args.source_lang, + pre_gen + prefix_len_rescore_file + "." + args.target_lang, + pre_gen + "/reference_file", + prefix_len=args.prefix_len, + bpe_symbol=args.remove_bpe, + ) + elif args.target_prefix_frac is not None: + bw_rescore_file = target_prefix_frac_rescore_file + rerank_utils.write_reprocessed( + gen_output.source, + gen_output.hypo, + gen_output.target, + pre_gen + + target_prefix_frac_rescore_file + + "." + + args.source_lang, + pre_gen + + target_prefix_frac_rescore_file + + "." 
+ + args.target_lang, + pre_gen + "/reference_file", + bpe_symbol=args.remove_bpe, + target_prefix_frac=args.target_prefix_frac, + ) + else: + bw_rescore_file = rescore_file + + if args.source_prefix_frac is not None: + fw_rescore_file = source_prefix_frac_rescore_file + rerank_utils.write_reprocessed( + gen_output.source, + gen_output.hypo, + gen_output.target, + pre_gen + + source_prefix_frac_rescore_file + + "." + + args.source_lang, + pre_gen + + source_prefix_frac_rescore_file + + "." + + args.target_lang, + pre_gen + "/reference_file", + bpe_symbol=args.remove_bpe, + source_prefix_frac=args.source_prefix_frac, + ) + else: + fw_rescore_file = rescore_file + + if args.right_to_left1 or args.right_to_left2: + rerank_utils.write_reprocessed( + gen_output.source, + gen_output.hypo, + gen_output.target, + pre_gen + "/right_to_left_rescore_data." + args.source_lang, + pre_gen + "/right_to_left_rescore_data." + args.target_lang, + pre_gen + "/right_to_left_reference_file", + right_to_left=True, + bpe_symbol=args.remove_bpe, + ) + + print("STEP 3: binarize the translations") + if ( + not args.right_to_left1 + or args.score_model2 is not None + and not args.right_to_left2 + or not rerank1_is_gen + ): + + if args.backwards1 or args.backwards2: + if args.backwards_score_dict_dir is not None: + bw_dict = args.backwards_score_dict_dir + else: + bw_dict = args.score_dict_dir + bw_preprocess_param = [ + "--source-lang", + scorer1_src, + "--target-lang", + scorer1_tgt, + "--trainpref", + pre_gen + bw_rescore_file, + "--srcdict", + bw_dict + "/dict." + scorer1_src + ".txt", + "--tgtdict", + bw_dict + "/dict." + scorer1_tgt + ".txt", + "--destdir", + backwards_preprocessed_dir, + ] + preprocess_parser = options.get_preprocessing_parser() + input_args = preprocess_parser.parse_args(bw_preprocess_param) + preprocess.main(input_args) + + preprocess_param = [ + "--source-lang", + scorer1_src, + "--target-lang", + scorer1_tgt, + "--trainpref", + pre_gen + fw_rescore_file, + "--srcdict", + args.score_dict_dir + "/dict." + scorer1_src + ".txt", + "--tgtdict", + args.score_dict_dir + "/dict." + scorer1_tgt + ".txt", + "--destdir", + left_to_right_preprocessed_dir, + ] + preprocess_parser = options.get_preprocessing_parser() + input_args = preprocess_parser.parse_args(preprocess_param) + preprocess.main(input_args) + + if args.right_to_left1 or args.right_to_left2: + preprocess_param = [ + "--source-lang", + scorer1_src, + "--target-lang", + scorer1_tgt, + "--trainpref", + pre_gen + "/right_to_left_rescore_data", + "--srcdict", + args.score_dict_dir + "/dict." + scorer1_src + ".txt", + "--tgtdict", + args.score_dict_dir + "/dict." + scorer1_tgt + ".txt", + "--destdir", + right_to_left_preprocessed_dir, + ] + preprocess_parser = options.get_preprocessing_parser() + input_args = preprocess_parser.parse_args(preprocess_param) + preprocess.main(input_args) + + return gen_output + + +def cli_main(): + parser = rerank_options.get_reranking_parser() + args = options.parse_args_and_arch(parser) + gen_and_reprocess_nbest(args) + + +if __name__ == "__main__": + cli_main() diff --git a/fairseq-0.10.2/examples/noisychannel/rerank_options.py b/fairseq-0.10.2/examples/noisychannel/rerank_options.py new file mode 100644 index 0000000000000000000000000000000000000000..ca7a2e0a614d397f2a28b40fa365d409851641d4 --- /dev/null +++ b/fairseq-0.10.2/examples/noisychannel/rerank_options.py @@ -0,0 +1,149 @@ +# Copyright (c) Facebook, Inc. and its affiliates. 
+# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from fairseq import options + + +def get_reranking_parser(default_task="translation"): + parser = options.get_parser("Generation and reranking", default_task) + add_reranking_args(parser) + return parser + + +def get_tuning_parser(default_task="translation"): + parser = options.get_parser("Reranking tuning", default_task) + add_reranking_args(parser) + add_tuning_args(parser) + return parser + + +def add_reranking_args(parser): + group = parser.add_argument_group("Reranking") + # fmt: off + group.add_argument('--score-model1', '-s1', type=str, metavar='FILE', required=True, + help='path to first model or ensemble of models for rescoring') + group.add_argument('--score-model2', '-s2', type=str, metavar='FILE', required=False, + help='path to second model or ensemble of models for rescoring') + group.add_argument('--num-rescore', '-n', type=int, metavar='N', default=10, + help='the number of candidate hypothesis to rescore') + group.add_argument('-bz', '--batch-size', type=int, metavar='N', default=128, + help='batch size for generating the nbest list') + group.add_argument('--gen-subset', default='test', metavar='SET', choices=['test', 'train', 'valid'], + help='data subset to generate (train, valid, test)') + group.add_argument('--gen-model', default=None, metavar='FILE', + help='the model to generate translations') + group.add_argument('-b1', '--backwards1', action='store_true', + help='whether or not the first model group is backwards') + group.add_argument('-b2', '--backwards2', action='store_true', + help='whether or not the second model group is backwards') + group.add_argument('-a', '--weight1', default=1, nargs='+', type=float, + help='the weight(s) of the first model') + group.add_argument('-b', '--weight2', default=1, nargs='+', type=float, + help='the weight(s) of the second model, or the gen model if using nbest from interactive.py') + group.add_argument('-c', '--weight3', default=1, nargs='+', type=float, + help='the weight(s) of the third model') + + # lm arguments + group.add_argument('-lm', '--language-model', default=None, metavar='FILE', + help='language model for target language to rescore translations') + group.add_argument('--lm-dict', default=None, metavar='FILE', + help='the dict of the language model for the target language') + group.add_argument('--lm-name', default=None, + help='the name of the language model for the target language') + group.add_argument('--lm-bpe-code', default=None, metavar='FILE', + help='the bpe code for the language model for the target language') + group.add_argument('--data-dir-name', default=None, + help='name of data directory') + group.add_argument('--lenpen', default=1, nargs='+', type=float, + help='length penalty: <1.0 favors shorter, >1.0 favors longer sentences') + group.add_argument('--score-dict-dir', default=None, + help='the directory with dictionaries for the scoring models') + group.add_argument('--right-to-left1', action='store_true', + help='whether the first model group is a right to left model') + group.add_argument('--right-to-left2', action='store_true', + help='whether the second model group is a right to left model') + group.add_argument('--remove-bpe', '--post-process', default='@@ ', + help='the bpe symbol, used for the bitext and LM') + group.add_argument('--prefix-len', default=None, type=int, + help='the length of the target prefix to use in rescoring (in terms of words wo bpe)') + 
group.add_argument('--sampling', action='store_true', + help='use sampling instead of beam search for generating n best list') + group.add_argument('--diff-bpe', action='store_true', + help='bpe for rescoring and nbest list not the same') + group.add_argument('--rescore-bpe-code', default=None, + help='bpe code for rescoring models') + group.add_argument('--nbest-list', default=None, + help='use predefined nbest list in interactive.py format') + group.add_argument('--write-hypos', default=None, + help='filename prefix to write hypos to') + group.add_argument('--ref-translation', default=None, + help='reference translation to use with nbest list from interactive.py') + group.add_argument('--backwards-score-dict-dir', default=None, + help='the directory with dictionaries for the backwards model,' + 'if None then it is assumed the fw and backwards models share dictionaries') + + # extra scaling args + group.add_argument('--gen-model-name', default=None, + help='the name of the models that generated the nbest list') + group.add_argument('--model1-name', default=None, + help='the name of the set for model1 group ') + group.add_argument('--model2-name', default=None, + help='the name of the set for model2 group') + group.add_argument('--shard-id', default=0, type=int, + help='the id of the shard to generate') + group.add_argument('--num-shards', default=1, type=int, + help='the number of shards to generate across') + group.add_argument('--all-shards', action='store_true', + help='use all shards') + group.add_argument('--target-prefix-frac', default=None, type=float, + help='the fraction of the target prefix to use in rescoring (in terms of words wo bpe)') + group.add_argument('--source-prefix-frac', default=None, type=float, + help='the fraction of the source prefix to use in rescoring (in terms of words wo bpe)') + group.add_argument('--normalize', action='store_true', + help='whether to normalize by src and target len') + # fmt: on + return group + + +def add_tuning_args(parser): + group = parser.add_argument_group("Tuning") + + group.add_argument( + "--lower-bound", + default=[-0.7], + nargs="+", + type=float, + help="lower bound of search space", + ) + group.add_argument( + "--upper-bound", + default=[3], + nargs="+", + type=float, + help="upper bound of search space", + ) + group.add_argument( + "--tune-param", + default=["lenpen"], + nargs="+", + choices=["lenpen", "weight1", "weight2", "weight3"], + help="the parameter(s) to tune", + ) + group.add_argument( + "--tune-subset", + default="valid", + choices=["valid", "test", "train"], + help="the subset to tune on ", + ) + group.add_argument( + "--num-trials", + default=1000, + type=int, + help="number of trials to do for random search", + ) + group.add_argument( + "--share-weights", action="store_true", help="share weight2 and weight 3" + ) + return group diff --git a/fairseq-0.10.2/examples/noisychannel/rerank_score_bw.py b/fairseq-0.10.2/examples/noisychannel/rerank_score_bw.py new file mode 100644 index 0000000000000000000000000000000000000000..895673b1ccbc2c0a12c3dbd3a09d95b1aa43f403 --- /dev/null +++ b/fairseq-0.10.2/examples/noisychannel/rerank_score_bw.py @@ -0,0 +1,143 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import os +from contextlib import redirect_stdout + +from fairseq import options +from fairseq_cli import generate + +from . 
import rerank_options, rerank_utils + + +def score_bw(args): + if args.backwards1: + scorer1_src = args.target_lang + scorer1_tgt = args.source_lang + else: + scorer1_src = args.source_lang + scorer1_tgt = args.target_lang + + if args.score_model2 is not None: + if args.backwards2: + scorer2_src = args.target_lang + scorer2_tgt = args.source_lang + else: + scorer2_src = args.source_lang + scorer2_tgt = args.target_lang + + rerank1_is_gen = ( + args.gen_model == args.score_model1 and args.source_prefix_frac is None + ) + rerank2_is_gen = ( + args.gen_model == args.score_model2 and args.source_prefix_frac is None + ) + + ( + pre_gen, + left_to_right_preprocessed_dir, + right_to_left_preprocessed_dir, + backwards_preprocessed_dir, + lm_preprocessed_dir, + ) = rerank_utils.get_directories( + args.data_dir_name, + args.num_rescore, + args.gen_subset, + args.gen_model_name, + args.shard_id, + args.num_shards, + args.sampling, + args.prefix_len, + args.target_prefix_frac, + args.source_prefix_frac, + ) + + score1_file = rerank_utils.rescore_file_name( + pre_gen, + args.prefix_len, + args.model1_name, + target_prefix_frac=args.target_prefix_frac, + source_prefix_frac=args.source_prefix_frac, + backwards=args.backwards1, + ) + + if args.score_model2 is not None: + score2_file = rerank_utils.rescore_file_name( + pre_gen, + args.prefix_len, + args.model2_name, + target_prefix_frac=args.target_prefix_frac, + source_prefix_frac=args.source_prefix_frac, + backwards=args.backwards2, + ) + + if args.right_to_left1: + rerank_data1 = right_to_left_preprocessed_dir + elif args.backwards1: + rerank_data1 = backwards_preprocessed_dir + else: + rerank_data1 = left_to_right_preprocessed_dir + + gen_param = ["--batch-size", str(128), "--score-reference", "--gen-subset", "train"] + if not rerank1_is_gen and not os.path.isfile(score1_file): + print("STEP 4: score the translations for model 1") + + model_param1 = [ + "--path", + args.score_model1, + "--source-lang", + scorer1_src, + "--target-lang", + scorer1_tgt, + ] + gen_model1_param = [rerank_data1] + gen_param + model_param1 + + gen_parser = options.get_generation_parser() + input_args = options.parse_args_and_arch(gen_parser, gen_model1_param) + + with open(score1_file, "w") as f: + with redirect_stdout(f): + generate.main(input_args) + + if ( + args.score_model2 is not None + and not os.path.isfile(score2_file) + and not rerank2_is_gen + ): + print("STEP 4: score the translations for model 2") + + if args.right_to_left2: + rerank_data2 = right_to_left_preprocessed_dir + elif args.backwards2: + rerank_data2 = backwards_preprocessed_dir + else: + rerank_data2 = left_to_right_preprocessed_dir + + model_param2 = [ + "--path", + args.score_model2, + "--source-lang", + scorer2_src, + "--target-lang", + scorer2_tgt, + ] + gen_model2_param = [rerank_data2] + gen_param + model_param2 + + gen_parser = options.get_generation_parser() + input_args = options.parse_args_and_arch(gen_parser, gen_model2_param) + + with open(score2_file, "w") as f: + with redirect_stdout(f): + generate.main(input_args) + + +def cli_main(): + parser = rerank_options.get_reranking_parser() + args = options.parse_args_and_arch(parser) + score_bw(args) + + +if __name__ == "__main__": + cli_main() diff --git a/fairseq-0.10.2/examples/noisychannel/rerank_tune.py b/fairseq-0.10.2/examples/noisychannel/rerank_tune.py new file mode 100644 index 0000000000000000000000000000000000000000..1be71744a340534400bea3333aed66ff458111a4 --- /dev/null +++ b/fairseq-0.10.2/examples/noisychannel/rerank_tune.py @@ 
-0,0 +1,102 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import random + +import numpy as np +from fairseq import options + +from . import rerank, rerank_options + + +def random_search(args): + param_values = [] + tuneable_parameters = ["lenpen", "weight1", "weight2", "weight3"] + initial_params = [args.lenpen, args.weight1, args.weight2, args.weight3] + for i, elem in enumerate(initial_params): + if type(elem) is not list: + initial_params[i] = [elem] + else: + initial_params[i] = elem + + tune_parameters = args.tune_param.copy() + for i in range(len(args.tune_param)): + assert args.upper_bound[i] >= args.lower_bound[i] + index = tuneable_parameters.index(args.tune_param[i]) + del tuneable_parameters[index] + del initial_params[index] + + tune_parameters += tuneable_parameters + param_values += initial_params + random.seed(args.seed) + + random_params = np.array( + [ + [ + random.uniform(args.lower_bound[i], args.upper_bound[i]) + for i in range(len(args.tune_param)) + ] + for k in range(args.num_trials) + ] + ) + set_params = np.array( + [ + [initial_params[i][0] for i in range(len(tuneable_parameters))] + for k in range(args.num_trials) + ] + ) + random_params = np.concatenate((random_params, set_params), 1) + + rerank_args = vars(args).copy() + if args.nbest_list: + rerank_args["gen_subset"] = "test" + else: + rerank_args["gen_subset"] = args.tune_subset + + for k in range(len(tune_parameters)): + rerank_args[tune_parameters[k]] = list(random_params[:, k]) + + if args.share_weights: + k = tune_parameters.index("weight2") + rerank_args["weight3"] = list(random_params[:, k]) + + rerank_args = argparse.Namespace(**rerank_args) + best_lenpen, best_weight1, best_weight2, best_weight3, best_score = rerank.rerank( + rerank_args + ) + rerank_args = vars(args).copy() + rerank_args["lenpen"] = [best_lenpen] + rerank_args["weight1"] = [best_weight1] + rerank_args["weight2"] = [best_weight2] + rerank_args["weight3"] = [best_weight3] + + # write the hypothesis from the valid set from the best trial + + if args.gen_subset != "valid": + rerank_args["gen_subset"] = "valid" + rerank_args = argparse.Namespace(**rerank_args) + rerank.rerank(rerank_args) + + # test with the best hyperparameters on gen subset + rerank_args = vars(args).copy() + rerank_args["gen_subset"] = args.gen_subset + rerank_args["lenpen"] = [best_lenpen] + rerank_args["weight1"] = [best_weight1] + rerank_args["weight2"] = [best_weight2] + rerank_args["weight3"] = [best_weight3] + rerank_args = argparse.Namespace(**rerank_args) + rerank.rerank(rerank_args) + + +def cli_main(): + parser = rerank_options.get_tuning_parser() + args = options.parse_args_and_arch(parser) + + random_search(args) + + +if __name__ == "__main__": + cli_main() diff --git a/fairseq-0.10.2/examples/noisychannel/rerank_utils.py b/fairseq-0.10.2/examples/noisychannel/rerank_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..2c6bf1b1afbb089cf5e84f720eb7a067479fbcbc --- /dev/null +++ b/fairseq-0.10.2/examples/noisychannel/rerank_utils.py @@ -0,0 +1,850 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
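+
+# A sketch of what this module provides (see the definitions below): parsers
+# for the output of generate.py, interactive.py, and eval_lm; helpers that
+# truncate hypotheses to word prefixes with or without BPE; and get_score(),
+# which combines forward, backward, and language model scores roughly as
+#
+#     score = a * bitext_score1 + b * bitext_score2 + c * lm_score
+#
+# optionally normalized by source/target length and divided by
+# target_len ** lenpen.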
+
+import math
+import os
+import re
+import subprocess
+from contextlib import redirect_stdout
+
+from fairseq import options
+from fairseq_cli import eval_lm, preprocess
+
+
+def reprocess(fle):
+    # parses the output of generate.py
+    # returns a source dict and a hypothesis dict, where keys are the ID numbers
+    # and values are the corresponding source and translation. There may be
+    # several translations per source, so the values of hypothesis_dict are lists.
+
+    with open(fle, "r") as f:
+        txt = f.read()
+
+    p = re.compile(r"[STHP][-]\d+\s*")
+    hp = re.compile(r"(\s*[-]?\d+[.]?\d+\s*)|(\s*(-inf)\s*)")
+    source_dict = {}
+    hypothesis_dict = {}
+    score_dict = {}
+    target_dict = {}
+    pos_score_dict = {}
+    lines = txt.split("\n")
+
+    for line in lines:
+        line += "\n"
+        prefix = re.search(p, line)
+        if prefix is not None:
+            assert len(prefix.group()) > 2, "prefix id not found"
+            _, j = prefix.span()
+            id_num = prefix.group()[2:]
+            id_num = int(id_num)
+            line_type = prefix.group()[0]
+            if line_type == "H":
+                h_txt = line[j:]
+                hypo = re.search(hp, h_txt)
+                assert (
+                    hypo is not None
+                ), "regular expression failed to find the hypothesis scoring"
+                _, i = hypo.span()
+                score = hypo.group()
+                if id_num in hypothesis_dict:
+                    hypothesis_dict[id_num].append(h_txt[i:])
+                    score_dict[id_num].append(float(score))
+                else:
+                    hypothesis_dict[id_num] = [h_txt[i:]]
+                    score_dict[id_num] = [float(score)]
+
+            elif line_type == "S":
+                source_dict[id_num] = line[j:]
+            elif line_type == "T":
+                target_dict[id_num] = line[j:]
+            elif line_type == "P":
+                pos_scores = (line[j:]).split()
+                pos_scores = [float(x) for x in pos_scores]
+                if id_num in pos_score_dict:
+                    pos_score_dict[id_num].append(pos_scores)
+                else:
+                    pos_score_dict[id_num] = [pos_scores]
+
+    return source_dict, hypothesis_dict, score_dict, target_dict, pos_score_dict
+
+
+def reprocess_nbest(fle):
+    """reprocess interactive.py output"""
+    with open(fle, "r") as f:
+        txt = f.read()
+
+    source_dict = {}
+    hypothesis_dict = {}
+    score_dict = {}
+    target_dict = {}
+    pos_score_dict = {}
+    lines = txt.split("\n")
+
+    hp = re.compile(r"[-]?\d+[.]?\d+")
+    j = -1
+
+    for _i, line in enumerate(lines):
+        line += "\n"
+        line_type = line[0]
+
+        if line_type == "H":
+            hypo = re.search(hp, line)
+            _, start_index = hypo.span()
+            score = hypo.group()
+            if j in score_dict:
+                score_dict[j].append(float(score))
+                hypothesis_dict[j].append(line[start_index:].strip("\t"))
+            else:
+                score_dict[j] = [float(score)]
+                hypothesis_dict[j] = [line[start_index:].strip("\t")]
+        elif line_type == "O":
+            j += 1
+            source_dict[j] = line[2:]
+            # we don't have the targets for interactive.py
+            target_dict[j] = "filler"
+
+        elif line_type == "P":
+            pos_scores = [float(pos_score) for pos_score in line.split()[1:]]
+            if j in pos_score_dict:
+                pos_score_dict[j].append(pos_scores)
+            else:
+                pos_score_dict[j] = [pos_scores]
+
+    assert source_dict.keys() == hypothesis_dict.keys()
+    assert source_dict.keys() == pos_score_dict.keys()
+    assert source_dict.keys() == score_dict.keys()
+
+    return source_dict, hypothesis_dict, score_dict, target_dict, pos_score_dict
+
+
+def write_reprocessed(
+    sources,
+    hypos,
+    targets,
+    source_outfile,
+    hypo_outfile,
+    target_outfile,
+    right_to_left=False,
+    prefix_len=None,
+    bpe_symbol=None,
+    target_prefix_frac=None,
+    source_prefix_frac=None,
+):
+
+    """writes the nbest hypotheses for rescoring"""
+    assert not (
+        prefix_len is not None and 
target_prefix_frac is not None + ), "in writing reprocessed, only one type of prefix may be used" + assert not ( + prefix_len is not None and source_prefix_frac is not None + ), "in writing reprocessed, only one type of prefix may be used" + assert not ( + target_prefix_frac is not None and source_prefix_frac is not None + ), "in writing reprocessed, only one type of prefix may be used" + + with open(source_outfile, "w") as source_file, open( + hypo_outfile, "w" + ) as hypo_file, open(target_outfile, "w") as target_file: + + assert len(sources) == len(hypos), "sources and hypos list length mismatch" + if right_to_left: + for i in range(len(sources)): + for j in range(len(hypos[i])): + if prefix_len is None: + hypo_file.write(make_right_to_left(hypos[i][j]) + "\n") + else: + raise NotImplementedError() + source_file.write(make_right_to_left(sources[i]) + "\n") + target_file.write(make_right_to_left(targets[i]) + "\n") + else: + for i in sorted(sources.keys()): + for j in range(len(hypos[i])): + if prefix_len is not None: + shortened = ( + get_prefix_no_bpe(hypos[i][j], bpe_symbol, prefix_len) + + "\n" + ) + hypo_file.write(shortened) + source_file.write(sources[i]) + target_file.write(targets[i]) + elif target_prefix_frac is not None: + num_words, shortened, num_bpe_tokens = calc_length_from_frac( + hypos[i][j], target_prefix_frac, bpe_symbol + ) + shortened += "\n" + hypo_file.write(shortened) + source_file.write(sources[i]) + target_file.write(targets[i]) + elif source_prefix_frac is not None: + num_words, shortened, num_bpe_tokensn = calc_length_from_frac( + sources[i], source_prefix_frac, bpe_symbol + ) + shortened += "\n" + hypo_file.write(hypos[i][j]) + source_file.write(shortened) + target_file.write(targets[i]) + else: + hypo_file.write(hypos[i][j]) + source_file.write(sources[i]) + target_file.write(targets[i]) + + +def calc_length_from_frac(bpe_sentence, prefix_frac, bpe_symbol): + # return number of words, (not bpe tokens) that we want + no_bpe_sen = remove_bpe(bpe_sentence, bpe_symbol) + len_sen = len(no_bpe_sen.split()) + + num_words = math.ceil(len_sen * prefix_frac) + prefix = get_prefix_no_bpe(bpe_sentence, bpe_symbol, num_words) + num_bpe_tokens = len(prefix.split()) + return num_words, prefix, num_bpe_tokens + + +def get_prefix(sentence, prefix_len): + """assuming no bpe, gets the prefix of the sentence with prefix_len words""" + tokens = sentence.strip("\n").split() + if prefix_len >= len(tokens): + return sentence.strip("\n") + else: + return " ".join(tokens[:prefix_len]) + + +def get_prefix_no_bpe(sentence, bpe_symbol, prefix_len): + if bpe_symbol is None: + return get_prefix(sentence, prefix_len) + else: + return " ".join(get_prefix_from_len(sentence.split(), bpe_symbol, prefix_len)) + + +def get_prefix_from_len(sentence, bpe_symbol, prefix_len): + """get the prefix of sentence with bpe, with prefix len in terms of words, not bpe tokens""" + bpe_count = sum([bpe_symbol.strip(" ") in t for t in sentence[:prefix_len]]) + if bpe_count == 0: + return sentence[:prefix_len] + else: + return sentence[:prefix_len] + get_prefix_from_len( + sentence[prefix_len:], bpe_symbol, bpe_count + ) + + +def get_num_bpe_tokens_from_len(sentence, bpe_symbol, prefix_len): + """given a prefix length in terms of words, return the number of bpe tokens""" + prefix = get_prefix_no_bpe(sentence, bpe_symbol, prefix_len) + assert len(remove_bpe(prefix, bpe_symbol).split()) <= prefix_len + return len(prefix.split(" ")) + + +def make_right_to_left(line): + tokens = line.split() + tokens.reverse() + 
new_line = " ".join(tokens) + return new_line + + +def remove_bpe(line, bpe_symbol): + line = line.replace("\n", "") + line = (line + " ").replace(bpe_symbol, "").rstrip() + return line + ("\n") + + +def remove_bpe_dict(pred_dict, bpe_symbol): + new_dict = {} + for i in pred_dict: + if type(pred_dict[i]) == list: + new_list = [remove_bpe(elem, bpe_symbol) for elem in pred_dict[i]] + new_dict[i] = new_list + else: + new_dict[i] = remove_bpe(pred_dict[i], bpe_symbol) + return new_dict + + +def parse_bleu_scoring(line): + p = re.compile(r"(BLEU4 = )\d+[.]\d+") + res = re.search(p, line) + assert res is not None, line + return float(res.group()[8:]) + + +def get_full_from_prefix(hypo_prefix, hypos): + """given a hypo prefix, recover the first hypo from the list of complete hypos beginning with that prefix""" + for hypo in hypos: + hypo_prefix = hypo_prefix.strip("\n") + len_prefix = len(hypo_prefix) + if hypo[:len_prefix] == hypo_prefix: + return hypo + # no match found + raise Exception() + + +def get_score( + a, + b, + c, + target_len, + bitext_score1, + bitext_score2=None, + lm_score=None, + lenpen=None, + src_len=None, + tgt_len=None, + bitext1_backwards=False, + bitext2_backwards=False, + normalize=False, +): + if bitext1_backwards: + bitext1_norm = src_len + else: + bitext1_norm = tgt_len + if bitext_score2 is not None: + if bitext2_backwards: + bitext2_norm = src_len + else: + bitext2_norm = tgt_len + else: + bitext2_norm = 1 + bitext_score2 = 0 + if normalize: + score = ( + a * bitext_score1 / bitext1_norm + + b * bitext_score2 / bitext2_norm + + c * lm_score / src_len + ) + else: + score = a * bitext_score1 + b * bitext_score2 + c * lm_score + + if lenpen is not None: + score /= (target_len) ** float(lenpen) + + return score + + +class BitextOutput(object): + def __init__( + self, + output_file, + backwards, + right_to_left, + bpe_symbol, + prefix_len=None, + target_prefix_frac=None, + source_prefix_frac=None, + ): + """process output from rescoring""" + source, hypo, score, target, pos_score = reprocess(output_file) + if backwards: + self.hypo_fracs = source_prefix_frac + else: + self.hypo_fracs = target_prefix_frac + + # remove length penalty so we can use raw scores + score, num_bpe_tokens = get_score_from_pos( + pos_score, prefix_len, hypo, bpe_symbol, self.hypo_fracs, backwards + ) + source_lengths = {} + target_lengths = {} + + assert hypo.keys() == source.keys(), "key mismatch" + if backwards: + tmp = hypo + hypo = source + source = tmp + for i in source: + # since we are reranking, there should only be one hypo per source sentence + if backwards: + len_src = len(source[i][0].split()) + # record length without + if len_src == num_bpe_tokens[i][0] - 1: + source_lengths[i] = num_bpe_tokens[i][0] - 1 + else: + source_lengths[i] = num_bpe_tokens[i][0] + + target_lengths[i] = len(hypo[i].split()) + + source[i] = remove_bpe(source[i][0], bpe_symbol) + target[i] = remove_bpe(target[i], bpe_symbol) + hypo[i] = remove_bpe(hypo[i], bpe_symbol) + + score[i] = float(score[i][0]) + pos_score[i] = pos_score[i][0] + + else: + len_tgt = len(hypo[i][0].split()) + # record length without + if len_tgt == num_bpe_tokens[i][0] - 1: + target_lengths[i] = num_bpe_tokens[i][0] - 1 + else: + target_lengths[i] = num_bpe_tokens[i][0] + + source_lengths[i] = len(source[i].split()) + + if right_to_left: + source[i] = remove_bpe(make_right_to_left(source[i]), bpe_symbol) + target[i] = remove_bpe(make_right_to_left(target[i]), bpe_symbol) + hypo[i] = remove_bpe(make_right_to_left(hypo[i][0]), bpe_symbol) + 
score[i] = float(score[i][0]) + pos_score[i] = pos_score[i][0] + else: + assert ( + len(hypo[i]) == 1 + ), "expected only one hypothesis per source sentence" + source[i] = remove_bpe(source[i], bpe_symbol) + target[i] = remove_bpe(target[i], bpe_symbol) + hypo[i] = remove_bpe(hypo[i][0], bpe_symbol) + score[i] = float(score[i][0]) + pos_score[i] = pos_score[i][0] + + self.rescore_source = source + self.rescore_hypo = hypo + self.rescore_score = score + self.rescore_target = target + self.rescore_pos_score = pos_score + self.backwards = backwards + self.right_to_left = right_to_left + self.target_lengths = target_lengths + self.source_lengths = source_lengths + + +class BitextOutputFromGen(object): + def __init__( + self, + predictions_bpe_file, + bpe_symbol=None, + nbest=False, + prefix_len=None, + target_prefix_frac=None, + ): + if nbest: + ( + pred_source, + pred_hypo, + pred_score, + pred_target, + pred_pos_score, + ) = reprocess_nbest(predictions_bpe_file) + else: + pred_source, pred_hypo, pred_score, pred_target, pred_pos_score = reprocess( + predictions_bpe_file + ) + + assert len(pred_source) == len(pred_hypo) + assert len(pred_source) == len(pred_score) + assert len(pred_source) == len(pred_target) + assert len(pred_source) == len(pred_pos_score) + + # remove length penalty so we can use raw scores + pred_score, num_bpe_tokens = get_score_from_pos( + pred_pos_score, prefix_len, pred_hypo, bpe_symbol, target_prefix_frac, False + ) + + self.source = pred_source + self.target = pred_target + self.score = pred_score + self.pos_score = pred_pos_score + self.hypo = pred_hypo + self.target_lengths = {} + self.source_lengths = {} + + self.no_bpe_source = remove_bpe_dict(pred_source.copy(), bpe_symbol) + self.no_bpe_hypo = remove_bpe_dict(pred_hypo.copy(), bpe_symbol) + self.no_bpe_target = remove_bpe_dict(pred_target.copy(), bpe_symbol) + + # indexes to match those from the rescoring models + self.rescore_source = {} + self.rescore_target = {} + self.rescore_pos_score = {} + self.rescore_hypo = {} + self.rescore_score = {} + self.num_hypos = {} + self.backwards = False + self.right_to_left = False + + index = 0 + + for i in sorted(pred_source.keys()): + for j in range(len(pred_hypo[i])): + + self.target_lengths[index] = len(self.hypo[i][j].split()) + self.source_lengths[index] = len(self.source[i].split()) + + self.rescore_source[index] = self.no_bpe_source[i] + self.rescore_target[index] = self.no_bpe_target[i] + self.rescore_hypo[index] = self.no_bpe_hypo[i][j] + self.rescore_score[index] = float(pred_score[i][j]) + self.rescore_pos_score[index] = pred_pos_score[i][j] + self.num_hypos[index] = len(pred_hypo[i]) + index += 1 + + +def get_score_from_pos( + pos_score_dict, prefix_len, hypo_dict, bpe_symbol, hypo_frac, backwards +): + score_dict = {} + num_bpe_tokens_dict = {} + assert prefix_len is None or hypo_frac is None + for key in pos_score_dict: + score_dict[key] = [] + num_bpe_tokens_dict[key] = [] + for i in range(len(pos_score_dict[key])): + if prefix_len is not None and not backwards: + num_bpe_tokens = get_num_bpe_tokens_from_len( + hypo_dict[key][i], bpe_symbol, prefix_len + ) + score_dict[key].append(sum(pos_score_dict[key][i][:num_bpe_tokens])) + num_bpe_tokens_dict[key].append(num_bpe_tokens) + elif hypo_frac is not None: + num_words, shortened, hypo_prefix_len = calc_length_from_frac( + hypo_dict[key][i], hypo_frac, bpe_symbol + ) + score_dict[key].append(sum(pos_score_dict[key][i][:hypo_prefix_len])) + num_bpe_tokens_dict[key].append(hypo_prefix_len) + else: + 
score_dict[key].append(sum(pos_score_dict[key][i])) + num_bpe_tokens_dict[key].append(len(pos_score_dict[key][i])) + return score_dict, num_bpe_tokens_dict + + +class LMOutput(object): + def __init__( + self, + lm_score_file, + lm_dict=None, + prefix_len=None, + bpe_symbol=None, + target_prefix_frac=None, + ): + ( + lm_sentences, + lm_sen_scores, + lm_sen_pos_scores, + lm_no_bpe_sentences, + lm_bpe_tokens, + ) = parse_lm( + lm_score_file, + prefix_len=prefix_len, + bpe_symbol=bpe_symbol, + target_prefix_frac=target_prefix_frac, + ) + + self.sentences = lm_sentences + self.score = lm_sen_scores + self.pos_score = lm_sen_pos_scores + self.lm_dict = lm_dict + self.no_bpe_sentences = lm_no_bpe_sentences + self.bpe_tokens = lm_bpe_tokens + + +def parse_lm(input_file, prefix_len=None, bpe_symbol=None, target_prefix_frac=None): + """parse output of eval_lm""" + with open(input_file, "r") as f: + text = f.readlines() + text = text[7:] + cleaned_text = text[:-2] + + sentences = {} + sen_scores = {} + sen_pos_scores = {} + no_bpe_sentences = {} + num_bpe_tokens_dict = {} + for _i, line in enumerate(cleaned_text): + tokens = line.split() + if tokens[0].isdigit(): + line_id = int(tokens[0]) + scores = [float(x[1:-1]) for x in tokens[2::2]] + sentences[line_id] = " ".join(tokens[1::2][:-1]) + "\n" + if bpe_symbol is not None: + # exclude symbol to match output from generate.py + bpe_sen = " ".join(tokens[1::2][:-1]) + "\n" + no_bpe_sen = remove_bpe(bpe_sen, bpe_symbol) + no_bpe_sentences[line_id] = no_bpe_sen + + if prefix_len is not None: + num_bpe_tokens = get_num_bpe_tokens_from_len( + bpe_sen, bpe_symbol, prefix_len + ) + sen_scores[line_id] = sum(scores[:num_bpe_tokens]) + num_bpe_tokens_dict[line_id] = num_bpe_tokens + elif target_prefix_frac is not None: + num_words, shortened, target_prefix_len = calc_length_from_frac( + bpe_sen, target_prefix_frac, bpe_symbol + ) + sen_scores[line_id] = sum(scores[:target_prefix_len]) + num_bpe_tokens_dict[line_id] = target_prefix_len + else: + sen_scores[line_id] = sum(scores) + num_bpe_tokens_dict[line_id] = len(scores) + + sen_pos_scores[line_id] = scores + + return sentences, sen_scores, sen_pos_scores, no_bpe_sentences, num_bpe_tokens_dict + + +def get_directories( + data_dir_name, + num_rescore, + gen_subset, + fw_name, + shard_id, + num_shards, + sampling=False, + prefix_len=None, + target_prefix_frac=None, + source_prefix_frac=None, +): + nbest_file_id = ( + "nbest_" + + str(num_rescore) + + "_subset_" + + gen_subset + + "_fw_name_" + + fw_name + + "_shard_" + + str(shard_id) + + "_of_" + + str(num_shards) + ) + + if sampling: + nbest_file_id += "_sampling" + + # the directory containing all information for this nbest list + pre_gen = ( + os.path.join(os.path.dirname(__file__)) + + "/rerank_data/" + + data_dir_name + + "/" + + nbest_file_id + ) + # the directory to store the preprocessed nbest list, for left to right rescoring + left_to_right_preprocessed_dir = pre_gen + "/left_to_right_preprocessed" + if source_prefix_frac is not None: + left_to_right_preprocessed_dir = ( + left_to_right_preprocessed_dir + "/prefix_frac" + str(source_prefix_frac) + ) + # the directory to store the preprocessed nbest list, for right to left rescoring + right_to_left_preprocessed_dir = pre_gen + "/right_to_left_preprocessed" + # the directory to store the preprocessed nbest list, for backwards rescoring + backwards_preprocessed_dir = pre_gen + "/backwards" + if target_prefix_frac is not None: + backwards_preprocessed_dir = ( + backwards_preprocessed_dir + "/prefix_frac" 
+ str(target_prefix_frac) + ) + elif prefix_len is not None: + backwards_preprocessed_dir = ( + backwards_preprocessed_dir + "/prefix_" + str(prefix_len) + ) + + # the directory to store the preprocessed nbest list, for rescoring with P(T) + lm_preprocessed_dir = pre_gen + "/lm_preprocessed" + + return ( + pre_gen, + left_to_right_preprocessed_dir, + right_to_left_preprocessed_dir, + backwards_preprocessed_dir, + lm_preprocessed_dir, + ) + + +def lm_scoring( + preprocess_directory, + bpe_status, + gen_output, + pre_gen, + cur_lm_dict, + cur_lm_name, + cur_language_model, + cur_lm_bpe_code, + batch_size, + lm_score_file, + target_lang, + source_lang, + prefix_len=None, +): + if prefix_len is not None: + assert ( + bpe_status == "different" + ), "bpe status must be different to use prefix len" + if bpe_status == "no bpe": + # run lm on output without bpe + write_reprocessed( + gen_output.no_bpe_source, + gen_output.no_bpe_hypo, + gen_output.no_bpe_target, + pre_gen + "/rescore_data_no_bpe.de", + pre_gen + "/rescore_data_no_bpe.en", + pre_gen + "/reference_file_no_bpe", + ) + + preprocess_lm_param = [ + "--only-source", + "--trainpref", + pre_gen + "/rescore_data_no_bpe." + target_lang, + "--srcdict", + cur_lm_dict, + "--destdir", + preprocess_directory, + ] + preprocess_parser = options.get_preprocessing_parser() + input_args = preprocess_parser.parse_args(preprocess_lm_param) + preprocess.main(input_args) + + eval_lm_param = [ + preprocess_directory, + "--path", + cur_language_model, + "--output-word-probs", + "--batch-size", + str(batch_size), + "--max-tokens", + "1024", + "--sample-break-mode", + "eos", + "--gen-subset", + "train", + ] + + eval_lm_parser = options.get_eval_lm_parser() + input_args = options.parse_args_and_arch(eval_lm_parser, eval_lm_param) + + with open(lm_score_file, "w") as f: + with redirect_stdout(f): + eval_lm.main(input_args) + + elif bpe_status == "shared": + preprocess_lm_param = [ + "--only-source", + "--trainpref", + pre_gen + "/rescore_data." + target_lang, + "--srcdict", + cur_lm_dict, + "--destdir", + preprocess_directory, + ] + preprocess_parser = options.get_preprocessing_parser() + input_args = preprocess_parser.parse_args(preprocess_lm_param) + preprocess.main(input_args) + + eval_lm_param = [ + preprocess_directory, + "--path", + cur_language_model, + "--output-word-probs", + "--batch-size", + str(batch_size), + "--sample-break-mode", + "eos", + "--gen-subset", + "train", + ] + + eval_lm_parser = options.get_eval_lm_parser() + input_args = options.parse_args_and_arch(eval_lm_parser, eval_lm_param) + + with open(lm_score_file, "w") as f: + with redirect_stdout(f): + eval_lm.main(input_args) + + elif bpe_status == "different": + rescore_file = pre_gen + "/rescore_data_no_bpe" + rescore_bpe = pre_gen + "/rescore_data_new_bpe" + + rescore_file += "." + rescore_bpe += "." 
+ + write_reprocessed( + gen_output.no_bpe_source, + gen_output.no_bpe_hypo, + gen_output.no_bpe_target, + rescore_file + source_lang, + rescore_file + target_lang, + pre_gen + "/reference_file_no_bpe", + bpe_symbol=None, + ) + + # apply LM bpe to nbest list + bpe_src_param = [ + "-c", + cur_lm_bpe_code, + "--input", + rescore_file + target_lang, + "--output", + rescore_bpe + target_lang, + ] + subprocess.call( + [ + "python", + os.path.join( + os.path.dirname(__file__), "subword-nmt/subword_nmt/apply_bpe.py" + ), + ] + + bpe_src_param, + shell=False, + ) + # uncomment to use fastbpe instead of subword-nmt bpe + # bpe_src_param = [rescore_bpe+target_lang, rescore_file+target_lang, cur_lm_bpe_code] + # subprocess.call(["/private/home/edunov/fastBPE/fast", "applybpe"] + bpe_src_param, shell=False) + + preprocess_dir = preprocess_directory + + preprocess_lm_param = [ + "--only-source", + "--trainpref", + rescore_bpe + target_lang, + "--srcdict", + cur_lm_dict, + "--destdir", + preprocess_dir, + ] + preprocess_parser = options.get_preprocessing_parser() + input_args = preprocess_parser.parse_args(preprocess_lm_param) + preprocess.main(input_args) + + eval_lm_param = [ + preprocess_dir, + "--path", + cur_language_model, + "--output-word-probs", + "--batch-size", + str(batch_size), + "--max-tokens", + "1024", + "--sample-break-mode", + "eos", + "--gen-subset", + "train", + ] + + eval_lm_parser = options.get_eval_lm_parser() + input_args = options.parse_args_and_arch(eval_lm_parser, eval_lm_param) + + with open(lm_score_file, "w") as f: + with redirect_stdout(f): + eval_lm.main(input_args) + + +def rescore_file_name( + nbest_dir, + prefix_len, + scorer_name, + lm_file=False, + target_prefix_frac=None, + source_prefix_frac=None, + backwards=None, +): + if lm_file: + score_file = nbest_dir + "/lm_score_translations_model_" + scorer_name + ".txt" + else: + score_file = nbest_dir + "/" + scorer_name + "_score_translations.txt" + if backwards: + if prefix_len is not None: + score_file += "prefix_len" + str(prefix_len) + elif target_prefix_frac is not None: + score_file += "target_prefix_frac" + str(target_prefix_frac) + else: + if source_prefix_frac is not None: + score_file += "source_prefix_frac" + str(source_prefix_frac) + return score_file diff --git a/fairseq-0.10.2/examples/paraphraser/README.md b/fairseq-0.10.2/examples/paraphraser/README.md new file mode 100644 index 0000000000000000000000000000000000000000..3810311f30f99f0a07fd8e5d3723bffeba9948c3 --- /dev/null +++ b/fairseq-0.10.2/examples/paraphraser/README.md @@ -0,0 +1,46 @@ +# Paraphrasing with round-trip translation and mixture of experts + +Machine translation models can be used to paraphrase text by translating it to +an intermediate language and back (round-trip translation). + +This example shows how to paraphrase text by first passing it to an +English-French translation model, followed by a French-English [mixture of +experts translation model](/examples/translation_moe). + +##### 0. Setup + +Clone fairseq from source and install necessary dependencies: +```bash +git clone https://github.com/pytorch/fairseq.git +cd fairseq +pip install --editable . +pip install sacremoses sentencepiece +``` + +##### 1. Download models +```bash +wget https://dl.fbaipublicfiles.com/fairseq/models/paraphraser.en-fr.tar.gz +wget https://dl.fbaipublicfiles.com/fairseq/models/paraphraser.fr-en.hMoEup.tar.gz +tar -xzvf paraphraser.en-fr.tar.gz +tar -xzvf paraphraser.fr-en.hMoEup.tar.gz +``` + +##### 2. 
Paraphrase +```bash +python examples/paraphraser/paraphrase.py \ + --en2fr paraphraser.en-fr \ + --fr2en paraphraser.fr-en.hMoEup +# Example input: +# The new date for the Games, postponed for a year in response to the coronavirus pandemic, gives athletes time to recalibrate their training schedules. +# Example outputs: +# Delayed one year in response to the coronavirus pandemic, the new date of the Games gives athletes time to rebalance their training schedule. +# The new date of the Games, which was rescheduled one year in response to the coronavirus (CV) pandemic, gives athletes time to rebalance their training schedule. +# The new date of the Games, postponed one year in response to the coronavirus pandemic, provides athletes with time to rebalance their training schedule. +# The Games' new date, postponed one year in response to the coronavirus pandemic, gives athletes time to rebalance their training schedule. +# The new Games date, postponed one year in response to the coronavirus pandemic, gives the athletes time to rebalance their training schedule. +# The new date of the Games, which was postponed one year in response to the coronavirus pandemic, gives the athletes time to rebalance their training schedule. +# The new date of the Games, postponed one year in response to the coronavirus pandemic, gives athletes time to rebalance their training schedule. +# The new date of the Games, postponed one year in response to the coronavirus pandemic, gives athletes time to re-balance their training schedule. +# The new date of the Games, postponed one year in response to the coronavirus pandemic, gives the athletes time to rebalance their schedule of training. +# The new date of the Games, postponed one year in response to the pandemic of coronavirus, gives the athletes time to rebalance their training schedule. 
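+#
+# Each paraphrase above comes from a different expert of the mixture-of-experts
+# model: gen_paraphrases() in paraphrase.py round-trips the input through
+# French once, then decodes with inference_step_args={"expert": i} for each
+# expert i (10 by default, via --num-experts).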
+```
diff --git a/fairseq-0.10.2/examples/paraphraser/paraphrase.py b/fairseq-0.10.2/examples/paraphraser/paraphrase.py
new file mode 100644
index 0000000000000000000000000000000000000000..d3422fb3db9a381b73a854d2379df214ebe544a2
--- /dev/null
+++ b/fairseq-0.10.2/examples/paraphraser/paraphrase.py
@@ -0,0 +1,85 @@
+#!/usr/bin/env python3 -u
+
+import argparse
+import fileinput
+import logging
+import os
+import sys
+
+from fairseq.models.transformer import TransformerModel
+
+
+logging.getLogger().setLevel(logging.INFO)
+
+
+def main():
+    parser = argparse.ArgumentParser(description="")
+    parser.add_argument("--en2fr", required=True, help="path to en2fr model")
+    parser.add_argument(
+        "--fr2en", required=True, help="path to fr2en mixture of experts model"
+    )
+    parser.add_argument(
+        "--user-dir", help="path to fairseq examples/translation_moe/src directory"
+    )
+    parser.add_argument(
+        "--num-experts",
+        type=int,
+        default=10,
+        help="(keep at 10 unless using a different model)",
+    )
+    parser.add_argument(
+        "files",
+        nargs="*",
+        default=["-"],
+        help='input files to paraphrase; "-" for stdin',
+    )
+    args = parser.parse_args()
+
+    if args.user_dir is None:
+        args.user_dir = os.path.join(
+            os.path.dirname(os.path.dirname(os.path.abspath(__file__))),  # examples/
+            "translation_moe",
+            "src",
+        )
+        if os.path.exists(args.user_dir):
+            logging.info("found user_dir: " + args.user_dir)
+        else:
+            raise RuntimeError(
+                "cannot find fairseq examples/translation_moe/src "
+                "(tried looking here: {})".format(args.user_dir)
+            )
+
+    logging.info("loading en2fr model from: " + args.en2fr)
+    en2fr = TransformerModel.from_pretrained(
+        model_name_or_path=args.en2fr,
+        tokenizer="moses",
+        bpe="sentencepiece",
+    ).eval()
+
+    logging.info("loading fr2en model from: " + args.fr2en)
+    fr2en = TransformerModel.from_pretrained(
+        model_name_or_path=args.fr2en,
+        tokenizer="moses",
+        bpe="sentencepiece",
+        user_dir=args.user_dir,
+        task="translation_moe",
+    ).eval()
+
+    def gen_paraphrases(en):
+        fr = en2fr.translate(en)
+        return [
+            fr2en.translate(fr, inference_step_args={"expert": i})
+            for i in range(args.num_experts)
+        ]
+
+    logging.info("Type the input sentence and press return:")
+    for line in fileinput.input(args.files):
+        line = line.strip()
+        if len(line) == 0:
+            continue
+        for paraphrase in gen_paraphrases(line):
+            print(paraphrase)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/fairseq-0.10.2/examples/simultaneous_translation/README.md b/fairseq-0.10.2/examples/simultaneous_translation/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..e27b65280eda929c4cce3d0b1fe0a55bd94f30e9
--- /dev/null
+++ b/fairseq-0.10.2/examples/simultaneous_translation/README.md
@@ -0,0 +1,106 @@
+# Simultaneous Machine Translation
+
+This directory contains the code for the paper [Monotonic Multihead Attention](https://openreview.net/forum?id=Hyg96gBKPS).
+
+## Prepare Data
+
+[Please follow the instructions to download and preprocess the WMT'15 En-De dataset.](https://github.com/pytorch/fairseq/tree/simulastsharedtask/examples/translation#prepare-wmt14en2desh)
+
+## Training
+
+- MMA-IL
+
+```shell
+fairseq-train \
+    data-bin/wmt15_en_de_32k \
+    --simul-type infinite_lookback \
+    --user-dir $FAIRSEQ/examples/simultaneous_translation \
+    --mass-preservation \
+    --criterion latency_augmented_label_smoothed_cross_entropy \
+    --latency-weight-avg 0.1 \
+    --max-update 50000 \
+    --arch transformer_monotonic_iwslt_de_en \
+    --optimizer adam --adam-betas '(0.9, 0.98)' \
+    --lr-scheduler 'inverse_sqrt' \
+    --warmup-init-lr 1e-7 --warmup-updates 4000 \
+    --lr 5e-4 --min-lr 1e-9 --clip-norm 0.0 --weight-decay 0.0001 \
+    --dropout 0.3 \
+    --label-smoothing 0.1 \
+    --max-tokens 3584
+```
+
+- MMA-H
+
+```shell
+fairseq-train \
+    data-bin/wmt15_en_de_32k \
+    --simul-type hard_aligned \
+    --user-dir $FAIRSEQ/examples/simultaneous_translation \
+    --mass-preservation \
+    --criterion latency_augmented_label_smoothed_cross_entropy \
+    --latency-weight-var 0.1 \
+    --max-update 50000 \
+    --arch transformer_monotonic_iwslt_de_en \
+    --optimizer adam --adam-betas '(0.9, 0.98)' \
+    --lr-scheduler 'inverse_sqrt' \
+    --warmup-init-lr 1e-7 --warmup-updates 4000 \
+    --lr 5e-4 --min-lr 1e-9 --clip-norm 0.0 --weight-decay 0.0001 \
+    --dropout 0.3 \
+    --label-smoothing 0.1 \
+    --max-tokens 3584
+```
+
+- wait-k
+
+```shell
+fairseq-train \
+    data-bin/wmt15_en_de_32k \
+    --simul-type wait-k \
+    --waitk-lagging 3 \
+    --user-dir $FAIRSEQ/examples/simultaneous_translation \
+    --mass-preservation \
+    --criterion latency_augmented_label_smoothed_cross_entropy \
+    --max-update 50000 \
+    --arch transformer_monotonic_iwslt_de_en \
+    --optimizer adam --adam-betas '(0.9, 0.98)' \
+    --lr-scheduler 'inverse_sqrt' \
+    --warmup-init-lr 1e-7 --warmup-updates 4000 \
+    --lr 5e-4 --min-lr 1e-9 --clip-norm 0.0 --weight-decay 0.0001 \
+    --dropout 0.3 \
+    --label-smoothing 0.1 \
+    --max-tokens 3584
+```
+
+
+## Evaluation
+
+More details on evaluation can be found [here](https://github.com/pytorch/fairseq/blob/simulastsharedtask/examples/simultaneous_translation/docs/evaluation.md).
+
+### Start the server
+
+```shell
+python ./eval/server.py \
+    --src-file $SRC_FILE \
+    --ref-file $TGT_FILE
+```
+
+### Run the client
+
+```shell
+python ./eval/evaluate.py \
+    --data-bin data-bin/wmt15_en_de_32k \
+    --model-path ./checkpoints/checkpoint_best.pt \
+    --scores --output $RESULT_DIR
+```
+
+### Run evaluation locally without server
+
+```shell
+python ./eval/evaluate.py \
+    --local \
+    --src-file $SRC_FILE \
+    --tgt-file $TGT_FILE \
+    --data-bin data-bin/wmt15_en_de_32k \
+    --model-path ./checkpoints/checkpoint_best.pt \
+    --scores --output $RESULT_DIR
+```
diff --git a/fairseq-0.10.2/examples/simultaneous_translation/__init__.py b/fairseq-0.10.2/examples/simultaneous_translation/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..446fc86c8ad10a271721a6c824d513600e5b015c
--- /dev/null
+++ b/fairseq-0.10.2/examples/simultaneous_translation/__init__.py
@@ -0,0 +1,6 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
+
+from . import criterions, eval, models  # noqa
diff --git a/fairseq-0.10.2/examples/simultaneous_translation/docs/baseline.md b/fairseq-0.10.2/examples/simultaneous_translation/docs/baseline.md
new file mode 100644
index 0000000000000000000000000000000000000000..d9bf1a1117ec44adfcffc3f2b7daf732ca7eedc6
--- /dev/null
+++ b/fairseq-0.10.2/examples/simultaneous_translation/docs/baseline.md
@@ -0,0 +1,178 @@
+# **Baseline Simultaneous Translation**
+---
+
+These are instructions for training and evaluating a *wait-k* simultaneous LSTM model on the MuST-C English-German dataset, following:
+
+[STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework](https://www.aclweb.org/anthology/P19-1289/)
+
+
+## **Requirements**
+Install fairseq (make sure to use the correct branch):
+```
+git clone --branch simulastsharedtask git@github.com:pytorch/fairseq.git
+cd fairseq
+pip install -e .
+```
+
+We will assume below that fairseq is installed in a directory called `FAIRSEQ`.
+
+Install SentencePiece. One easy way is to use Anaconda:
+
+```
+conda install -c powerai sentencepiece
+```
+
+Download the MuST-C data for English-German available at https://ict.fbk.eu/must-c/.
+We will assume that the data is downloaded in a directory called `DATA_ROOT`.
+
+
+## **Text-to-text Model**
+---
+### Data Preparation
+Train a SentencePiece model:
+```shell
+for lang in en de; do
+    python $FAIRSEQ/examples/simultaneous_translation/data/train_spm.py \
+        --data-path $DATA_ROOT/data \
+        --vocab-size 10000 \
+        --max-frame 3000 \
+        --model-type unigram \
+        --lang $lang \
+        --out-path .
+done
+```
+
+Process the data with the SentencePiece model:
+```shell
+proc_dir=proc
+mkdir -p $proc_dir
+for split in train dev tst-COMMON tst-HE; do
+    for lang in en de; do
+        spm_encode \
+            --model unigram-$lang-10000-3000/spm.model \
+            < $DATA_ROOT/data/$split/txt/$split.$lang \
+            > $proc_dir/$split.spm.$lang
+    done
+done
+```
+
+Binarize the data:
+
+```shell
+proc_dir=proc
+fairseq-preprocess \
+    --source-lang en --target-lang de \
+    --trainpref $proc_dir/train.spm \
+    --validpref $proc_dir/dev.spm \
+    --testpref $proc_dir/tst-COMMON.spm \
+    --thresholdtgt 0 \
+    --thresholdsrc 0 \
+    --workers 20 \
+    --destdir ./data-bin/mustc_en_de
+```
+
+### Training
+
+
+```shell
+mkdir -p checkpoints
+CUDA_VISIBLE_DEVICES=1 python $FAIRSEQ/train.py data-bin/mustc_en_de \
+    --save-dir checkpoints \
+    --arch berard_simul_text_iwslt \
+    --simul-type waitk \
+    --waitk-lagging 2 \
+    --optimizer adam \
+    --max-epoch 100 \
+    --lr 0.001 \
+    --clip-norm 5.0 \
+    --batch-size 128 \
+    --log-format json \
+    --log-interval 10 \
+    --criterion cross_entropy_acc \
+    --user-dir $FAIRSEQ/examples/simultaneous_translation
+```
+
+## **Speech-to-text Model**
+---
+### Data Preparation
+First, segment the wav files:
+```shell
+python $FAIRSEQ/examples/simultaneous_translation/data/segment_wav.py \
+    --datapath $DATA_ROOT
+```
+As for the text-to-text model, train a SentencePiece model, but only on German:
+```shell
+python $FAIRSEQ/examples/simultaneous_translation/data/train_spm.py \
+    --data-path $DATA_ROOT/data \
+    --vocab-size 10000 \
+    --max-frame 3000 \
+    --model-type unigram \
+    --lang de \
+    --out-path .
+```
+### Training
+```shell
+mkdir -p checkpoints
+CUDA_VISIBLE_DEVICES=1 python $FAIRSEQ/train.py data-bin/mustc_en_de \
+    --save-dir checkpoints \
+    --arch berard_simul_text_iwslt \
+    --waitk-lagging 2 \
+    --waitk-stride 10 \
+    --input-feat-per-channel 40 \
+    --encoder-hidden-size 512 \
+    --output-layer-dim 128 \
+    --decoder-num-layers 3 \
+    --task speech_translation \
+    --optimizer adam \
+    --max-epoch 100 \
+    --lr 0.001 \
+    --clip-norm 5.0 \
+    --batch-size 128 \
+    --log-format json \
+    --log-interval 10 \
+    --criterion cross_entropy_acc \
+    --user-dir $FAIRSEQ/examples/simultaneous_translation
+```
+
+## Evaluation
+---
+### Evaluation Server
+For text translation models, the server is set up as follows, given an input file and a reference file.
+
+```shell
+python ./eval/server.py \
+    --hostname localhost \
+    --port 12321 \
+    --src-file $DATA_ROOT/data/dev/txt/dev.en \
+    --ref-file $DATA_ROOT/data/dev/txt/dev.de
+```
+For speech translation models, the input is the data directory.
+```shell
+python ./eval/server.py \
+    --hostname localhost \
+    --port 12321 \
+    --ref-file $DATA_ROOT \
+    --data-type speech
+```
+
+### Decode and Evaluate with Client
+Once the server is set up, run the client to evaluate translation quality and latency.
+```shell
+# TEXT
+python $FAIRSEQ/examples/simultaneous_translation/eval/evaluate.py \
+    data-bin/mustc_en_de \
+    --user-dir $FAIRSEQ/examples/simultaneous_translation \
+    --src-spm unigram-en-10000-3000/spm.model \
+    --tgt-spm unigram-de-10000-3000/spm.model \
+    -s en -t de \
+    --path checkpoints/checkpoint_best.pt
+
+# SPEECH
+python $FAIRSEQ/examples/simultaneous_translation/eval/evaluate.py \
+    data-bin/mustc_en_de \
+    --user-dir $FAIRSEQ/examples/simultaneous_translation \
+    --data-type speech \
+    --tgt-spm unigram-de-10000-3000/spm.model \
+    -s en -t de \
+    --path checkpoints/checkpoint_best.pt
+```
diff --git a/fairseq-0.10.2/examples/simultaneous_translation/docs/evaluation.md b/fairseq-0.10.2/examples/simultaneous_translation/docs/evaluation.md
new file mode 100644
index 0000000000000000000000000000000000000000..c53407354e6f52ddd34341185976db406ce87f95
--- /dev/null
+++ b/fairseq-0.10.2/examples/simultaneous_translation/docs/evaluation.md
@@ -0,0 +1,115 @@
+# Introduction to the evaluation interface
+The simultaneous translation models from shared task participants are evaluated under a server-client protocol. Participants are requested to plug their own model API into the protocol and to submit a Docker file.
+
+## Server-Client Protocol
+A server-client protocol is used in evaluation. For example, when a *wait-k* model (k=3) translates the English sentence "Alice and Bob are good friends" into the German sentence "Alice und Bob sind gute Freunde.", the evaluation process works as follows.
+
+Every time the client needs to read a new state (a word or a speech utterance), it sends a "GET" request to the server. Whenever a new token is generated, a "SEND" request with the predicted (untokenized) word is sent to the server immediately. The server can hence calculate both the latency and the BLEU score of the sentence.
+
+### Server
+The server code is provided and can be set up locally for development purposes. For example, to evaluate a text simultaneous translation test set:
+
+```shell
+python fairseq/examples/simultaneous_translation/eval/server.py \
+    --hostname localhost \
+    --port 1234 \
+    --src-file SRC_FILE \
+    --ref-file REF_FILE \
+    --data-type text
+```
+The state that the server sends to the client has the following format:
+```json
+{
+    "sent_id": Int,
+    "segment_id": Int,
+    "segment": String
+}
+```
+
+### Client
+The client handles the evaluation process described above and should also work out of the box. The client's protocol is shown in the following table.
+
+|Action|Content|
+|:---:|:---:|
+|Request new word / utterance| ```{key: "GET", value: None}```|
+|Predict word "W"| ```{key: "SEND", value: "W"}```|
+
+
+
+The core of the client module is the agent, which needs to be adapted to each model. The abstract agent class is shown below; the evaluation process happens in the `decode()` function.
+```python
+class Agent(object):
+    "an agent needs to follow this pattern"
+    def __init__(self, *args, **kwargs):
+        ...
+
+    def init_states(self):
+        # Initializing states
+        ...
+
+    def update_states(self, states, new_state):
+        # Update states with a given new state from the server
+        # TODO (describe the states)
+        ...
+
+    def finish_eval(self, states, new_state):
+        # Check if evaluation is finished
+        ...
+
+    def policy(self, state: list) -> dict:
+        # Provide an action given the current states
+        # The action can only be either
+        # {key: "GET", value: None}
+        # or
+        # {key: "SEND", value: W}
+        ...
+
+    def reset(self):
+        # Reset agent
+        ...
+
+    def decode(self, session):
+
+        states = self.init_states()
+        self.reset()
+
+        # The evaluation protocol happens here
+        while True:
+            # Get an action for the current states according to self.policy()
+            action = self.policy(states)
+
+            if action['key'] == GET:
+                # Read a new state from the server
+                new_state = session.get_src()
+                states = self.update_states(states, new_state)
+
+                if self.finish_eval(states, new_state):
+                    # End of document
+                    break
+
+            elif action['key'] == SEND:
+                # Send a new prediction to the server
+                session.send_hypo(action['value'])
+
+                # Clean the history, wait for the next sentence
+                if action['value'] == DEFAULT_EOS:
+                    states = self.init_states()
+                    self.reset()
+            else:
+                raise NotImplementedError
+
+
+```
+Here is an implementation of an agent for the text [*wait-k* model](somelink). Notice that tokenization is not considered.
+
+## Quality
+Quality is measured by detokenized BLEU, so make sure that the predicted words sent to the server are detokenized. An implementation can be found [here](some link).
+
+## Latency
+The latency metrics are
+* Average Proportion
+* Average Lagging
+* Differentiable Average Lagging
+
+They are also evaluated on detokenized text.
+
diff --git a/fairseq-0.10.2/examples/simultaneous_translation/eval/__init__.py b/fairseq-0.10.2/examples/simultaneous_translation/eval/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..6264236915a7269a4d920ee8213004374dd86a9a
--- /dev/null
+++ b/fairseq-0.10.2/examples/simultaneous_translation/eval/__init__.py
@@ -0,0 +1,4 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
diff --git a/fairseq-0.10.2/examples/simultaneous_translation/eval/agents/word_splitter.py b/fairseq-0.10.2/examples/simultaneous_translation/eval/agents/word_splitter.py
new file mode 100644
index 0000000000000000000000000000000000000000..c3f71200a5afaa66b9b642ebf99a2722f6437f2b
--- /dev/null
+++ b/fairseq-0.10.2/examples/simultaneous_translation/eval/agents/word_splitter.py
@@ -0,0 +1,91 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
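+
+# These splitters let the evaluation agents decide when a full word has been
+# produced before it is sent to the server: NoneWordSplitter passes strings
+# through unchanged, BPEWordSplitter handles subword-nmt "@@" continuation
+# markers, and SentencePieceModelWordSplitter uses the "\u2581" begin-of-word
+# marker. Instances are looked up through SPLITTER_DICT at the bottom of this
+# file.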
+ + +class SubwordSplitter(object): + def process_line(self, string): + raise NotImplementedError + + def split(self, string): + raise NotImplementedError + + +class NoneWordSplitter(object): + def __init__(self, model): + pass + + def split(self, string): + return [string] + + def process_line(self, string): + return [string] + + def finished_word(self, string): + return True + + def merge(self, list_of_string): + return "".join(list_of_string) + + def last_full_word_step(self, tokens, step): + return len(tokens) + + def end_idx_last_full_word(self, tokens): + return len(tokens) + + +class BPEWordSplitter(object): + # TODO: lock back here + def __init__(self, model_path): + super().__init__() + from subword_nmt.apply_bpe import BPE + + with open(model_path) as f: + self.model = BPE(f) + + def split(self, string): + return self.model.process_line(string).split() + + def end_idx_last_full_word(self, tokens): + # Begin of word indices + bow_indices = [0] + [i + 1 for i, t in enumerate(tokens[1:]) if t[-2:] != "@@"] + + if len(bow_indices) < 2: + return 0 + else: + return bow_indices[-1] + + def merge(self, list_of_string): + return " ".join([item.replace("@@", "") for item in list_of_string]) + + +class SentencePieceModelWordSplitter(object): + def __init__(self, model_path): + super().__init__() + import sentencepiece as spm + + self.model = spm.SentencePieceProcessor() + self.model.Load(model_path) + + def split(self, string): + return self.model.EncodeAsPieces(string) + + def end_idx_last_full_word(self, tokens): + # Begin of word indices + bow_indices = [i for i, t in enumerate(tokens) if t[0] == "\u2581"] + + if len(bow_indices) < 2: + return 0 + else: + return bow_indices[-1] + + def merge(self, list_of_string): + return self.model.DecodePieces(list_of_string) + + +SPLITTER_DICT = { + None: NoneWordSplitter, + "BPE": BPEWordSplitter, + "SentencePieceModel": SentencePieceModelWordSplitter, +} diff --git a/fairseq-0.10.2/examples/simultaneous_translation/eval/client.py b/fairseq-0.10.2/examples/simultaneous_translation/eval/client.py new file mode 100644 index 0000000000000000000000000000000000000000..3ca4ea73b8cc58eeb5ca90f348ed9097b6d7332f --- /dev/null +++ b/fairseq-0.10.2/examples/simultaneous_translation/eval/client.py @@ -0,0 +1,100 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
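+
+# Client-side services for the evaluation protocol described in
+# docs/evaluation.md. A minimal usage sketch (assuming a server from
+# eval/server.py is running on localhost:12321):
+#
+#     client = SimulSTEvaluationService("localhost", 12321)
+#     client.corpus_info()          # corpus metadata from the server
+#     client.get_src(0)             # request a segment of source sentence 0
+#     client.send_hypo(0, "Alice")  # send a predicted (detokenized) word
+#     client.get_scores()           # print BLEU and latency when finished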
+ +from typing import Optional + +import requests +from scorers import build_scorer + + +class SimulSTEvaluationService(object): + DEFAULT_HOSTNAME = "localhost" + DEFAULT_PORT = 12321 + + def __init__(self, hostname=DEFAULT_HOSTNAME, port=DEFAULT_PORT): + self.hostname = hostname + self.port = port + self.base_url = f"http://{self.hostname}:{self.port}" + + def __enter__(self): + self.new_session() + + def __exit__(self, exc_type, exc_val, exc_tb): + pass + + def new_session(self): + # start eval session + url = f"{self.base_url}" + + try: + _ = requests.post(url) + except Exception as e: + print(f"Failed to start an evaluation session: {e}") + + print("Evaluation session started.") + return self + + def get_scores(self): + # end eval session + url = f"{self.base_url}/result" + try: + r = requests.get(url) + print("Scores: {}".format(r.json())) + print("Evaluation session finished.") + except Exception as e: + print(f"Failed to end an evaluation session: {e}") + + def get_src(self, sent_id: int, extra_params: Optional[dict] = None) -> str: + url = f"{self.base_url}/src" + params = {"sent_id": sent_id} + if extra_params is not None: + for key in extra_params.keys(): + params[key] = extra_params[key] + try: + r = requests.get(url, params=params) + except Exception as e: + print(f"Failed to request a source segment: {e}") + return r.json() + + def send_hypo(self, sent_id: int, hypo: str) -> None: + url = f"{self.base_url}/hypo" + params = {"sent_id": sent_id} + + try: + requests.put(url, params=params, data=hypo.encode("utf-8")) + except Exception as e: + print(f"Failed to send a translated segment: {e}") + + def corpus_info(self): + url = f"{self.base_url}" + try: + r = requests.get(url) + except Exception as e: + print(f"Failed to request corpus information: {e}") + + return r.json() + + +class SimulSTLocalEvaluationService(object): + def __init__(self, args): + self.scorer = build_scorer(args) + + def get_scores(self): + return self.scorer.score() + + def get_src(self, sent_id: int, extra_params: Optional[dict] = None) -> str: + if extra_params is not None: + segment_size = extra_params.get("segment_size", None) + else: + segment_size = None + + return self.scorer.send_src(int(sent_id), segment_size) + + def send_hypo(self, sent_id: int, hypo: str) -> None: + list_of_tokens = hypo.strip().split() + self.scorer.recv_hyp(sent_id, list_of_tokens) + + def corpus_info(self): + return self.scorer.get_info() diff --git a/fairseq-0.10.2/examples/simultaneous_translation/eval/eval_latency.py b/fairseq-0.10.2/examples/simultaneous_translation/eval/eval_latency.py new file mode 100644 index 0000000000000000000000000000000000000000..50021de47c7b3907e05088fa9040734423b91483 --- /dev/null +++ b/fairseq-0.10.2/examples/simultaneous_translation/eval/eval_latency.py @@ -0,0 +1,78 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
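+
+# Computes corpus-level latency metrics (average proportion, average lagging,
+# differentiable average lagging) from a file of JSON lines of the form
+# {"delays": [...], "src_len": N}. A usage sketch (the input path is an
+# assumption for illustration):
+#
+#     python eval_latency.py --input delays.jsonl --start-from-zero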
+ +import argparse +import json + +import torch +from examples.simultaneous_translation.utils.latency import LatencyInference + + +LATENCY_METRICS = [ + "differentiable_average_lagging", + "average_lagging", + "average_proportion", +] + + +class LatencyScorer: + def __init__(self, start_from_zero=True): + self.recorder = [] + self.scores = {} + self.scorer = LatencyInference() + self.start_from_zero = start_from_zero + + def update_reorder(self, list_of_dict): + self.recorder = [] + for info in list_of_dict: + delays = [int(x) - int(not self.start_from_zero) for x in info["delays"]] + delays = torch.LongTensor(delays).unsqueeze(0) + src_len = torch.LongTensor([info["src_len"]]).unsqueeze(0) + + self.recorder.append(self.scorer(delays, src_len)) + + def cal_latency(self): + self.scores = {} + for metric in LATENCY_METRICS: + self.scores[metric] = sum( + [x[metric][0, 0].item() for x in self.recorder] + ) / len(self.recorder) + return self.scores + + @classmethod + def score(cls, list_of_dict, start_from_zero=True): + scorer_to_return = cls(start_from_zero) + scorer_to_return.update_reorder(list_of_dict) + scorer_to_return.cal_latency() + return scorer_to_return.scores + + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument("--input", required=True) + parser.add_argument("--start-from-zero", action="store_true") + args = parser.parse_args() + + scorer = LatencyInference() + recorder = [] + with open(args.input, "r") as f: + for line in f: + info = json.loads(line) + + delays = [int(x) - int(not args.start_from_zero) for x in info["delays"]] + + delays = torch.LongTensor(delays).unsqueeze(0) + + src_len = torch.LongTensor([info["src_len"]]).unsqueeze(0) + + recorder.append(scorer(delays, src_len)) + + average_results = {} + + for metric in LATENCY_METRICS: + average_results[metric] = sum([x[metric][0, 0].item() for x in recorder]) / len( + recorder + ) + print(f"{metric}: {average_results[metric]}") diff --git a/fairseq-0.10.2/examples/simultaneous_translation/eval/evaluate.py b/fairseq-0.10.2/examples/simultaneous_translation/eval/evaluate.py new file mode 100644 index 0000000000000000000000000000000000000000..2f7474621aa648b08581cfe66dda501b7c1e9621 --- /dev/null +++ b/fairseq-0.10.2/examples/simultaneous_translation/eval/evaluate.py @@ -0,0 +1,81 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
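+
+# Entry point for the evaluation client: builds an agent (--agent-type) and
+# decodes either against a remote server (--hostname/--port) or against an
+# in-process scorer (--local). Pass --scores to print BLEU and latency at the
+# end of the run.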
+
+import argparse
+
+from agents import build_agent
+from client import SimulSTEvaluationService, SimulSTLocalEvaluationService
+from fairseq.registry import REGISTRIES
+
+
+DEFAULT_HOSTNAME = "localhost"
+DEFAULT_PORT = 12321
+
+
+def get_args():
+    parser = argparse.ArgumentParser()
+
+    parser.add_argument(
+        "--hostname", type=str, default=DEFAULT_HOSTNAME, help="server hostname"
+    )
+    parser.add_argument(
+        "--port", type=int, default=DEFAULT_PORT, help="server port number"
+    )
+    parser.add_argument("--agent-type", default="simul_trans_text", help="Agent type")
+    parser.add_argument("--scorer-type", default="text", help="Scorer type")
+    parser.add_argument(
+        "--start-idx",
+        type=int,
+        default=0,
+        help="Start index of the sentence to evaluate",
+    )
+    parser.add_argument(
+        "--end-idx",
+        type=int,
+        default=float("inf"),
+        help="End index of the sentence to evaluate",
+    )
+    parser.add_argument(
+        "--scores", action="store_true", help="Request scores from server"
+    )
+    parser.add_argument("--reset-server", action="store_true", help="Reset the server")
+    parser.add_argument(
+        "--num-threads", type=int, default=10, help="Number of threads used by agent"
+    )
+    parser.add_argument(
+        "--local", action="store_true", default=False, help="Local evaluation"
+    )
+
+    args, _ = parser.parse_known_args()
+
+    for registry_name, REGISTRY in REGISTRIES.items():
+        choice = getattr(args, registry_name, None)
+        if choice is not None:
+            cls = REGISTRY["registry"][choice]
+            if hasattr(cls, "add_args"):
+                cls.add_args(parser)
+    args = parser.parse_args()
+
+    return args
+
+
+if __name__ == "__main__":
+    args = get_args()
+
+    if args.local:
+        session = SimulSTLocalEvaluationService(args)
+    else:
+        session = SimulSTEvaluationService(args.hostname, args.port)
+
+    if args.reset_server:
+        session.new_session()
+
+    if args.agent_type is not None:
+        agent = build_agent(args)
+        agent.decode(session, args.start_idx, args.end_idx, args.num_threads)
+
+    if args.scores:
+        scores = session.get_scores()
+        if scores is not None:
+            # the local service returns the scores; the remote one prints them
+            print(scores)
diff --git a/fairseq-0.10.2/examples/simultaneous_translation/eval/server.py b/fairseq-0.10.2/examples/simultaneous_translation/eval/server.py
new file mode 100644
index 0000000000000000000000000000000000000000..e44ceaff85d899de3f5aedddd774e1e4c3fcd98b
--- /dev/null
+++ b/fairseq-0.10.2/examples/simultaneous_translation/eval/server.py
@@ -0,0 +1,89 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
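+
+# Tornado-based evaluation server wrapping a scorer (see start_server below):
+#
+#     POST /        reset the scorer (start a new evaluation session)
+#     GET  /        corpus information from scorer.get_info()
+#     GET  /result  final scores as JSON
+#     GET  /src     next source segment (params: sent_id, segment_size)
+#     PUT  /hypo    submit hypothesis tokens (params: sent_id; body: text)
+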
+import argparse
+import json
+import sys
+
+from scorers import build_scorer
+from tornado import ioloop, web
+
+
+DEFAULT_HOSTNAME = "localhost"
+DEFAULT_PORT = 12321
+
+
+class ScorerHandler(web.RequestHandler):
+    def initialize(self, scorer):
+        self.scorer = scorer
+
+
+class EvalSessionHandler(ScorerHandler):
+    def post(self):
+        self.scorer.reset()
+
+    def get(self):
+        r = json.dumps(self.scorer.get_info())
+        self.write(r)
+
+
+class ResultHandler(ScorerHandler):
+    def get(self):
+        r = json.dumps(self.scorer.score())
+        self.write(r)
+
+
+class SourceHandler(ScorerHandler):
+    def get(self):
+        sent_id = int(self.get_argument("sent_id"))
+        segment_size = None
+        if "segment_size" in self.request.arguments:
+            string = self.get_argument("segment_size")
+            if len(string) > 0:
+                segment_size = int(string)
+
+        r = json.dumps(self.scorer.send_src(int(sent_id), segment_size))
+
+        self.write(r)
+
+
+class HypothesisHandler(ScorerHandler):
+    def put(self):
+        sent_id = int(self.get_argument("sent_id"))
+        list_of_tokens = self.request.body.decode("utf-8").strip().split()
+        self.scorer.recv_hyp(sent_id, list_of_tokens)
+
+
+def add_args():
+    parser = argparse.ArgumentParser()
+    # fmt: off
+    parser.add_argument('--hostname', type=str, default=DEFAULT_HOSTNAME,
+                        help='Server hostname')
+    parser.add_argument('--port', type=int, default=DEFAULT_PORT,
+                        help='Server port number')
+    parser.add_argument('--debug', action='store_true', default=False,
+                        help='Run the server in tornado debug mode')
+
+    args, _ = parser.parse_known_args()
+    # fmt: on
+    return args
+
+
+def start_server(scorer, hostname=DEFAULT_HOSTNAME, port=DEFAULT_PORT, debug=False):
+    app = web.Application(
+        [
+            (r"/result", ResultHandler, dict(scorer=scorer)),
+            (r"/src", SourceHandler, dict(scorer=scorer)),
+            (r"/hypo", HypothesisHandler, dict(scorer=scorer)),
+            (r"/", EvalSessionHandler, dict(scorer=scorer)),
+        ],
+        debug=debug,
+    )
+    app.listen(port, address=hostname, max_buffer_size=1024 ** 3)
+    sys.stdout.write(f"Evaluation Server Started. Listening on port {port}\n")
+    ioloop.IOLoop.current().start()
+
+
+if __name__ == "__main__":
+    args = add_args()
+    scorer = build_scorer(args)
+    start_server(scorer, args.hostname, args.port, args.debug)
diff --git a/fairseq-0.10.2/examples/simultaneous_translation/models/__init__.py b/fairseq-0.10.2/examples/simultaneous_translation/models/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..083da4373221faff11473c2821f2bb37928793d3
--- /dev/null
+++ b/fairseq-0.10.2/examples/simultaneous_translation/models/__init__.py
@@ -0,0 +1,15 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
+
+import importlib
+import os
+
+
+for file in os.listdir(os.path.dirname(__file__)):
+    if file.endswith(".py") and not file.startswith("_"):
+        model_name = file[: file.find(".py")]
+        importlib.import_module(
+            "examples.simultaneous_translation.models." + model_name
+        )
diff --git a/fairseq-0.10.2/examples/simultaneous_translation/models/transformer_monotonic_attention.py b/fairseq-0.10.2/examples/simultaneous_translation/models/transformer_monotonic_attention.py
new file mode 100644
index 0000000000000000000000000000000000000000..ab8adf3aab4d86cc568230476ee403b4c1a462bf
--- /dev/null
+++ b/fairseq-0.10.2/examples/simultaneous_translation/models/transformer_monotonic_attention.py
@@ -0,0 +1,322 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
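+
+# Transformer models whose decoder layers use monotonic attention, selected
+# via --simul-type (registered in examples/simultaneous_translation/modules).
+# A hypothetical training invocation, assuming the usual fairseq data setup:
+#
+#     fairseq-train data-bin/iwslt14_de_en \
+#         --user-dir examples/simultaneous_translation \
+#         --arch transformer_monotonic_iwslt_de_en \
+#         --simul-type waitk --waitk-lagging 3 ...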
+ +import torch +import torch.nn as nn +import torch.nn.functional as F +from examples.simultaneous_translation.modules.monotonic_transformer_layer import ( + TransformerMonotonicDecoderLayer, + TransformerMonotonicEncoderLayer, +) +from fairseq.models import register_model, register_model_architecture +from fairseq.models.transformer import ( + TransformerDecoder, + TransformerEncoder, + TransformerModel, + base_architecture, + transformer_iwslt_de_en, + transformer_vaswani_wmt_en_de_big, +) + + +DEFAULT_MAX_SOURCE_POSITIONS = 1024 +DEFAULT_MAX_TARGET_POSITIONS = 1024 + + +@register_model("transformer_unidirectional") +class TransformerUnidirectionalModel(TransformerModel): + @classmethod + def build_encoder(cls, args, src_dict, embed_tokens): + return TransformerMonotonicEncoder(args, src_dict, embed_tokens) + + +@register_model("transformer_monotonic") +class TransformerMonotonicModel(TransformerModel): + @classmethod + def build_encoder(cls, args, src_dict, embed_tokens): + return TransformerMonotonicEncoder(args, src_dict, embed_tokens) + + @classmethod + def build_decoder(cls, args, tgt_dict, embed_tokens): + return TransformerMonotonicDecoder(args, tgt_dict, embed_tokens) + + def _indices_from_states(self, states): + if type(states["indices"]["src"]) == list: + if next(self.parameters()).is_cuda: + tensor = torch.cuda.LongTensor + else: + tensor = torch.LongTensor + + src_indices = tensor( + [states["indices"]["src"][: 1 + states["steps"]["src"]]] + ) + + tgt_indices = tensor( + [[self.decoder.dictionary.eos()] + states["indices"]["tgt"]] + ) + else: + src_indices = states["indices"]["src"][: 1 + states["steps"]["src"]] + tgt_indices = states["indices"]["tgt"] + + return src_indices, None, tgt_indices + + def predict_from_states(self, states): + decoder_states = self.decoder.output_layer(states["decoder_features"]) + lprobs = self.get_normalized_probs([decoder_states[:, -1:]], log_probs=True) + + index = lprobs.argmax(dim=-1) + + token = self.decoder.dictionary.string(index) + + return token, index[0, 0].item() + + def decision_from_states(self, states): + """ + This funcion take states dictionary as input, and gives the agent + a decision of whether read a token from server. 
Moreover, the decoder + states are also calculated here so we can directly generate a target + token without recompute every thing + """ + + self.eval() + + if len(states["tokens"]["src"]) == 0: + return 0 + + src_indices, src_lengths, tgt_indices = self._indices_from_states(states) + + # Update encoder states if needed + if ( + "encoder_states" not in states + or states["encoder_states"][0].size(1) <= states["steps"]["src"] + ): + encoder_out_dict = self.encoder(src_indices, src_lengths) + states["encoder_states"] = encoder_out_dict + else: + encoder_out_dict = states["encoder_states"] + + # online means we still need tokens to feed the model + states["model_states"]["online"] = not ( + states["finish_read"] + and len(states["tokens"]["src"]) == states["steps"]["src"] + ) + + states["model_states"]["steps"] = states["steps"] + + x, outputs = self.decoder.forward( + prev_output_tokens=tgt_indices, + encoder_out=encoder_out_dict, + incremental_state=states["model_states"], + features_only=True, + ) + + states["decoder_features"] = x + + return outputs["action"] + + +class TransformerMonotonicEncoder(TransformerEncoder): + def __init__(self, args, dictionary, embed_tokens): + super().__init__(args, dictionary, embed_tokens) + + self.dictionary = dictionary + self.layers = nn.ModuleList([]) + self.layers.extend( + [TransformerMonotonicEncoderLayer(args) for i in range(args.encoder_layers)] + ) + + +class TransformerMonotonicDecoder(TransformerDecoder): + """ + Transformer decoder consisting of *args.decoder_layers* layers. Each layer + is a :class:`TransformerDecoderLayer`. + + Args: + args (argparse.Namespace): parsed command-line arguments + dictionary (~fairseq.data.Dictionary): decoding dictionary + embed_tokens (torch.nn.Embedding): output embedding + no_encoder_attn (bool, optional): whether to attend to encoder outputs + (default: False). + """ + + def __init__(self, args, dictionary, embed_tokens, no_encoder_attn=False): + super().__init__(args, dictionary, embed_tokens, no_encoder_attn=False) + + self.dictionary = dictionary + self.layers = nn.ModuleList([]) + self.layers.extend( + [ + TransformerMonotonicDecoderLayer(args, no_encoder_attn) + for _ in range(args.decoder_layers) + ] + ) + + def pre_attention( + self, prev_output_tokens, encoder_out_dict, incremental_state=None + ): + positions = ( + self.embed_positions( + prev_output_tokens, + incremental_state=incremental_state, + ) + if self.embed_positions is not None + else None + ) + + if incremental_state is not None: + prev_output_tokens = prev_output_tokens[:, -1:] + if positions is not None: + positions = positions[:, -1:] + + # embed tokens and positions + x = self.embed_scale * self.embed_tokens(prev_output_tokens) + + if self.project_in_dim is not None: + x = self.project_in_dim(x) + + if positions is not None: + x += positions + x = self.dropout_module(x) + + # B x T x C -> T x B x C + x = x.transpose(0, 1) + + encoder_out = encoder_out_dict.encoder_out + encoder_padding_mask = encoder_out_dict.encoder_padding_mask + + return x, encoder_out, encoder_padding_mask + + def post_attention(self, x): + if self.layer_norm: + x = self.layer_norm(x) + + # T x B x C -> B x T x C + x = x.transpose(0, 1) + + if self.project_out_dim is not None: + x = self.project_out_dim(x) + + return x + + def extract_features( + self, prev_output_tokens, encoder_out, incremental_state=None, **unused + ): + """ + Similar to *forward* but only return features. 
+ + Returns: + tuple: + - the decoder's features of shape `(batch, tgt_len, embed_dim)` + - a dictionary with any model-specific outputs + """ + # incremental_state = None + (x, encoder_outs, encoder_padding_mask) = self.pre_attention( + prev_output_tokens, encoder_out, incremental_state + ) + attn = None + inner_states = [x] + attn_list = [] + step_list = [] + + for i, layer in enumerate(self.layers): + + x, attn, _ = layer( + x=x, + encoder_out=encoder_outs, + encoder_padding_mask=encoder_padding_mask, + incremental_state=incremental_state, + self_attn_mask=self.buffered_future_mask(x) + if incremental_state is None + else None, + ) + + inner_states.append(x) + attn_list.append(attn) + + if incremental_state is not None: + curr_steps = layer.get_steps(incremental_state) + step_list.append(curr_steps) + + if incremental_state.get("online", False): + p_choose = ( + attn["p_choose"].squeeze(0).squeeze(1).gather(1, curr_steps.t()) + ) + + new_steps = curr_steps + (p_choose < 0.5).t().type_as(curr_steps) + + if (new_steps >= incremental_state["steps"]["src"]).any(): + # We need to prune the last self_attn saved_state + # if model decide not to read + # otherwise there will be duplicated saved_state + for j in range(i + 1): + self.layers[j].prune_incremental_state(incremental_state) + + return x, {"action": 0} + + if incremental_state is not None and not incremental_state.get("online", False): + # Here is for fast evaluation + fastest_step = ( + torch.max(torch.cat(step_list, dim=1), dim=1, keepdim=True)[0] + 1 + ) + + if "fastest_step" in incremental_state: + incremental_state["fastest_step"] = torch.cat( + [incremental_state["fastest_step"], fastest_step], dim=1 + ) + else: + incremental_state["fastest_step"] = fastest_step + + x = self.post_attention(x) + + return x, { + "action": 1, + "attn_list": attn_list, + "step_list": step_list, + "encoder_out": encoder_out, + "encoder_padding_mask": encoder_padding_mask, + } + + def reorder_incremental_state(self, incremental_state, new_order): + super().reorder_incremental_state(incremental_state, new_order) + if "fastest_step" in incremental_state: + incremental_state["fastest_step"] = incremental_state[ + "fastest_step" + ].index_select(0, new_order) + + +@register_model_architecture("transformer_monotonic", "transformer_monotonic") +def base_monotonic_rchitecture(args): + base_architecture(args) + args.encoder_unidirectional = getattr(args, "encoder_unidirectional", False) + + +@register_model_architecture( + "transformer_monotonic", "transformer_monotonic_iwslt_de_en" +) +def transformer_monotonic_iwslt_de_en(args): + transformer_iwslt_de_en(args) + base_monotonic_rchitecture(args) + + +# parameters used in the "Attention Is All You Need" paper (Vaswani et al., 2017) +@register_model_architecture( + "transformer_monotonic", "transformer_monotonic_vaswani_wmt_en_de_big" +) +def transformer_monotonic_vaswani_wmt_en_de_big(args): + transformer_vaswani_wmt_en_de_big(args) + + +@register_model_architecture( + "transformer_monotonic", "transformer_monotonic_vaswani_wmt_en_fr_big" +) +def transformer_monotonic_vaswani_wmt_en_fr_big(args): + transformer_monotonic_vaswani_wmt_en_fr_big(args) + + +@register_model_architecture( + "transformer_unidirectional", "transformer_unidirectional_iwslt_de_en" +) +def transformer_unidirectional_iwslt_de_en(args): + transformer_iwslt_de_en(args) diff --git a/fairseq-0.10.2/examples/simultaneous_translation/modules/__init__.py b/fairseq-0.10.2/examples/simultaneous_translation/modules/__init__.py new file mode 100644 
index 0000000000000000000000000000000000000000..ad64774de46d5def141f9a452042a445046dc78e --- /dev/null +++ b/fairseq-0.10.2/examples/simultaneous_translation/modules/__init__.py @@ -0,0 +1,24 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import importlib +import os + +from fairseq import registry + + +( + build_monotonic_attention, + register_monotonic_attention, + MONOTONIC_ATTENTION_REGISTRY, + _, +) = registry.setup_registry("--simul-type") + +for file in os.listdir(os.path.dirname(__file__)): + if file.endswith(".py") and not file.startswith("_"): + model_name = file[: file.find(".py")] + importlib.import_module( + "examples.simultaneous_translation.modules." + model_name + ) diff --git a/fairseq-0.10.2/examples/simultaneous_translation/modules/monotonic_multihead_attention.py b/fairseq-0.10.2/examples/simultaneous_translation/modules/monotonic_multihead_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..c09725ac9a791d720769037697865c890953e686 --- /dev/null +++ b/fairseq-0.10.2/examples/simultaneous_translation/modules/monotonic_multihead_attention.py @@ -0,0 +1,622 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import math + +import torch +import torch.nn as nn +import torch.nn.functional as F +from examples.simultaneous_translation.utils.functions import ( + exclusive_cumprod, + lengths_to_mask, +) +from fairseq import utils +from fairseq.incremental_decoding_utils import with_incremental_state +from fairseq.modules import MultiheadAttention +from fairseq.utils import convert_padding_direction + +from . 
import register_monotonic_attention


@with_incremental_state
class MonotonicAttention(nn.Module):
    """
    Abstract class of monotonic attentions
    """

    def __init__(self, args):
        self.eps = args.attention_eps
        self.mass_preservation = args.mass_preservation

        self.noise_mean = args.noise_mean
        self.noise_var = args.noise_var

        self.energy_bias_init = args.energy_bias_init
        self.energy_bias = (
            nn.Parameter(self.energy_bias_init * torch.ones([1]))
            if args.energy_bias is True
            else 0
        )

    @staticmethod
    def add_args(parser):
        # fmt: off
        parser.add_argument('--no-mass-preservation', action="store_false", dest="mass_preservation",
                            help='Do not stay on the last token when decoding')
        parser.add_argument('--mass-preservation', action="store_true", dest="mass_preservation",
                            help='Stay on the last token when decoding')
        parser.set_defaults(mass_preservation=True)

        parser.add_argument('--noise-var', type=float, default=1.0,
                            help='Variance of discreteness noise')
        parser.add_argument('--noise-mean', type=float, default=0.0,
                            help='Mean of discreteness noise')
        parser.add_argument('--energy-bias', action="store_true", default=False,
                            help='Bias for energy')
        parser.add_argument('--energy-bias-init', type=float, default=-2.0,
                            help='Initial value of the bias for energy')
        parser.add_argument('--attention-eps', type=float, default=1e-6,
                            help='Epsilon when calculating expected attention')
        # fmt: on

    def p_choose(self, *args):
        raise NotImplementedError

    def input_projections(self, *args):
        raise NotImplementedError

    def attn_energy(self, q_proj, k_proj, key_padding_mask=None):
        """
        Calculating monotonic energies

        ============================================================
        Expected input size
        q_proj: bsz * num_heads, tgt_len, self.head_dim
        k_proj: bsz * num_heads, src_len, self.head_dim
        key_padding_mask: bsz, src_len
        attn_mask: tgt_len, src_len
        """
        bsz, tgt_len, embed_dim = q_proj.size()
        bsz = bsz // self.num_heads
        src_len = k_proj.size(1)

        attn_energy = torch.bmm(q_proj, k_proj.transpose(1, 2)) + self.energy_bias

        attn_energy = attn_energy.view(bsz, self.num_heads, tgt_len, src_len)

        if key_padding_mask is not None:
            attn_energy = attn_energy.masked_fill(
                key_padding_mask.unsqueeze(1).unsqueeze(2).bool(),
                float("-inf"),
            )

        return attn_energy

    def expected_alignment_train(self, p_choose, key_padding_mask):
        """
        Calculating expected alignment for MMA.
        Masking is not needed here because p_choose will be 0 at masked
        positions.

        q_{i,j} = (1 - p_{i,j-1}) * q_{i,j-1} + alpha_{i-1,j}
        alpha_{i,j} = p_{i,j} * q_{i,j}

        Parallel solution:
        alpha_i = p_i * cumprod(1 - p_i) * cumsum(alpha_{i-1} / cumprod(1 - p_i))

        ============================================================
        Expected input size
        p_choose: bsz * num_heads, tgt_len, src_len
        """

        # p_choose: bsz * num_heads, tgt_len, src_len
        bsz_num_heads, tgt_len, src_len = p_choose.size()

        # cumprod_1mp : bsz * num_heads, tgt_len, src_len
        cumprod_1mp = exclusive_cumprod(1 - p_choose, dim=2, eps=self.eps)
        cumprod_1mp_clamp = torch.clamp(cumprod_1mp, self.eps, 1.0)

        init_attention = p_choose.new_zeros([bsz_num_heads, 1, src_len])
        init_attention[:, :, 0] = 1.0

        previous_attn = [init_attention]

        for i in range(tgt_len):
            # p_choose: bsz * num_heads, tgt_len, src_len
            # cumprod_1mp_clamp : bsz * num_heads, tgt_len, src_len
            # previous_attn[i]: bsz * num_heads, 1, src_len
            # alpha_i: bsz * num_heads, src_len
            alpha_i = (
                p_choose[:, i]
                * cumprod_1mp[:, i]
                *
torch.cumsum(previous_attn[i][:, 0] / cumprod_1mp_clamp[:, i], dim=1) + ).clamp(0, 1.0) + previous_attn.append(alpha_i.unsqueeze(1)) + + # alpha: bsz * num_heads, tgt_len, src_len + alpha = torch.cat(previous_attn[1:], dim=1) + + if self.mass_preservation: + # Last token has the residual probabilities + alpha[:, :, -1] = 1 - alpha[:, :, :-1].sum(dim=-1).clamp(0.0, 1.0) + + assert not torch.isnan(alpha).any(), "NaN detected in alpha." + + return alpha + + def expected_alignment_infer(self, p_choose, key_padding_mask, incremental_state): + """ + Calculating mo alignment for MMA during inference time + + ============================================================ + Expected input size + p_choose: bsz * num_heads, tgt_len, src_len + key_padding_mask: bsz * src_len + incremental_state: dict + """ + # p_choose: bsz * self.num_heads, src_len + bsz_num_heads, tgt_len, src_len = p_choose.size() + # One token at a time + assert tgt_len == 1 + p_choose = p_choose[:, 0, :] + + monotonic_cache = self._get_monotonic_buffer(incremental_state) + + # prev_monotonic_step: bsz, num_heads + bsz = bsz_num_heads // self.num_heads + prev_monotonic_step = monotonic_cache.get( + "step", p_choose.new_zeros([bsz, self.num_heads]).long() + ) + bsz, num_heads = prev_monotonic_step.size() + assert num_heads == self.num_heads + assert bsz * num_heads == bsz_num_heads + + # p_choose: bsz, num_heads, src_len + p_choose = p_choose.view(bsz, num_heads, src_len) + + if key_padding_mask is not None: + src_lengths = src_len - key_padding_mask.sum(dim=1, keepdim=True).long() + else: + src_lengths = prev_monotonic_step.new_ones(bsz, 1) * src_len + + # src_lengths: bsz, num_heads + src_lengths = src_lengths.expand_as(prev_monotonic_step) + # new_monotonic_step: bsz, num_heads + new_monotonic_step = prev_monotonic_step + + step_offset = 0 + if key_padding_mask is not None: + if key_padding_mask[:, 0].any(): + # left_pad_source = True: + step_offset = key_padding_mask.sum(dim=-1, keepdim=True) + + max_steps = src_lengths - 1 if self.mass_preservation else src_lengths + + # finish_read: bsz, num_heads + finish_read = new_monotonic_step.eq(max_steps) + + while finish_read.sum().item() < bsz * self.num_heads: + # p_choose: bsz * self.num_heads, src_len + # only choose the p at monotonic steps + # p_choose_i: bsz , self.num_heads + p_choose_i = ( + p_choose.gather( + 2, + (step_offset + new_monotonic_step) + .unsqueeze(2) + .clamp(0, src_len - 1), + ) + ).squeeze(2) + + action = ( + (p_choose_i < 0.5) + .type_as(prev_monotonic_step) + .masked_fill(finish_read, 0) + ) + # 1 x bsz + # sample actions on unfinished seq + # 1 means stay, finish reading + # 0 means leave, continue reading + # dist = torch.distributions.bernoulli.Bernoulli(p_choose) + # action = dist.sample().type_as(finish_read) * (1 - finish_read) + + new_monotonic_step += action + + finish_read = new_monotonic_step.eq(max_steps) | (action == 0) + # finish_read = (~ (finish_read.sum(dim=1, keepdim=True) < self.num_heads / 2)) | finish_read + + monotonic_cache["step"] = new_monotonic_step + + # alpha: bsz * num_heads, 1, src_len + # new_monotonic_step: bsz, num_heads + alpha = p_choose.new_zeros([bsz * self.num_heads, src_len]).scatter( + 1, + (step_offset + new_monotonic_step) + .view(bsz * self.num_heads, 1) + .clamp(0, src_len - 1), + 1, + ) + + if not self.mass_preservation: + alpha = alpha.masked_fill( + (new_monotonic_step == max_steps).view(bsz * self.num_heads, 1), 0 + ) + + alpha = alpha.unsqueeze(1) + + self._set_monotonic_buffer(incremental_state, 
monotonic_cache) + + return alpha + + def v_proj_output(self, value): + raise NotImplementedError + + def forward( + self, + query, + key, + value, + key_padding_mask=None, + incremental_state=None, + *args, + **kwargs, + ): + + tgt_len, bsz, embed_dim = query.size() + src_len = value.size(0) + + # stepwise prob + # p_choose: bsz * self.num_heads, tgt_len, src_len + p_choose = self.p_choose(query, key, key_padding_mask) + + # expected alignment alpha + # bsz * self.num_heads, tgt_len, src_len + if incremental_state is not None: + alpha = self.expected_alignment_infer( + p_choose, key_padding_mask, incremental_state + ) + else: + alpha = self.expected_alignment_train(p_choose, key_padding_mask) + + # expected attention beta + # bsz * self.num_heads, tgt_len, src_len + beta = self.expected_attention( + alpha, query, key, value, key_padding_mask, incremental_state + ) + + attn_weights = beta + + v_proj = self.v_proj_output(value) + attn = torch.bmm(attn_weights.type_as(v_proj), v_proj) + + attn = attn.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim) + + attn = self.out_proj(attn) + + beta = beta.view(bsz, self.num_heads, tgt_len, src_len) + alpha = alpha.view(bsz, self.num_heads, tgt_len, src_len) + p_choose = p_choose.view(bsz, self.num_heads, tgt_len, src_len) + + return attn, {"alpha": alpha, "beta": beta, "p_choose": p_choose} + + def reorder_incremental_state(self, incremental_state, new_order): + """Reorder buffered internal state (for incremental generation).""" + super().reorder_incremental_state(incremental_state, new_order) + input_buffer = self._get_monotonic_buffer(incremental_state) + if input_buffer is not None: + for k in input_buffer.keys(): + input_buffer[k] = input_buffer[k].index_select(0, new_order) + self._set_monotonic_buffer(incremental_state, input_buffer) + + def _get_monotonic_buffer(self, incremental_state): + return ( + utils.get_incremental_state( + self, + incremental_state, + "monotonic", + ) + or {} + ) + + def _set_monotonic_buffer(self, incremental_state, buffer): + utils.set_incremental_state( + self, + incremental_state, + "monotonic", + buffer, + ) + + def get_pointer(self, incremental_state): + return ( + utils.get_incremental_state( + self, + incremental_state, + "monotonic", + ) + or {} + ) + + def get_fastest_pointer(self, incremental_state): + return self.get_pointer(incremental_state)["step"].max(0)[0] + + def set_pointer(self, incremental_state, p_choose): + curr_pointer = self.get_pointer(incremental_state) + if len(curr_pointer) == 0: + buffer = torch.zeros_like(p_choose) + else: + buffer = self.get_pointer(incremental_state)["step"] + + buffer += (p_choose < 0.5).type_as(buffer) + + utils.set_incremental_state( + self, + incremental_state, + "monotonic", + {"step": buffer}, + ) + + +@register_monotonic_attention("hard_aligned") +class MonotonicMultiheadAttentionHard(MonotonicAttention, MultiheadAttention): + def __init__(self, args): + MultiheadAttention.__init__( + self, + embed_dim=args.decoder_embed_dim, + num_heads=args.decoder_attention_heads, + kdim=getattr(args, "encoder_embed_dim", None), + vdim=getattr(args, "encoder_embed_dim", None), + dropout=args.attention_dropout, + encoder_decoder_attention=True, + ) + + MonotonicAttention.__init__(self, args) + + self.k_in_proj = {"monotonic": self.k_proj} + self.q_in_proj = {"monotonic": self.q_proj} + self.v_in_proj = {"output": self.v_proj} + + def input_projections(self, query, key, value, name): + """ + Prepare inputs for multihead attention + + 
============================================================ + Expected input size + query: tgt_len, bsz, embed_dim + key: src_len, bsz, embed_dim + value: src_len, bsz, embed_dim + name: monotonic or soft + """ + + if query is not None: + bsz = query.size(1) + q = self.q_in_proj[name](query) + q *= self.scaling + q = ( + q.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + else: + q = None + + if key is not None: + bsz = key.size(1) + k = self.k_in_proj[name](key) + k = ( + k.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + else: + k = None + + if value is not None: + bsz = value.size(1) + v = self.v_in_proj[name](value) + v = ( + v.contiguous() + .view(-1, bsz * self.num_heads, self.head_dim) + .transpose(0, 1) + ) + else: + v = None + + return q, k, v + + def p_choose(self, query, key, key_padding_mask=None): + """ + Calculating step wise prob for reading and writing + 1 to read, 0 to write + + ============================================================ + Expected input size + query: bsz, tgt_len, embed_dim + key: bsz, src_len, embed_dim + value: bsz, src_len, embed_dim + key_padding_mask: bsz, src_len + attn_mask: bsz, src_len + query: bsz, tgt_len, embed_dim + """ + + # prepare inputs + q_proj, k_proj, _ = self.input_projections(query, key, None, "monotonic") + + # attention energy + attn_energy = self.attn_energy(q_proj, k_proj, key_padding_mask) + + noise = 0 + + if self.training: + # add noise here to encourage discretness + noise = ( + torch.normal(self.noise_mean, self.noise_var, attn_energy.size()) + .type_as(attn_energy) + .to(attn_energy.device) + ) + + p_choose = torch.sigmoid(attn_energy + noise) + _, _, tgt_len, src_len = p_choose.size() + + # p_choose: bsz * self.num_heads, tgt_len, src_len + return p_choose.view(-1, tgt_len, src_len) + + def expected_attention(self, alpha, *args): + """ + For MMA-H, beta = alpha + """ + return alpha + + def v_proj_output(self, value): + _, _, v_proj = self.input_projections(None, None, value, "output") + return v_proj + + +@register_monotonic_attention("infinite_lookback") +class MonotonicMultiheadAttentionInfiniteLookback(MonotonicMultiheadAttentionHard): + def __init__(self, args): + super().__init__(args) + self.init_soft_attention() + + def init_soft_attention(self): + self.k_proj_soft = nn.Linear(self.kdim, self.embed_dim, bias=True) + self.q_proj_soft = nn.Linear(self.embed_dim, self.embed_dim, bias=True) + self.k_in_proj["soft"] = self.k_proj_soft + self.q_in_proj["soft"] = self.q_proj_soft + + if self.qkv_same_dim: + # Empirically observed the convergence to be much better with + # the scaled initialization + nn.init.xavier_uniform_( + self.k_in_proj["soft"].weight, gain=1 / math.sqrt(2) + ) + nn.init.xavier_uniform_( + self.q_in_proj["soft"].weight, gain=1 / math.sqrt(2) + ) + else: + nn.init.xavier_uniform_(self.k_in_proj["soft"].weight) + nn.init.xavier_uniform_(self.q_in_proj["soft"].weight) + + def expected_attention( + self, alpha, query, key, value, key_padding_mask, incremental_state + ): + # monotonic attention, we will calculate milk here + bsz_x_num_heads, tgt_len, src_len = alpha.size() + bsz = int(bsz_x_num_heads / self.num_heads) + + q, k, _ = self.input_projections(query, key, None, "soft") + soft_energy = self.attn_energy(q, k, key_padding_mask) + + assert list(soft_energy.size()) == [bsz, self.num_heads, tgt_len, src_len] + + soft_energy = soft_energy.view(bsz * self.num_heads, tgt_len, src_len) + + if incremental_state is not None: + 
monotonic_cache = self._get_monotonic_buffer(incremental_state)
+            monotonic_step = monotonic_cache["step"] + 1
+            step_offset = 0
+            if key_padding_mask is not None:
+                if key_padding_mask[:, 0].any():
+                    # left_pad_source = True:
+                    step_offset = key_padding_mask.sum(dim=-1, keepdim=True)
+            monotonic_step += step_offset
+            mask = lengths_to_mask(
+                monotonic_step.view(-1), soft_energy.size(2), 1
+            ).unsqueeze(1)
+
+            soft_energy = soft_energy.masked_fill(~mask.bool(), float("-inf"))
+            soft_energy = soft_energy - soft_energy.max(dim=2, keepdim=True)[0]
+            exp_soft_energy = torch.exp(soft_energy)
+            exp_soft_energy_sum = exp_soft_energy.sum(dim=2)
+            beta = exp_soft_energy / exp_soft_energy_sum.unsqueeze(2)
+
+        else:
+            # bsz * num_heads, tgt_len, src_len
+            soft_energy = soft_energy - soft_energy.max(dim=2, keepdim=True)[0]
+            exp_soft_energy = torch.exp(soft_energy)
+            exp_soft_energy_cumsum = torch.cumsum(exp_soft_energy, dim=2)
+
+            if key_padding_mask is not None:
+                if key_padding_mask.any():
+                    exp_soft_energy_cumsum = (
+                        exp_soft_energy_cumsum.view(
+                            -1, self.num_heads, tgt_len, src_len
+                        )
+                        .masked_fill(
+                            key_padding_mask.unsqueeze(1).unsqueeze(1), self.eps
+                        )
+                        .view(-1, tgt_len, src_len)
+                    )
+
+            inner_items = alpha / exp_soft_energy_cumsum
+
+            beta = exp_soft_energy * torch.cumsum(
+                inner_items.flip(dims=[2]), dim=2
+            ).flip(dims=[2])
+
+        beta = self.dropout_module(beta)
+
+        assert not torch.isnan(beta).any(), "NaN detected in beta."
+
+        return beta
+
+
+@register_monotonic_attention("waitk")
+class MonotonicMultiheadAttentionWaitk(MonotonicMultiheadAttentionInfiniteLookback):
+    def __init__(self, args):
+        super().__init__(args)
+        self.q_in_proj["soft"] = self.q_in_proj["monotonic"]
+        self.k_in_proj["soft"] = self.k_in_proj["monotonic"]
+        self.waitk_lagging = args.waitk_lagging
+        assert (
+            self.waitk_lagging > 0
+        ), f"Lagging has to be larger than 0, got {self.waitk_lagging}."
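+
+    # With k = waitk_lagging, p_choose (defined below) is a 0/1 band matrix:
+    # p_choose[i, j] = 1 iff j == i + k - 1, i.e. the i-th target token is
+    # written after reading the first i + k - 1 source tokens. A sketch for
+    # k = 2, tgt_len = 3, src_len = 4:
+    #
+    #     [[0, 1, 0, 0],
+    #      [0, 0, 1, 0],
+    #      [0, 0, 0, 1]]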
+ + @staticmethod + def add_args(parser): + super( + MonotonicMultiheadAttentionWaitk, + MonotonicMultiheadAttentionWaitk, + ).add_args(parser) + + parser.add_argument( + "--waitk-lagging", type=int, required=True, help="Wait k lagging" + ) + + def p_choose( + self, query, key, key_padding_mask=None, attn_mask=None, incremental_state=None + ): + """ + query: bsz, tgt_len + key: bsz, src_len + key_padding_mask: bsz, src_len + """ + src_len, bsz, _ = key.size() + tgt_len, bsz, _ = query.size() + p_choose = query.new_ones(bsz, tgt_len, src_len) + p_choose = torch.tril(p_choose, diagonal=self.waitk_lagging - 1) + p_choose = torch.triu(p_choose, diagonal=self.waitk_lagging - 1) + + if key_padding_mask is not None and key_padding_mask[:, 0].eq(1).any(): + # Left pad source + # add -1 to the end + p_choose = p_choose.masked_fill( + key_padding_mask.float().flip(1).unsqueeze(1).bool(), -1 + ) + p_choose = convert_padding_direction( + p_choose.view(-1, src_len).long(), padding_idx=-1, right_to_left=True + ) + p_choose = p_choose.view(bsz, tgt_len, src_len).type_as(query) + # remove -1 + p_choose[p_choose.eq(-1)] = 0 + + # Extend to each head + p_choose = ( + p_choose.contiguous() + .unsqueeze(1) + .expand(-1, self.num_heads, -1, -1) + .contiguous() + .view(-1, tgt_len, src_len) + ) + + return p_choose diff --git a/fairseq-0.10.2/examples/simultaneous_translation/modules/monotonic_transformer_layer.py b/fairseq-0.10.2/examples/simultaneous_translation/modules/monotonic_transformer_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..442b7d487deafb6639239e4be2d79b71e3162d37 --- /dev/null +++ b/fairseq-0.10.2/examples/simultaneous_translation/modules/monotonic_transformer_layer.py @@ -0,0 +1,48 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +from fairseq.modules import LayerNorm, TransformerDecoderLayer, TransformerEncoderLayer + +from . 
import build_monotonic_attention + + +class TransformerMonotonicEncoderLayer(TransformerEncoderLayer): + def forward(self, x, encoder_padding_mask): + seq_len, _, _ = x.size() + attn_mask = x.new_ones([seq_len, seq_len]).triu(1) + attn_mask = attn_mask.masked_fill(attn_mask.bool(), float("-inf")) + return super().forward(x, encoder_padding_mask, attn_mask) + + +class TransformerMonotonicDecoderLayer(TransformerDecoderLayer): + def __init__( + self, args, no_encoder_attn=False, add_bias_kv=False, add_zero_attn=False + ): + super().__init__( + args, + no_encoder_attn=True, + add_bias_kv=add_bias_kv, + add_zero_attn=add_zero_attn, + ) + self.encoder_attn = build_monotonic_attention(args) + self.encoder_attn_layer_norm = LayerNorm( + self.embed_dim, export=getattr(args, "char_inputs", False) + ) + + def prune_incremental_state(self, incremental_state): + def prune(module): + input_buffer = module._get_input_buffer(incremental_state) + for key in ["prev_key", "prev_value"]: + if input_buffer[key].size(2) > 1: + input_buffer[key] = input_buffer[key][:, :, :-1, :] + else: + input_buffer = {} + break + module._set_input_buffer(incremental_state, input_buffer) + + prune(self.self_attn) + + def get_steps(self, incremental_state): + return self.encoder_attn._get_monotonic_buffer(incremental_state).get("step", 0) diff --git a/fairseq-0.10.2/examples/simultaneous_translation/utils/__init__.py b/fairseq-0.10.2/examples/simultaneous_translation/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..be0ba4d99afab26a0422dbe4d014ea111e8c5110 --- /dev/null +++ b/fairseq-0.10.2/examples/simultaneous_translation/utils/__init__.py @@ -0,0 +1,14 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import importlib +import os + + +# automatically import any Python files in the criterions/ directory +for file in os.listdir(os.path.dirname(__file__)): + if file.endswith(".py") and not file.startswith("_"): + module = file[: file.find(".py")] + importlib.import_module("examples.simultaneous_translation.utils." + module) diff --git a/fairseq-0.10.2/examples/simultaneous_translation/utils/functions.py b/fairseq-0.10.2/examples/simultaneous_translation/utils/functions.py new file mode 100644 index 0000000000000000000000000000000000000000..f795b5f31cee6d9f8387d6402994b9cbb4c98190 --- /dev/null +++ b/fairseq-0.10.2/examples/simultaneous_translation/utils/functions.py @@ -0,0 +1,149 @@ +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import torch + + +def exclusive_cumprod(tensor, dim: int, eps: float = 1e-10): + """ + Implementing exclusive cumprod. + There is cumprod in pytorch, however there is no exclusive mode. 
+    cumprod(x) = [x1, x1x2, x1x2x3, ..., prod_{i=1}^n x_i]
+    exclusive means cumprod(x) = [1, x1, x1x2, x1x2x3, ..., prod_{i=1}^{n-1} x_i]
+    """
+    tensor_size = list(tensor.size())
+    tensor_size[dim] = 1
+    return_tensor = safe_cumprod(
+        torch.cat([torch.ones(tensor_size).type_as(tensor), tensor], dim=dim),
+        dim=dim,
+        eps=eps,
+    )
+
+    if dim == 0:
+        return return_tensor[:-1]
+    elif dim == 1:
+        return return_tensor[:, :-1]
+    elif dim == 2:
+        return return_tensor[:, :, :-1]
+    else:
+        raise RuntimeError("Cumprod on dimension 3 and higher is not implemented")
+
+
+def safe_cumprod(tensor, dim: int, eps: float = 1e-10):
+    """
+    An implementation of cumprod to prevent precision issues.
+    cumprod(x)
+    = [x1, x1x2, x1x2x3, ....]
+    = [exp(log(x1)), exp(log(x1) + log(x2)), exp(log(x1) + log(x2) + log(x3)), ...]
+    = exp(cumsum(log(x)))
+    """
+
+    if (tensor + eps < 0).any().item():
+        raise RuntimeError(
+            "Safe cumprod can only take non-negative tensors as input. "
+            "Consider using torch.cumprod if you want to calculate negative values."
+        )
+
+    log_tensor = torch.log(tensor + eps)
+    cumsum_log_tensor = torch.cumsum(log_tensor, dim)
+    exp_cumsum_log_tensor = torch.exp(cumsum_log_tensor)
+    return exp_cumsum_log_tensor
+
+
+def lengths_to_mask(lengths, max_len: int, dim: int = 0, negative_mask: bool = False):
+    """
+    Convert a tensor of lengths to a mask.
+    For example, lengths = [[2, 3, 4]], max_len = 5
+    mask =
+   [[1, 1, 1],
+    [1, 1, 1],
+    [0, 1, 1],
+    [0, 0, 1],
+    [0, 0, 0]]
+    """
+    assert len(lengths.size()) <= 2
+    if len(lengths.size()) == 2:
+        if dim == 1:
+            lengths = lengths.t()
+    else:
+        lengths = lengths.unsqueeze(1)
+
+    # lengths : batch_size, 1
+    lengths = lengths.view(-1, 1)
+
+    batch_size = lengths.size(0)
+    # batch_size, max_len
+    mask = torch.arange(max_len).expand(batch_size, max_len).type_as(lengths) < lengths
+
+    if negative_mask:
+        mask = ~mask
+
+    if dim == 0:
+        # max_len, batch_size
+        mask = mask.t()
+
+    return mask
+
+
+def moving_sum(x, start_idx: int, end_idx: int):
+    """
+    From MONOTONIC CHUNKWISE ATTENTION
+    https://arxiv.org/pdf/1712.05382.pdf
+    Equation (18)
+
+    x = [x_1, x_2, ..., x_N]
+    MovingSum(x, start_idx, end_idx)_n = sum_{m=n-(start_idx-1)}^{n+end_idx-1} x_m
+    for n in {1, 2, 3, ..., N}
+
+    x : src_len, batch_size
+    start_idx : start idx
+    end_idx : end idx
+
+    Example
+    src_len = 5
+    batch_size = 3
+    x =
+   [[ 0, 5, 10],
+    [ 1, 6, 11],
+    [ 2, 7, 12],
+    [ 3, 8, 13],
+    [ 4, 9, 14]]
+
+    MovingSum(x, 3, 1) =
+   [[ 0, 5, 10],
+    [ 1, 11, 21],
+    [ 3, 18, 33],
+    [ 6, 21, 36],
+    [ 9, 24, 39]]
+
+    MovingSum(x, 1, 3) =
+   [[ 3, 18, 33],
+    [ 6, 21, 36],
+    [ 9, 24, 39],
+    [ 7, 17, 27],
+    [ 4, 9, 14]]
+    """
+    assert start_idx > 0 and end_idx > 0
+    assert len(x.size()) == 2
+    src_len, batch_size = x.size()
+    # batch_size, 1, src_len
+    x = x.t().unsqueeze(1)
+    # batch_size, 1, src_len
+    moving_sum_weight = x.new_ones([1, 1, end_idx + start_idx - 1])
+
+    moving_sum = (
+        torch.nn.functional.conv1d(
+            x, moving_sum_weight, padding=start_idx + end_idx - 1
+        )
+        .squeeze(1)
+        .t()
+    )
+    moving_sum = moving_sum[end_idx:-start_idx]
+
+    assert src_len == moving_sum.size(0)
+    assert batch_size == moving_sum.size(1)
+
+    return moving_sum
diff --git a/fairseq-0.10.2/examples/simultaneous_translation/utils/latency.py b/fairseq-0.10.2/examples/simultaneous_translation/utils/latency.py
new file mode 100644
index 0000000000000000000000000000000000000000..5d800a5d9e992be49cedc72b7a9604a32e35fbcc
--- /dev/null
+++ b/fairseq-0.10.2/examples/simultaneous_translation/utils/latency.py
@@ -0,0
+1,451 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
+
+import torch
+
+
+class LatencyMetric(object):
+    @staticmethod
+    def length_from_padding_mask(padding_mask, batch_first: bool = False):
+        dim = 1 if batch_first else 0
+        return padding_mask.size(dim) - padding_mask.sum(dim=dim, keepdim=True)
+
+    def prepare_latency_metric(
+        self,
+        delays,
+        src_lens,
+        target_padding_mask=None,
+        batch_first: bool = False,
+        start_from_zero: bool = True,
+    ):
+        assert len(delays.size()) == 2
+        assert len(src_lens.size()) == 2
+
+        if start_from_zero:
+            delays = delays + 1
+
+        if batch_first:
+            # convert to batch_last
+            delays = delays.t()
+            src_lens = src_lens.t()
+        tgt_len, bsz = delays.size()
+        _, bsz_1 = src_lens.size()
+
+        if target_padding_mask is not None:
+            if batch_first:
+                target_padding_mask = target_padding_mask.t()
+            tgt_len_1, bsz_2 = target_padding_mask.size()
+            assert tgt_len == tgt_len_1
+            assert bsz == bsz_2
+
+        assert bsz == bsz_1
+
+        if target_padding_mask is None:
+            tgt_lens = tgt_len * delays.new_ones([1, bsz]).float()
+        else:
+            # 1, batch_size
+            tgt_lens = self.length_from_padding_mask(target_padding_mask, False).float()
+            delays = delays.masked_fill(target_padding_mask, 0)
+
+        return delays, src_lens, tgt_lens, target_padding_mask
+
+    def __call__(
+        self,
+        delays,
+        src_lens,
+        target_padding_mask=None,
+        batch_first: bool = False,
+        start_from_zero: bool = True,
+    ):
+        delays, src_lens, tgt_lens, target_padding_mask = self.prepare_latency_metric(
+            delays, src_lens, target_padding_mask, batch_first, start_from_zero
+        )
+        return self.cal_metric(delays, src_lens, tgt_lens, target_padding_mask)
+
+    @staticmethod
+    def cal_metric(delays, src_lens, tgt_lens, target_padding_mask):
+        """
+        Expected sizes:
+        delays: tgt_len, batch_size
+        src_lens: 1, batch_size
+        target_padding_mask: tgt_len, batch_size
+        """
+        raise NotImplementedError
+
+
+class AverageProportion(LatencyMetric):
+    """
+    Function to calculate Average Proportion from
+    Can neural machine translation do simultaneous translation?
+    (https://arxiv.org/abs/1606.02012)
+
+    Delays are monotonic steps, ranging from 1 to src_len.
+    Given source x and target y, AP is calculated as:
+
+    AP = 1 / (|x||y|) * sum_{i=1}^{|y|} delays_i
+    """
+
+    @staticmethod
+    def cal_metric(delays, src_lens, tgt_lens, target_padding_mask):
+        if target_padding_mask is not None:
+            AP = torch.sum(
+                delays.masked_fill(target_padding_mask, 0), dim=0, keepdim=True
+            )
+        else:
+            AP = torch.sum(delays, dim=0, keepdim=True)
+
+        AP = AP / (src_lens * tgt_lens)
+        return AP
+
+
+class AverageLagging(LatencyMetric):
+    """
+    Function to calculate Average Lagging from
+    STACL: Simultaneous Translation with Implicit Anticipation
+    and Controllable Latency using Prefix-to-Prefix Framework
+    (https://arxiv.org/abs/1810.08398)
+
+    Delays are monotonic steps, ranging from 1 to src_len.
+
+    Given source x and target y, AL is calculated as:
+
+    AL = 1 / tau * sum_{i=1}^{tau} (delays_i - (i - 1) / gamma)
+
+    where
+    gamma = |y| / |x|
+    tau = argmin_i (delays_i = |x|)
+    """
+
+    @staticmethod
+    def cal_metric(delays, src_lens, tgt_lens, target_padding_mask):
+        # tau = argmin_i(delays_i = |x|)
+        tgt_len, bsz = delays.size()
+        lagging_padding_mask = delays >= src_lens
+        lagging_padding_mask = torch.nn.functional.pad(
+            lagging_padding_mask.t(), (1, 0)
+        ).t()[:-1, :]
+        gamma = tgt_lens / src_lens
+        lagging = (
+            delays
+            - torch.arange(delays.size(0))
+            .unsqueeze(1)
+            .type_as(delays)
+            .expand_as(delays)
+            / gamma
+        )
+        lagging.masked_fill_(lagging_padding_mask, 0)
+        tau = (1 - lagging_padding_mask.type_as(lagging)).sum(dim=0, keepdim=True)
+        AL = lagging.sum(dim=0, keepdim=True) / tau
+
+        return AL
+
+
+class DifferentiableAverageLagging(LatencyMetric):
+    """
+    Function to calculate Differentiable Average Lagging from
+    Monotonic Infinite Lookback Attention for Simultaneous Machine Translation
+    (https://arxiv.org/abs/1906.05218)
+
+    Delays are monotonic steps, ranging from 0 to src_len-1
+    (in the original paper they range from 1 to src_len).
+    Given source x and target y, DAL is calculated as:
+
+    DAL = 1 / |y| * sum_{i=1}^{|y|} (delays'_i - (i - 1) / gamma)
+
+    where
+    gamma = |y| / |x|
+    delays'_i =
+        1. delays_i,                                  if i == 1
+        2. max(delays_i, delays'_{i-1} + 1 / gamma),  otherwise
+    """
+
+    @staticmethod
+    def cal_metric(delays, src_lens, tgt_lens, target_padding_mask):
+        tgt_len, bsz = delays.size()
+
+        gamma = tgt_lens / src_lens
+        new_delays = torch.zeros_like(delays)
+
+        for i in range(delays.size(0)):
+            if i == 0:
+                new_delays[i] = delays[i]
+            else:
+                new_delays[i] = torch.cat(
+                    [
+                        new_delays[i - 1].unsqueeze(0) + 1 / gamma,
+                        delays[i].unsqueeze(0),
+                    ],
+                    dim=0,
+                ).max(dim=0)[0]
+
+        DAL = (
+            new_delays
+            - torch.arange(delays.size(0))
+            .unsqueeze(1)
+            .type_as(delays)
+            .expand_as(delays)
+            / gamma
+        )
+        if target_padding_mask is not None:
+            DAL = DAL.masked_fill(target_padding_mask, 0)
+
+        DAL = DAL.sum(dim=0, keepdim=True) / tgt_lens
+
+        return DAL
+
+
+class LatencyMetricVariance(LatencyMetric):
+    def prepare_latency_metric(
+        self,
+        delays,
+        src_lens,
+        target_padding_mask=None,
+        batch_first: bool = True,
+        start_from_zero: bool = True,
+    ):
+        assert batch_first
+        assert len(delays.size()) == 3
+        assert len(src_lens.size()) == 2
+
+        if start_from_zero:
+            delays = delays + 1
+
+        # convert to batch_last
+        bsz, num_heads_x_layers, tgt_len = delays.size()
+        bsz_1, _ = src_lens.size()
+        assert bsz == bsz_1
+
+        if target_padding_mask is not None:
+            bsz_2, tgt_len_1 = target_padding_mask.size()
+            assert tgt_len == tgt_len_1
+            assert bsz == bsz_2
+
+        if target_padding_mask is None:
+            tgt_lens = tgt_len * delays.new_ones([bsz, tgt_len]).float()
+        else:
+            # batch_size, 1
+            tgt_lens = self.length_from_padding_mask(target_padding_mask, True).float()
+            delays = delays.masked_fill(target_padding_mask.unsqueeze(1), 0)
+
+        return delays, src_lens, tgt_lens, target_padding_mask
+
+
+class VarianceDelay(LatencyMetricVariance):
+    @staticmethod
+    def cal_metric(delays, src_lens, tgt_lens, target_padding_mask):
+        """
+        delays : bsz, num_heads_x_layers, tgt_len
+        src_lens : bsz, 1
+        tgt_lens : bsz, 1
+        target_padding_mask: bsz, tgt_len or None
+        """
+        if delays.size(1) == 1:
+            return delays.new_zeros([1])
+
+        variance_delays = delays.var(dim=1)
+
+        if target_padding_mask is not None:
+            variance_delays.masked_fill_(target_padding_mask, 0)
+
+        return variance_delays.sum(dim=1, keepdim=True) / tgt_lens
+
+
+class LatencyInference(object):
+    def
__init__(self, start_from_zero=True): + self.metric_calculator = { + "differentiable_average_lagging": DifferentiableAverageLagging(), + "average_lagging": AverageLagging(), + "average_proportion": AverageProportion(), + } + + self.start_from_zero = start_from_zero + + def __call__(self, monotonic_step, src_lens): + """ + monotonic_step range from 0 to src_len. src_len means eos + delays: bsz, tgt_len + src_lens: bsz, 1 + """ + if not self.start_from_zero: + monotonic_step -= 1 + + src_lens = src_lens + + delays = monotonic_step.view( + monotonic_step.size(0), -1, monotonic_step.size(-1) + ).max(dim=1)[0] + + delays = delays.masked_fill(delays >= src_lens, 0) + (src_lens - 1).expand_as( + delays + ).masked_fill(delays < src_lens, 0) + return_dict = {} + for key, func in self.metric_calculator.items(): + return_dict[key] = func( + delays.float(), + src_lens.float(), + target_padding_mask=None, + batch_first=True, + start_from_zero=True, + ).t() + + return return_dict + + +class LatencyTraining(object): + def __init__( + self, + avg_weight, + var_weight, + avg_type, + var_type, + stay_on_last_token, + average_method, + ): + self.avg_weight = avg_weight + self.var_weight = var_weight + self.avg_type = avg_type + self.var_type = var_type + self.stay_on_last_token = stay_on_last_token + self.average_method = average_method + + self.metric_calculator = { + "differentiable_average_lagging": DifferentiableAverageLagging(), + "average_lagging": AverageLagging(), + "average_proportion": AverageProportion(), + } + + self.variance_calculator = { + "variance_delay": VarianceDelay(), + } + + def expected_delays_from_attention( + self, attention, source_padding_mask=None, target_padding_mask=None + ): + if type(attention) == list: + # bsz, num_heads, tgt_len, src_len + bsz, num_heads, tgt_len, src_len = attention[0].size() + attention = torch.cat(attention, dim=1) + bsz, num_heads_x_layers, tgt_len, src_len = attention.size() + # bsz * num_heads * num_layers, tgt_len, src_len + attention = attention.view(-1, tgt_len, src_len) + else: + # bsz * num_heads * num_layers, tgt_len, src_len + bsz, tgt_len, src_len = attention.size() + num_heads_x_layers = 1 + attention = attention.view(-1, tgt_len, src_len) + + if not self.stay_on_last_token: + residual_attention = 1 - attention[:, :, :-1].sum(dim=2, keepdim=True) + attention = torch.cat([attention[:, :, :-1], residual_attention], dim=2) + + # bsz * num_heads_x_num_layers, tgt_len, src_len for MMA + steps = ( + torch.arange(1, 1 + src_len) + .unsqueeze(0) + .unsqueeze(1) + .expand_as(attention) + .type_as(attention) + ) + + if source_padding_mask is not None: + src_offset = ( + source_padding_mask.type_as(attention) + .sum(dim=1, keepdim=True) + .expand(bsz, num_heads_x_layers) + .contiguous() + .view(-1, 1) + ) + src_lens = src_len - src_offset + if source_padding_mask[:, 0].any(): + # Pad left + src_offset = src_offset.view(-1, 1, 1) + steps = steps - src_offset + steps = steps.masked_fill(steps <= 0, 0) + else: + src_lens = attention.new_ones([bsz, num_heads_x_layers]) * src_len + src_lens = src_lens.view(-1, 1) + + # bsz * num_heads_num_layers, tgt_len, src_len + expected_delays = ( + (steps * attention).sum(dim=2).view(bsz, num_heads_x_layers, tgt_len) + ) + + if target_padding_mask is not None: + expected_delays.masked_fill_(target_padding_mask.unsqueeze(1), 0) + + return expected_delays, src_lens + + def avg_loss(self, expected_delays, src_lens, target_padding_mask): + + bsz, num_heads_x_layers, tgt_len = expected_delays.size() + target_padding_mask = ( 
+ target_padding_mask.unsqueeze(1) + .expand_as(expected_delays) + .contiguous() + .view(-1, tgt_len) + ) + + if self.average_method == "average": + # bsz * tgt_len + expected_delays = expected_delays.mean(dim=1) + elif self.average_method == "weighted_average": + weights = torch.nn.functional.softmax(expected_delays, dim=1) + expected_delays = torch.sum(expected_delays * weights, dim=1) + elif self.average_method == "max": + # bsz * num_heads_x_num_layers, tgt_len + expected_delays = expected_delays.max(dim=1)[0] + else: + raise RuntimeError(f"{self.average_method} is not supported") + + src_lens = src_lens.view(bsz, -1)[:, :1] + target_padding_mask = target_padding_mask.view(bsz, -1, tgt_len)[:, 0] + + if self.avg_weight > 0.0: + if self.avg_type in self.metric_calculator: + average_delays = self.metric_calculator[self.avg_type]( + expected_delays, + src_lens, + target_padding_mask, + batch_first=True, + start_from_zero=False, + ) + else: + raise RuntimeError(f"{self.avg_type} is not supported.") + + # bsz * num_heads_x_num_layers, 1 + return self.avg_weight * average_delays.sum() + else: + return 0.0 + + def var_loss(self, expected_delays, src_lens, target_padding_mask): + src_lens = src_lens.view(expected_delays.size(0), expected_delays.size(1))[ + :, :1 + ] + if self.var_weight > 0.0: + if self.var_type in self.variance_calculator: + variance_delays = self.variance_calculator[self.var_type]( + expected_delays, + src_lens, + target_padding_mask, + batch_first=True, + start_from_zero=False, + ) + else: + raise RuntimeError(f"{self.var_type} is not supported.") + + return self.var_weight * variance_delays.sum() + else: + return 0.0 + + def loss(self, attention, source_padding_mask=None, target_padding_mask=None): + expected_delays, src_lens = self.expected_delays_from_attention( + attention, source_padding_mask, target_padding_mask + ) + + latency_loss = 0 + + latency_loss += self.avg_loss(expected_delays, src_lens, target_padding_mask) + + latency_loss += self.var_loss(expected_delays, src_lens, target_padding_mask) + + return latency_loss diff --git a/fairseq-0.10.2/scripts/__init__.py b/fairseq-0.10.2/scripts/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/fairseq-0.10.2/scripts/average_checkpoints.py b/fairseq-0.10.2/scripts/average_checkpoints.py new file mode 100644 index 0000000000000000000000000000000000000000..c512f802bce6b3395cc42a0e4eb39181e9f8c873 --- /dev/null +++ b/fairseq-0.10.2/scripts/average_checkpoints.py @@ -0,0 +1,158 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse +import collections +import os +import re + +import torch +from fairseq.file_io import PathManager + + +def average_checkpoints(inputs): + """Loads checkpoints from inputs and returns a model with averaged weights. + + Args: + inputs: An iterable of string paths of checkpoints to load from. + + Returns: + A dict of string keys mapping to various values. The 'model' key + from the returned dict should correspond to an OrderedDict mapping + string parameter names to torch Tensors. 
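+
+    Example (hypothetical checkpoint paths; all checkpoints must share the
+    same parameter names):
+
+        new_state = average_checkpoints(["checkpoint1.pt", "checkpoint2.pt"])
+        torch.save(new_state, "checkpoint.avg.pt")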
+ """ + params_dict = collections.OrderedDict() + params_keys = None + new_state = None + num_models = len(inputs) + + for fpath in inputs: + with PathManager.open(fpath, "rb") as f: + state = torch.load( + f, + map_location=( + lambda s, _: torch.serialization.default_restore_location(s, "cpu") + ), + ) + # Copies over the settings from the first checkpoint + if new_state is None: + new_state = state + + model_params = state["model"] + + model_params_keys = list(model_params.keys()) + if params_keys is None: + params_keys = model_params_keys + elif params_keys != model_params_keys: + raise KeyError( + "For checkpoint {}, expected list of params: {}, " + "but found: {}".format(f, params_keys, model_params_keys) + ) + + for k in params_keys: + p = model_params[k] + if isinstance(p, torch.HalfTensor): + p = p.float() + if k not in params_dict: + params_dict[k] = p.clone() + # NOTE: clone() is needed in case of p is a shared parameter + else: + params_dict[k] += p + + averaged_params = collections.OrderedDict() + for k, v in params_dict.items(): + averaged_params[k] = v + if averaged_params[k].is_floating_point(): + averaged_params[k].div_(num_models) + else: + averaged_params[k] //= num_models + new_state["model"] = averaged_params + return new_state + + +def last_n_checkpoints(paths, n, update_based, upper_bound=None): + assert len(paths) == 1 + path = paths[0] + if update_based: + pt_regexp = re.compile(r"checkpoint_\d+_(\d+)\.pt") + else: + pt_regexp = re.compile(r"checkpoint(\d+)\.pt") + files = PathManager.ls(path) + + entries = [] + for f in files: + m = pt_regexp.fullmatch(f) + if m is not None: + sort_key = int(m.group(1)) + if upper_bound is None or sort_key <= upper_bound: + entries.append((sort_key, m.group(0))) + if len(entries) < n: + raise Exception( + "Found {} checkpoint files but need at least {}", len(entries), n + ) + return [os.path.join(path, x[1]) for x in sorted(entries, reverse=True)[:n]] + + +def main(): + parser = argparse.ArgumentParser( + description="Tool to average the params of input checkpoints to " + "produce a new checkpoint", + ) + # fmt: off + parser.add_argument('--inputs', required=True, nargs='+', + help='Input checkpoint file paths.') + parser.add_argument('--output', required=True, metavar='FILE', + help='Write the new checkpoint containing the averaged weights to this path.') + num_group = parser.add_mutually_exclusive_group() + num_group.add_argument('--num-epoch-checkpoints', type=int, + help='if set, will try to find checkpoints with names checkpoint_xx.pt in the path specified by input, ' + 'and average last this many of them.') + num_group.add_argument('--num-update-checkpoints', type=int, + help='if set, will try to find checkpoints with names checkpoint_ee_xx.pt in the path specified by input, ' + 'and average last this many of them.') + parser.add_argument('--checkpoint-upper-bound', type=int, + help='when using --num-epoch-checkpoints, this will set an upper bound on which epoch to use, ' + 'when using --num-update-checkpoints, this will set an upper bound on which update to use' + 'e.g., with --num-epoch-checkpoints=10 --checkpoint-upper-bound=50, checkpoints 41-50 would be averaged.' 
+                             'E.g., with --num-update-checkpoints=10 --checkpoint-upper-bound=50000, checkpoints 45500-50000 would be averaged, assuming --save-interval-updates 500.'
+                        )
+    # fmt: on
+    args = parser.parse_args()
+    print(args)
+
+    num = None
+    is_update_based = False
+    if args.num_update_checkpoints is not None:
+        num = args.num_update_checkpoints
+        is_update_based = True
+    elif args.num_epoch_checkpoints is not None:
+        num = args.num_epoch_checkpoints
+
+    assert args.checkpoint_upper_bound is None or (
+        args.num_epoch_checkpoints is not None
+        or args.num_update_checkpoints is not None
+    ), "--checkpoint-upper-bound requires --num-epoch-checkpoints or --num-update-checkpoints"
+    assert (
+        args.num_epoch_checkpoints is None or args.num_update_checkpoints is None
+    ), "Cannot combine --num-epoch-checkpoints and --num-update-checkpoints"
+
+    if num is not None:
+        args.inputs = last_n_checkpoints(
+            args.inputs,
+            num,
+            is_update_based,
+            upper_bound=args.checkpoint_upper_bound,
+        )
+        print("averaging checkpoints: ", args.inputs)
+
+    new_state = average_checkpoints(args.inputs)
+    with PathManager.open(args.output, "wb") as f:
+        torch.save(new_state, f)
+    print("Finished writing averaged checkpoint to {}".format(args.output))
+
+
+if __name__ == "__main__":
+    main()
diff --git a/fairseq-0.10.2/scripts/build_sym_alignment.py b/fairseq-0.10.2/scripts/build_sym_alignment.py
new file mode 100644
index 0000000000000000000000000000000000000000..0ca5c18f7bd4b0fbf58b203793506ca395466129
--- /dev/null
+++ b/fairseq-0.10.2/scripts/build_sym_alignment.py
@@ -0,0 +1,97 @@
+# Copyright (c) Facebook, Inc. and its affiliates.
+#
+# This source code is licensed under the MIT license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Use this script in order to build symmetric alignments for your translation
+dataset.
+This script depends on fast_align and mosesdecoder tools. You will need to
+build those before running the script.
+fast_align:
+    github: http://github.com/clab/fast_align
+    instructions: follow the instructions in README.md
+mosesdecoder:
+    github: http://github.com/moses-smt/mosesdecoder
+    instructions: http://www.statmt.org/moses/?n=Development.GetStarted
+The script produces the following files under --output_dir:
+    text.joined - concatenation of lines from the source_file and the
+        target_file.
+    align.forward - forward pass of fast_align.
+    align.backward - backward pass of fast_align.
+    aligned.sym_heuristic - symmetrized alignment.
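+
+A minimal example invocation (all paths below are hypothetical and depend on
+where you built fast_align and mosesdecoder):
+    python scripts/build_sym_alignment.py \
+        --fast_align_dir fast_align/build \
+        --mosesdecoder_dir mosesdecoder \
+        --source_file corpus.src --target_file corpus.tgt \
+        --output_dir align_out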
+"""
+
+import argparse
+import os
+from itertools import zip_longest
+
+
+def main():
+    parser = argparse.ArgumentParser(description="symmetric alignment builder")
+    # fmt: off
+    parser.add_argument('--fast_align_dir',
+                        help='path to fast_align build directory')
+    parser.add_argument('--mosesdecoder_dir',
+                        help='path to mosesdecoder root directory')
+    parser.add_argument('--sym_heuristic',
+                        help='heuristic to use for symmetrization',
+                        default='grow-diag-final-and')
+    parser.add_argument('--source_file',
+                        help='path to a file with sentences '
+                             'in the source language')
+    parser.add_argument('--target_file',
+                        help='path to a file with sentences '
+                             'in the target language')
+    parser.add_argument('--output_dir',
+                        help='output directory')
+    # fmt: on
+    args = parser.parse_args()
+
+    fast_align_bin = os.path.join(args.fast_align_dir, "fast_align")
+    symal_bin = os.path.join(args.mosesdecoder_dir, "bin", "symal")
+    sym_fast_align_bin = os.path.join(
+        args.mosesdecoder_dir, "scripts", "ems", "support", "symmetrize-fast-align.perl"
+    )
+
+    # create joined file
+    joined_file = os.path.join(args.output_dir, "text.joined")
+    with open(args.source_file, "r", encoding="utf-8") as src, open(
+        args.target_file, "r", encoding="utf-8"
+    ) as tgt:
+        with open(joined_file, "w", encoding="utf-8") as joined:
+            for s, t in zip_longest(src, tgt):
+                print("{} ||| {}".format(s.strip(), t.strip()), file=joined)
+
+    # run forward alignment
+    fwd_align_file = os.path.join(args.output_dir, "align.forward")
+    fwd_fast_align_cmd = "{FASTALIGN} -i {JOINED} -d -o -v > {FWD}".format(
+        FASTALIGN=fast_align_bin, JOINED=joined_file, FWD=fwd_align_file
+    )
+    assert os.system(fwd_fast_align_cmd) == 0
+
+    # run backward alignment
+    bwd_align_file = os.path.join(args.output_dir, "align.backward")
+    bwd_fast_align_cmd = "{FASTALIGN} -i {JOINED} -d -o -v -r > {BWD}".format(
+        FASTALIGN=fast_align_bin, JOINED=joined_file, BWD=bwd_align_file
+    )
+    assert os.system(bwd_fast_align_cmd) == 0
+
+    # run symmetrization
+    sym_out_file = os.path.join(args.output_dir, "aligned")
+    sym_cmd = "{SYMFASTALIGN} {FWD} {BWD} {SRC} {TGT} {OUT} {HEURISTIC} {SYMAL}".format(
+        SYMFASTALIGN=sym_fast_align_bin,
+        FWD=fwd_align_file,
+        BWD=bwd_align_file,
+        SRC=args.source_file,
+        TGT=args.target_file,
+        OUT=sym_out_file,
+        HEURISTIC=args.sym_heuristic,
+        SYMAL=symal_bin,
+    )
+    assert os.system(sym_cmd) == 0
+
+
+if __name__ == "__main__":
+    main()
diff --git a/fairseq-0.10.2/scripts/compare_namespaces.py b/fairseq-0.10.2/scripts/compare_namespaces.py
new file mode 100644
index 0000000000000000000000000000000000000000..bc24db624f8db36f546c263ba3a806dae6d466bf
--- /dev/null
+++ b/fairseq-0.10.2/scripts/compare_namespaces.py
@@ -0,0 +1,46 @@
+#!/usr/bin/env python
+"""Helper script to compare two argparse.Namespace objects."""
+
+from argparse import Namespace  # noqa
+
+
+def main():
+
+    # NOTE: eval() runs arbitrary code; paste only trusted Namespace reprs
+    ns1 = eval(input("Namespace 1: "))
+    ns2 = eval(input("Namespace 2: "))
+
+    def keys(ns):
+        ks = set()
+        for k in dir(ns):
+            if not k.startswith("_"):
+                ks.add(k)
+        return ks
+
+    k1 = keys(ns1)
+    k2 = keys(ns2)
+
+    def print_keys(ks, ns1, ns2=None):
+        for k in ks:
+            if ns2 is None:
+                print("{}\t{}".format(k, getattr(ns1, k, None)))
+            else:
+                print(
+                    "{}\t{}\t{}".format(k, getattr(ns1, k, None), getattr(ns2, k, None))
+                )
+
+    print("Keys unique to namespace 1:")
+    print_keys(k1 - k2, ns1)
+    print()
+
+    print("Keys unique to namespace 2:")
+    print_keys(k2 - k1, ns2)
+    print()
+
print("Overlapping keys with different values:") + ks = [k for k in k1 & k2 if getattr(ns1, k, "None") != getattr(ns2, k, "None")] + print_keys(ks, ns1, ns2) + print() + + +if __name__ == "__main__": + main() diff --git a/fairseq-0.10.2/scripts/compound_split_bleu.sh b/fairseq-0.10.2/scripts/compound_split_bleu.sh new file mode 100644 index 0000000000000000000000000000000000000000..1972fddcebff9a43a70bcf14c287175c68f60e3f --- /dev/null +++ b/fairseq-0.10.2/scripts/compound_split_bleu.sh @@ -0,0 +1,20 @@ +#!/bin/bash + +if [ $# -ne 1 ]; then + echo "usage: $0 GENERATE_PY_OUTPUT" + exit 1 +fi + +GEN=$1 + +SYS=$GEN.sys +REF=$GEN.ref + +if [ $(tail -n 1 $GEN | grep BLEU | wc -l) -ne 1 ]; then + echo "not done generating" + exit +fi + +grep ^H $GEN | awk -F '\t' '{print $NF}' | perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g' > $SYS +grep ^T $GEN | cut -f2- | perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g' > $REF +fairseq-score --sys $SYS --ref $REF diff --git a/fairseq-0.10.2/scripts/constraints/extract.py b/fairseq-0.10.2/scripts/constraints/extract.py new file mode 100644 index 0000000000000000000000000000000000000000..f6155d0a0538aadb46bf612256b6b949728de69e --- /dev/null +++ b/fairseq-0.10.2/scripts/constraints/extract.py @@ -0,0 +1,92 @@ +#!/usr/bin/env python3 +# +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +"""Extracts random constraints from reference files.""" + +import argparse +import random +import sys + +from sacrebleu import extract_ngrams + + +def get_phrase(words, index, length): + assert index < len(words) - length + 1 + phr = " ".join(words[index : index + length]) + for i in range(index, index + length): + words.pop(index) + return phr + + +def main(args): + + if args.seed: + random.seed(args.seed) + + for line in sys.stdin: + constraints = [] + + def add_constraint(constraint): + constraints.append(constraint) + + source = line.rstrip() + if "\t" in line: + source, target = line.split("\t") + if args.add_sos: + target = f" {target}" + if args.add_eos: + target = f"{target} " + + if len(target.split()) >= args.len: + words = [target] + + num = args.number + + choices = {} + for i in range(num): + if len(words) == 0: + break + segmentno = random.choice(range(len(words))) + segment = words.pop(segmentno) + tokens = segment.split() + phrase_index = random.choice(range(len(tokens))) + choice = " ".join( + tokens[phrase_index : min(len(tokens), phrase_index + args.len)] + ) + for j in range( + phrase_index, min(len(tokens), phrase_index + args.len) + ): + tokens.pop(phrase_index) + if phrase_index > 0: + words.append(" ".join(tokens[0:phrase_index])) + if phrase_index + 1 < len(tokens): + words.append(" ".join(tokens[phrase_index:])) + choices[target.find(choice)] = choice + + # mask out with spaces + target = target.replace(choice, " " * len(choice), 1) + + for key in sorted(choices.keys()): + add_constraint(choices[key]) + + print(source, *constraints, sep="\t") + + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument("--number", "-n", type=int, default=1, help="number of phrases") + parser.add_argument("--len", "-l", type=int, default=1, help="phrase length") + parser.add_argument( + "--add-sos", default=False, action="store_true", help="add token" + ) + parser.add_argument( + "--add-eos", default=False, action="store_true", help="add token" + ) + parser.add_argument("--seed", "-s", default=0, type=int) + args 
= parser.parse_args() + + main(args) diff --git a/fairseq-0.10.2/scripts/constraints/validate.py b/fairseq-0.10.2/scripts/constraints/validate.py new file mode 100644 index 0000000000000000000000000000000000000000..d531ad9f39b1df42c98fe8f26ad61fe53a9ac0c5 --- /dev/null +++ b/fairseq-0.10.2/scripts/constraints/validate.py @@ -0,0 +1,34 @@ +#!/usr/bin/env python3 +# +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import sys + + +"""Reads in a fairseq output file, and verifies that the constraints +(C- lines) are present in the output (the first H- line). Assumes that +constraints are listed prior to the first hypothesis. +""" + +constraints = [] +found = 0 +total = 0 +for line in sys.stdin: + if line.startswith("C-"): + constraints.append(line.rstrip().split("\t")[1]) + elif line.startswith("H-"): + text = line.split("\t")[2] + + for constraint in constraints: + total += 1 + if constraint in text: + found += 1 + else: + print(f"No {constraint} in {text}", file=sys.stderr) + + constraints = [] + +print(f"Found {found} / {total} = {100 * found / total:.1f}%") diff --git a/fairseq-0.10.2/scripts/convert_dictionary.lua b/fairseq-0.10.2/scripts/convert_dictionary.lua new file mode 100644 index 0000000000000000000000000000000000000000..14ee8c997f642c8ff196617c2dcd0584037a60c4 --- /dev/null +++ b/fairseq-0.10.2/scripts/convert_dictionary.lua @@ -0,0 +1,34 @@ +-- Copyright (c) Facebook, Inc. and its affiliates. +-- +-- This source code is licensed under the MIT license found in the +-- LICENSE file in the root directory of this source tree. +-- +-- Usage: convert_dictionary.lua +require 'fairseq' +require 'torch' +require 'paths' + +if #arg < 1 then + print('usage: convert_dictionary.lua ') + os.exit(1) +end +if not paths.filep(arg[1]) then + print('error: file does not exit: ' .. arg[1]) + os.exit(1) +end + +dict = torch.load(arg[1]) +dst = paths.basename(arg[1]):gsub('.th7', '.txt') +assert(dst:match('.txt$')) + +f = io.open(dst, 'w') +for idx, symbol in ipairs(dict.index_to_symbol) do + if idx > dict.cutoff then + break + end + f:write(symbol) + f:write(' ') + f:write(dict.index_to_freq[idx]) + f:write('\n') +end +f:close() diff --git a/fairseq-0.10.2/scripts/count_docs.py b/fairseq-0.10.2/scripts/count_docs.py new file mode 100644 index 0000000000000000000000000000000000000000..58d85af85e91377a34dbd01f7674436152fd08e8 --- /dev/null +++ b/fairseq-0.10.2/scripts/count_docs.py @@ -0,0 +1,58 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Count the number of documents and average number of lines and tokens per +document in a large file. Documents should be separated by a single empty line. 
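+
+Example (hypothetical file names):
+    python scripts/count_docs.py corpus.txt
+    python scripts/count_docs.py corpus.txt.gz --gzip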
+""" + +import argparse +import gzip +import sys + +import numpy as np + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("input") + parser.add_argument("--gzip", action="store_true") + args = parser.parse_args() + + def gopen(): + if args.gzip: + return gzip.open(args.input, "r") + else: + return open(args.input, "r", encoding="utf-8") + + num_lines = [] + num_toks = [] + with gopen() as h: + num_docs = 1 + num_lines_in_doc = 0 + num_toks_in_doc = 0 + for i, line in enumerate(h): + if len(line.strip()) == 0: # empty line indicates new document + num_docs += 1 + num_lines.append(num_lines_in_doc) + num_toks.append(num_toks_in_doc) + num_lines_in_doc = 0 + num_toks_in_doc = 0 + else: + num_lines_in_doc += 1 + num_toks_in_doc += len(line.rstrip().split()) + if i % 1000000 == 0: + print(i, file=sys.stderr, end="", flush=True) + elif i % 100000 == 0: + print(".", file=sys.stderr, end="", flush=True) + print(file=sys.stderr, flush=True) + + print("found {} docs".format(num_docs)) + print("average num lines per doc: {}".format(np.mean(num_lines))) + print("average num toks per doc: {}".format(np.mean(num_toks))) + + +if __name__ == "__main__": + main() diff --git a/fairseq-0.10.2/scripts/read_binarized.py b/fairseq-0.10.2/scripts/read_binarized.py new file mode 100644 index 0000000000000000000000000000000000000000..a414095d03fb022a6753e816fc8bfd80e11db24d --- /dev/null +++ b/fairseq-0.10.2/scripts/read_binarized.py @@ -0,0 +1,48 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. + +import argparse + +from fairseq.data import Dictionary, data_utils, indexed_dataset + + +def get_parser(): + parser = argparse.ArgumentParser( + description="writes text from binarized file to stdout" + ) + # fmt: off + parser.add_argument('--dataset-impl', help='dataset implementation', + choices=indexed_dataset.get_available_dataset_impl()) + parser.add_argument('--dict', metavar='FP', help='dictionary containing known words', default=None) + parser.add_argument('--input', metavar='FP', required=True, help='binarized file to read') + # fmt: on + + return parser + + +def main(): + parser = get_parser() + args = parser.parse_args() + + dictionary = Dictionary.load(args.dict) if args.dict is not None else None + dataset = data_utils.load_indexed_dataset( + args.input, + dictionary, + dataset_impl=args.dataset_impl, + default="lazy", + ) + + for tensor_line in dataset: + if dictionary is None: + line = " ".join([str(int(x)) for x in tensor_line]) + else: + line = dictionary.string(tensor_line) + + print(line) + + +if __name__ == "__main__": + main() diff --git a/fairseq-0.10.2/scripts/rm_pt.py b/fairseq-0.10.2/scripts/rm_pt.py new file mode 100644 index 0000000000000000000000000000000000000000..6cd063d21f0610fa7c42c2cfb2ee8af7c9c78677 --- /dev/null +++ b/fairseq-0.10.2/scripts/rm_pt.py @@ -0,0 +1,141 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. 
+ +import argparse +import os +import re +import shutil +import sys + + +pt_regexp = re.compile(r"checkpoint(\d+|_\d+_\d+|_[a-z]+)\.pt") +pt_regexp_epoch_based = re.compile(r"checkpoint(\d+)\.pt") +pt_regexp_update_based = re.compile(r"checkpoint_\d+_(\d+)\.pt") + + +def parse_checkpoints(files): + entries = [] + for f in files: + m = pt_regexp_epoch_based.fullmatch(f) + if m is not None: + entries.append((int(m.group(1)), m.group(0))) + else: + m = pt_regexp_update_based.fullmatch(f) + if m is not None: + entries.append((int(m.group(1)), m.group(0))) + return entries + + +def last_n_checkpoints(files, n): + entries = parse_checkpoints(files) + return [x[1] for x in sorted(entries, reverse=True)[:n]] + + +def every_n_checkpoints(files, n): + entries = parse_checkpoints(files) + return [x[1] for x in sorted(sorted(entries)[::-n])] + + +def main(): + parser = argparse.ArgumentParser( + description=( + "Recursively delete checkpoint files from `root_dir`, " + "but preserve checkpoint_best.pt and checkpoint_last.pt" + ) + ) + parser.add_argument("root_dirs", nargs="*") + parser.add_argument( + "--save-last", type=int, default=0, help="number of last checkpoints to save" + ) + parser.add_argument( + "--save-every", type=int, default=0, help="interval of checkpoints to save" + ) + parser.add_argument( + "--preserve-test", + action="store_true", + help="preserve checkpoints in dirs that start with test_ prefix (default: delete them)", + ) + parser.add_argument( + "--delete-best", action="store_true", help="delete checkpoint_best.pt" + ) + parser.add_argument( + "--delete-last", action="store_true", help="delete checkpoint_last.pt" + ) + parser.add_argument( + "--no-dereference", action="store_true", help="don't dereference symlinks" + ) + args = parser.parse_args() + + files_to_desymlink = [] + files_to_preserve = [] + files_to_delete = [] + for root_dir in args.root_dirs: + for root, _subdirs, files in os.walk(root_dir): + if args.save_last > 0: + to_save = last_n_checkpoints(files, args.save_last) + else: + to_save = [] + if args.save_every > 0: + to_save += every_n_checkpoints(files, args.save_every) + for file in files: + if not pt_regexp.fullmatch(file): + continue + full_path = os.path.join(root, file) + if ( + not os.path.basename(root).startswith("test_") or args.preserve_test + ) and ( + (file == "checkpoint_last.pt" and not args.delete_last) + or (file == "checkpoint_best.pt" and not args.delete_best) + or file in to_save + ): + if os.path.islink(full_path) and not args.no_dereference: + files_to_desymlink.append(full_path) + else: + files_to_preserve.append(full_path) + else: + files_to_delete.append(full_path) + + if len(files_to_desymlink) == 0 and len(files_to_delete) == 0: + print("Nothing to do.") + sys.exit(0) + + files_to_desymlink = sorted(files_to_desymlink) + files_to_preserve = sorted(files_to_preserve) + files_to_delete = sorted(files_to_delete) + + print("Operations to perform (in order):") + if len(files_to_desymlink) > 0: + for file in files_to_desymlink: + print(" - preserve (and dereference symlink): " + file) + if len(files_to_preserve) > 0: + for file in files_to_preserve: + print(" - preserve: " + file) + if len(files_to_delete) > 0: + for file in files_to_delete: + print(" - delete: " + file) + while True: + resp = input("Continue? 
(Y/N): ") + if resp.strip().lower() == "y": + break + elif resp.strip().lower() == "n": + sys.exit(0) + + print("Executing...") + if len(files_to_desymlink) > 0: + for file in files_to_desymlink: + realpath = os.path.realpath(file) + print("rm " + file) + os.remove(file) + print("cp {} {}".format(realpath, file)) + shutil.copyfile(realpath, file) + if len(files_to_delete) > 0: + for file in files_to_delete: + print("rm " + file) + os.remove(file) + + +if __name__ == "__main__": + main() diff --git a/fairseq-0.10.2/scripts/shard_docs.py b/fairseq-0.10.2/scripts/shard_docs.py new file mode 100644 index 0000000000000000000000000000000000000000..97232c3c845ee01dc5ab627388934cc0f9588280 --- /dev/null +++ b/fairseq-0.10.2/scripts/shard_docs.py @@ -0,0 +1,54 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Split a large file into shards while respecting document boundaries. Documents +should be separated by a single empty line. +""" + +import argparse +import contextlib + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("input") + parser.add_argument("--num-shards", type=int) + args = parser.parse_args() + + assert args.num_shards is not None and args.num_shards > 1 + + with open(args.input, "r", encoding="utf-8") as h: + with contextlib.ExitStack() as stack: + outputs = [ + stack.enter_context( + open(args.input + ".shard" + str(i), "w", encoding="utf-8") + ) + for i in range(args.num_shards) + ] + + doc = [] + first_doc = [True] * args.num_shards + + def output_doc(i): + if not first_doc[i]: + outputs[i].write("\n") + first_doc[i] = False + for line in doc: + outputs[i].write(line) + doc.clear() + + num_docs = 0 + for line in h: + if line.strip() == "": # empty line indicates new document + output_doc(num_docs % args.num_shards) + num_docs += 1 + else: + doc.append(line) + output_doc(num_docs % args.num_shards) + + +if __name__ == "__main__": + main() diff --git a/fairseq-0.10.2/scripts/split_train_valid_docs.py b/fairseq-0.10.2/scripts/split_train_valid_docs.py new file mode 100644 index 0000000000000000000000000000000000000000..ff159785284a13b44626b207d84430c592acaf8f --- /dev/null +++ b/fairseq-0.10.2/scripts/split_train_valid_docs.py @@ -0,0 +1,86 @@ +#!/usr/bin/env python3 +# Copyright (c) Facebook, Inc. and its affiliates. +# +# This source code is licensed under the MIT license found in the +# LICENSE file in the root directory of this source tree. +""" +Split a large file into a train and valid set while respecting document +boundaries. Documents should be separated by a single empty line. 
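+
+Example (hypothetical file names; -k documents are sampled into the first
+output file, the rest go to the second):
+    python scripts/split_train_valid_docs.py corpus.txt sampled.txt rest.txt -k 3000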
+""" + +import argparse +import random +import sys + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("input") + parser.add_argument("sample_output", help="train output file") + parser.add_argument("remainder_output", help="valid output file") + parser.add_argument("-k", type=int, help="remainder size") + parser.add_argument( + "--lines", action="store_true", help="split lines instead of docs" + ) + args = parser.parse_args() + + assert args.k is not None + + sample = [] + remainder = [] + num_docs = [0] + + def update_sample(doc): + if len(sample) < args.k: + sample.append(doc.copy()) + else: + i = num_docs[0] + j = random.randrange(i + 1) + if j < args.k: + remainder.append(sample[j]) + sample[j] = doc.copy() + else: + remainder.append(doc.copy()) + num_docs[0] += 1 + doc.clear() + + with open(args.input, "r", encoding="utf-8") as h: + doc = [] + for i, line in enumerate(h): + if line.strip() == "": # empty line indicates new document + update_sample(doc) + else: + doc.append(line) + if args.lines: + update_sample(doc) + if i % 1000000 == 0: + print(i, file=sys.stderr, end="", flush=True) + elif i % 100000 == 0: + print(".", file=sys.stderr, end="", flush=True) + if len(doc) > 0: + update_sample(doc) + print(file=sys.stderr, flush=True) + + assert len(sample) == args.k + + with open(args.sample_output, "w", encoding="utf-8") as out: + first = True + for doc in sample: + if not first and not args.lines: + out.write("\n") + first = False + for line in doc: + out.write(line) + + with open(args.remainder_output, "w", encoding="utf-8") as out: + first = True + for doc in remainder: + if not first and not args.lines: + out.write("\n") + first = False + for line in doc: + out.write(line) + + +if __name__ == "__main__": + main() diff --git a/fairseq-0.10.2/scripts/spm_decode.py b/fairseq-0.10.2/scripts/spm_decode.py new file mode 100644 index 0000000000000000000000000000000000000000..1c18b1d2a7d7628b7aeb6fdb6c4ab5a096e9edf8 --- /dev/null +++ b/fairseq-0.10.2/scripts/spm_decode.py @@ -0,0 +1,53 @@ +#!/usr/bin/env python +# Copyright (c) Facebook, Inc. and its affiliates. +# All rights reserved. +# +# This source code is licensed under the license found in the +# LICENSE file in the root directory of this source tree. 
+
+from __future__ import absolute_import, division, print_function, unicode_literals
+
+import argparse
+
+import sentencepiece as spm
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--model", required=True, help="sentencepiece model to use for decoding"
+    )
+    parser.add_argument("--input", required=True, help="input file to decode")
+    parser.add_argument("--input_format", choices=["piece", "id"], default="piece")
+    args = parser.parse_args()
+
+    sp = spm.SentencePieceProcessor()
+    sp.Load(args.model)
+
+    if args.input_format == "piece":
+
+        def decode(l):
+            return "".join(sp.DecodePieces(l))
+
+    elif args.input_format == "id":
+
+        def decode(l):
+            return "".join(sp.DecodeIds(l))
+
+    else:
+        raise NotImplementedError
+
+    def tok2int(tok):
+        # remap reference-side <unk> (represented as <<unk>>) to 0
+        return int(tok) if tok != "<<unk>>" else 0
+
+    with open(args.input, "r", encoding="utf-8") as h:
+        for line in h:
+            if args.input_format == "id":
+                print(decode(list(map(tok2int, line.rstrip().split()))))
+            elif args.input_format == "piece":
+                print(decode(line.rstrip().split()))
+
+
+if __name__ == "__main__":
+    main()
diff --git a/fairseq-0.10.2/scripts/spm_encode.py b/fairseq-0.10.2/scripts/spm_encode.py
new file mode 100644
index 0000000000000000000000000000000000000000..83facfb3b184aff8b9cc3f0c82dd53668c63e57b
--- /dev/null
+++ b/fairseq-0.10.2/scripts/spm_encode.py
@@ -0,0 +1,119 @@
+#!/usr/bin/env python
+# Copyright (c) Facebook, Inc. and its affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.
+
+from __future__ import absolute_import, division, print_function, unicode_literals
+
+import argparse
+import contextlib
+import sys
+
+import sentencepiece as spm
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--model", required=True, help="sentencepiece model to use for encoding"
+    )
+    parser.add_argument(
+        "--inputs", nargs="+", default=["-"], help="input files to filter/encode"
+    )
+    parser.add_argument(
+        "--outputs", nargs="+", default=["-"], help="path to save encoded outputs"
+    )
+    parser.add_argument("--output_format", choices=["piece", "id"], default="piece")
+    parser.add_argument(
+        "--min-len",
+        type=int,
+        metavar="N",
+        help="filter sentence pairs with fewer than N tokens",
+    )
+    parser.add_argument(
+        "--max-len",
+        type=int,
+        metavar="N",
+        help="filter sentence pairs with more than N tokens",
+    )
+    args = parser.parse_args()
+
+    assert len(args.inputs) == len(
+        args.outputs
+    ), "number of input and output paths should match"
+
+    sp = spm.SentencePieceProcessor()
+    sp.Load(args.model)
+
+    if args.output_format == "piece":
+
+        def encode(l):
+            return sp.EncodeAsPieces(l)
+
+    elif args.output_format == "id":
+
+        def encode(l):
+            return list(map(str, sp.EncodeAsIds(l)))
+
+    else:
+        raise NotImplementedError
+
+    if args.min_len is not None or args.max_len is not None:
+
+        def valid(line):
+            return (args.min_len is None or len(line) >= args.min_len) and (
+                args.max_len is None or len(line) <= args.max_len
+            )
+
+    else:
+
+        def valid(line):
+            return True
+
+    with contextlib.ExitStack() as stack:
+        inputs = [
+            stack.enter_context(open(input, "r", encoding="utf-8"))
+            if input != "-"
+            else sys.stdin
+            for input in args.inputs
+        ]
+        outputs = [
+            stack.enter_context(open(output, "w", encoding="utf-8"))
+            if output != "-"
+            else sys.stdout
+            for output in args.outputs
+        ]
+
+        stats = {
+            "num_empty": 0,
+            "num_filtered":
0, + } + + def encode_line(line): + line = line.strip() + if len(line) > 0: + line = encode(line) + if valid(line): + return line + else: + stats["num_filtered"] += 1 + else: + stats["num_empty"] += 1 + return None + + for i, lines in enumerate(zip(*inputs), start=1): + enc_lines = list(map(encode_line, lines)) + if not any(enc_line is None for enc_line in enc_lines): + for enc_line, output_h in zip(enc_lines, outputs): + print(" ".join(enc_line), file=output_h) + if i % 10000 == 0: + print("processed {} lines".format(i), file=sys.stderr) + + print("skipped {} empty lines".format(stats["num_empty"]), file=sys.stderr) + print("filtered {} lines".format(stats["num_filtered"]), file=sys.stderr) + + +if __name__ == "__main__": + main() diff --git a/mosesdecoder/.gitignore b/mosesdecoder/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..975e71dccbb765bca7e514868c3820417a3582e8 --- /dev/null +++ b/mosesdecoder/.gitignore @@ -0,0 +1,90 @@ +tools +*.d +*.pyc +*.lo +*.o +*.so +*.lo +*.o +*.la +*.a +*.swp +*.save +*.cmd +*~ +*.gch +dist* +jam-files/bjam +jam-files/engine/bootstrap +jam-files/engine/bin.* +lm/build_binary +lm/query +mert/evaluator +mert/extractor +mert/hgdecode +mert/mert +mert/megam_i686.opt +mert/pro +mert/kbmira +misc/processLexicalTable +misc/processPhraseTable +misc/queryLexicalTable +mira/mira +mira/Makefile +mira/Makefile.in +misc/queryPhraseTable +moses-chart-cmd/src/moses_chart +moses-cmd/src/checkplf +moses-cmd/src/lmbrgrid +moses-cmd/src/moses +regression-testing/moses-reg-test-data-* +regression-testing/tests/mert.extractor-bin/FEATSTAT* +regression-testing/tests/mert.extractor-bin/SCORESTAT* +scripts/ems/biconcor/biconcor +scripts/release-exclude +scripts/training/cmert-0.5/mert +scripts/training/compact-rule-table/tools/compactify +scripts/training/eppex/counter +scripts/training/eppex/eppex +scripts/training/lexical-reordering/score +scripts/training/memscore/memscore +scripts/training/mbr/mbr +scripts/training/phrase-extract/consolidate +scripts/training/phrase-extract/consolidate-direct +scripts/training/phrase-extract/consolidate-reverse +scripts/training/phrase-extract/extract +scripts/training/phrase-extract/extract-ghkm/tools/extract-ghkm +scripts/training/phrase-extract/extract-lex +scripts/training/phrase-extract/extract-rules +scripts/training/phrase-extract/relax-parse +scripts/training/phrase-extract/score +scripts/training/phrase-extract/statistics +scripts/training/symal/symal +dist +bin +previous.sh +contrib/other-builds/*.xcodeproj/project.xcworkspace/ +contrib/other-builds/*.xcodeproj/xcuserdata/ +*/*.xcodeproj/project.xcworkspace +*/*.xcodeproj/xcuserdata + +mert/sentence-bleu +mert/sentence-bleu-nbest +._* +.DS_Store +*.pbxuser +*.mode1v3 + +*.exe +build/ +nbproject/ + +mingw/MosesGUI/MosesGUI.e4p +mingw/MosesGUI/_eric4project/ + +contrib/m4m/merge-sorted +mert/hgdecode +.bash_history* +doxygen.conf +doxy +opt diff --git a/mosesdecoder/.gitmodules b/mosesdecoder/.gitmodules new file mode 100644 index 0000000000000000000000000000000000000000..90a9b30bad8e42ee8f22370b09bba232e5173f64 --- /dev/null +++ b/mosesdecoder/.gitmodules @@ -0,0 +1,9 @@ +[submodule "contrib/arrow-pipelines/python/pcl"] + path = contrib/arrow-pipelines/python/pcl + url = https://github.com/ianj-als/pcl.git +[submodule "contrib/omtc/omtc"] + path = contrib/omtc/omtc + url = https://github.com/ianj-als/omtc.git +[submodule "regtest"] + path = regtest + url = https://github.com/moses-smt/moses-regression-tests diff --git a/mosesdecoder/Jamroot 
b/mosesdecoder/Jamroot
new file mode 100644
index 0000000000000000000000000000000000000000..91969fb9ccffd2ec7b16095088a119c5cfe1382c
--- /dev/null
+++ b/mosesdecoder/Jamroot
@@ -0,0 +1,345 @@
+#BUILDING MOSES
+
+#PACKAGES
+#Language models (optional):
+#--with-irstlm=/path/to/irstlm
+#--with-srilm=/path/to/srilm See moses/LM/Jamfile for more options.
+#--with-maxent-srilm=true (requires a maxent-enabled version of SRILM to be specified via --with-srilm)
+#--with-nplm=/path/to/nplm
+#--with-randlm=/path/to/randlm
+#KenLM is always compiled.
+#
+#--with-boost=/path/to/boost
+#If Boost is in a non-standard location, specify it here. This directory is
+#expected to contain include and lib or lib64.
+#
+#--with-xmlrpc-c=/path/to/xmlrpc-c for libxmlrpc-c (used by server)
+#Note that, like language models, this is the --prefix where the library was
+#installed, not some executable within the library.
+#
+#--no-xmlrpc-c
+# Don't use xmlrpc-c library, even if it exists. Don't build moses server
+#
+#Compact phrase table and compact lexical reordering table
+#--with-cmph=/path/to/cmph
+#
+#Thread-caching malloc (if present, used for multi-threaded builds by default)
+#--without-tcmalloc does not compile with tcmalloc even if present
+#--full-tcmalloc links against the full version (useful for memory profiling)
+#
+#REGRESSION TESTING
+#--with-regtest=/path/to/moses-reg-test-data
+#
+#INSTALLATION
+#--prefix=/path/to/prefix sets the install prefix [default is source root].
+#--bindir=/path/to/prefix/bin sets the bin directory [PREFIX/bin]
+#--libdir=/path/to/prefix/lib sets the lib directory [PREFIX/lib]
+#--includedir=/path/to/prefix/include installs headers.
+#  Does not install if missing. No argument defaults to PREFIX/include .
+#--install-scripts=/path/to/scripts copies scripts into a directory.
+#  Does not install if missing. No argument defaults to PREFIX/scripts .
+#--git appends the git revision to the prefix directory.
+#
+#
+#BUILD OPTIONS
+# By default, the build is multi-threaded, optimized, and statically linked.
+# Pass these to change the build:
+#
+# threading=single|multi         controls threading (default multi)
+#
+# variant=release|debug|profile  builds optimized (default), for debug, or for
+#                                profiling
+#
+# link=static|shared             controls preferred linking (default static)
+# --static                       forces static linking (the default will fall
+#                                back to shared)
+#
+# debug-symbols=on|off           include or exclude (default) debugging
+#                                information also known as -g
+# --notrace                      compiles without TRACE macros
+#
+# --enable-boost-pool            uses Boost pools for the memory SCFG table
+#
+# --enable-mpi                   switch on mpi
+# --without-libsegfault          does not link with libSegFault
+#
+# --max-kenlm-order              maximum ngram order that kenlm can process (default 6)
+#
+# --max-factors                  maximum number of factors (default 4)
+#
+# --unlabelled-source            ignore source labels (redundant in hiero or string-to-tree system)
+#                                for better performance
+#CONTROLLING THE BUILD
+#-a to build from scratch
+#-j$NCPUS to compile in parallel
+#--clean to clean
+#--debug-build to build with -Og. Only available with gcc 4.8+.
+
+import os ;
+import option ;
+import modules ;
+import path ;
+path-constant TOP : . ;
+
+include $(TOP)/jam-files/sanity.jam ;
+
+home = [ os.environ "HOME" ] ;
+if [ path.exists $(home)/moses-environment.jam ]
+{
+  # for those of us who don't like typing in command line bjam options all day long
+  include $(home)/moses-environment.jam ;
+}
+include $(TOP)/jam-files/check-environment.jam ; # get resource locations
+                                                 # from environment variables
+include $(TOP)/jam-files/xmlrpc-c.jam ; # xmlrpc-c stuff for the server
+# include $(TOP)/jam-files/curlpp.jam ; # curlpp stuff for bias lookup (MMT only)
+
+# exit "done" : 0 ;
+
+max-order = [ option.get "max-kenlm-order" : 6 : 6 ] ;
+if ! [ option.get "max-kenlm-order" ]
+{
+  # some classes in Moses pull in header files from KenLM, so this needs to be
+  # defined here, not in moses/lm/Jamfile
+  option.set "max-kenlm-order" : 6 ;
+  requirements += <define>KENLM_MAX_ORDER=$(max-order) ;
+}
+# exit "all done" : 0 ;
+
+boost 104400 ;
+external-lib z ;
+
+#lib dl : : <link>static:<link>static <link>shared:<link>shared ;
+#requirements += <library>dl ;
+requirements += <cxxflags>-std=c++0x ;
+
+# Allow moses to report the git commit hash of the version used for compilation
+moses_githash = [ _shell "git describe --dirty" ] ;
+requirements += <define>MOSES_VERSION_ID=\\\"$(moses_githash)\\\" ;
+
+if ! [ option.get "without-tcmalloc" : : "yes" ] && [ test_library "tcmalloc_minimal" ] {
+  if [ option.get "full-tcmalloc" : : "yes" ] {
+    external-lib unwind ;
+    external-lib tcmalloc_and_profiler : : unwind ;
+    requirements += <library>tcmalloc_and_profiler <library>unwind <cflags>-fno-omit-frame-pointer <cxxflags>-fno-omit-frame-pointer ;
+  } else {
+    external-lib tcmalloc_minimal ;
+    requirements += <threading>multi:<library>tcmalloc_minimal ;
+  }
+} else {
+  echo "Tip: install tcmalloc for faster threading. See BUILD-INSTRUCTIONS.txt for more information." ;
+}
+
+if [ option.get "filter-warnings" : : "yes" ] {
+  # given the low coding standards in Moses, we may want to filter out
+  # warnings about poor coding practice that no-one is ever going to fix
+  # anyway ...
+  requirements += <cxxflags>-Wno-deprecated ;
+  requirements += <cxxflags>-Wno-reorder ;
+  requirements += <cxxflags>-Wno-sign-compare ;
+  requirements += <cxxflags>-Wno-unused-but-set-variable ;
+  requirements += <cxxflags>-Wno-unused-result ;
+  requirements += <cxxflags>-Wno-unused-variable ;
+  requirements += <cxxflags>-Wno-comment ;
+  requirements += <cxxflags>-Wno-strict-aliasing ;
+  requirements += <cxxflags>-Wno-overloaded-virtual ;
+}
+
+if [ option.get "debug-build" : : "yes" ] {
+  requirements += <cxxflags>-Og ;
+  echo "Building with -Og to enable easier profiling and debugging. Only available on gcc 4.8+." ;
+}
+
+if [ option.get "with-address-sanitizer" : : "yes" ] {
+  requirements += <cxxflags>-fsanitize=address ;
+  requirements += <cxxflags>-fno-omit-frame-pointer ;
+  requirements += <linkflags>-fsanitize=address ;
+  echo "Building with AddressSanitizer to enable debugging of memory errors. Only available on gcc 4.8+." ;
+}
+
+if [ option.get "enable-mpi" : : "yes" ] {
+  import mpi ;
+  using mpi ;
+  external-lib boost_mpi ;
+  external-lib boost_serialization ;
+  requirements += <define>MPI_ENABLE ;
+  requirements += <library>mpi ;
+  requirements += <library>boost_mpi ;
+  requirements += <library>boost_serialization ;
+}
+
+mmt = [ option.get "mmt" ] ;
+if $(mmt) {
+  requirements += <define>MMT ;
+  requirements += <include>$(mmt) ;
+  mmt_githash = [ _shell "cd $(mmt) && git describe --dirty" ] ;
+  requirements += <define>MMT_VERSION_ID=\\\"$(mmt_githash)\\\" ;
+}
+
+requirements += [ option.get "notrace" : <define>TRACE_ENABLE=1 ] ;
+requirements += [ option.get "enable-boost-pool" : : <define>USE_BOOST_POOL ] ;
+requirements += [ option.get "with-mm" : : <define>PT_UG ] ;
+requirements += [ option.get "with-mm" : : <define>MAX_NUM_FACTORS=4 ] ;
+requirements += [ option.get "unlabelled-source" : : <define>UNLABELLED_SOURCE ] ;
+
+if [ option.get "with-oxlm" ] {
+  external-lib gomp ;
+  requirements += <library>boost_serialization ;
+  requirements += <library>gomp ;
+}
+
+if [ option.get "with-cmph" : : "yes" ] {
+  requirements += <define>HAVE_CMPH ;
+}
+
+if [ option.get "with-icu" : : "yes" ]
+{
+  external-lib icuuc ;
+  external-lib icuio ;
+  external-lib icui18n ;
+  requirements += <library>icuuc/<link>shared ;
+  requirements += <library>icuio/<link>shared ;
+  requirements += <library>icui18n/<link>shared ;
+  requirements += <cxxflags>-fPIC ;
+  requirements += <address-model>64 ;
+# requirements += <link>shared ;
+}
+
+# for probing pt
+external-lib boost_serialization ;
+requirements += <library>boost_serialization/<link>static ;
+
+if [ option.get "with-vw" ] {
+  requirements += <define>HAVE_VW ;
+}
+
+project : default-build
+          <threading>multi
+          <warnings-as-errors>on
+          <debug-symbols>off
+          <variant>release
+          <link>static
+          ;
+
+#Apparently OS X likes to link against iconv for fgetsUTF8.
+lib iconv ;
+requirements += <os>MACOSX:<library>iconv ;
+
+project : requirements
+          <threading>multi:<define>WITH_THREADS
+          <threading>multi:<library>boost_thread
+          <library>boost_system
+          <library>boost_program_options
+          <define>_FILE_OFFSET_BITS=64 <define>_LARGE_FILES
+          $(requirements)
+          <include>.
+          ;
+
+
+#Add directories here if you want their incidental targets too (i.e. tests).
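+#(Each entry below names a subdirectory project whose Jamfile targets, tests included, are built by default.)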
+build-projects lm util phrase-extract phrase-extract/syntax-common search moses moses/LM mert moses-cmd scripts regression-testing ; +# contrib/mira + +if [ option.get "with-mm-extras" : : "yes" ] +{ + alias mm-extras : + moses/TranslationModel/UG//bitext-find + moses/TranslationModel/UG//ptable-describe-features + moses/TranslationModel/UG//count-ptable-features + moses/TranslationModel/UG//ptable-sigtest-filter + moses/TranslationModel/UG//ptable-lookup + moses/TranslationModel/UG//ptable-lookup-corpus + moses/TranslationModel/UG//check-coverage + moses/TranslationModel/UG/mm//mtt-demo1 + moses/TranslationModel/UG/mm//mtt-dump + moses/TranslationModel/UG/mm//mam2symal + moses/TranslationModel/UG/mm//mam_verify + moses/TranslationModel/UG/mm//mmlex-lookup + moses/TranslationModel/UG/mm//mtt-count-words + moses/TranslationModel/UG/mm//calc-coverage + moses/TranslationModel/UG//try-align + ; +} +else +{ + alias mm-extras ; +} + +if [ option.get "with-mm" : : "yes" ] +{ + alias mm : + moses/TranslationModel/UG/mm//mtt-build + moses/TranslationModel/UG/mm//symal2mam + moses/TranslationModel/UG/mm//mmlex-build + ; +} +else +{ + alias mm ; +} + +if [ option.get "with-rephraser" : : "yes" ] +{ + alias rephraser : + contrib/rephraser//paraphrase + ; +} +else +{ + alias rephraser ; +} + +alias programs : +lm//programs +moses-cmd//programs +OnDiskPt//CreateOnDiskPt +OnDiskPt//queryOnDiskPt +mert//programs +misc//programs +symal +phrase-extract +phrase-extract//lexical-reordering +phrase-extract//extract-ghkm +phrase-extract//pcfg-extract +phrase-extract//pcfg-score +phrase-extract//extract-mixed-syntax +phrase-extract//score-stsg +phrase-extract//filter-rule-table +phrase-extract//postprocess-egret-forests +biconcor +# contrib/mira//mira +contrib/server//mosesserver +mm +mm-extras +rephraser +contrib/c++tokenizer//tokenizer +contrib/expected-bleu-training//train-expected-bleu +contrib/expected-bleu-training//prepare-expected-bleu-training + +probingpt//programs +moses2//programs +; + + +install-bin-libs programs ; +install-headers headers-base : [ path.glob-tree biconcor contrib lm mert misc moses-cmd OnDiskPt phrase-extract symal util : *.hh *.h ] : . ; +install-headers headers-moses : moses//headers-to-install : moses ; + +alias install : prefix-bin prefix-lib headers-base headers-moses ; + +if ! [ option.get "includedir" : : $(prefix)/include ] { + explicit install headers-base headers-moses ; +} + +if [ path.exists $(TOP)/dist ] && $(prefix) != dist { + echo "You have a $(TOP)/dist directory, but the build system now places files directly in the root i.e. $(TOP)/bin ." ; + echo "To disable this message, delete $(TOP)/dist ." 
;
+  echo ;
+}
+
+#local temp = [ _shell "bash source ./s.sh" ] ;
+local temp = [ _shell "mkdir -p $(PREFIX)/bin" ] ;
+local temp = [ _shell "rm -f $(PREFIX)/bin/moses_chart" ] ;
+local temp = [ _shell "cd $(PREFIX)/bin && ln -sf moses moses_chart" ] ;
+local temp = [ _shell "cd $(PREFIX)/bin && ln -sf CreateProbingPT CreateProbingPT2" ] ;
+
diff --git a/mosesdecoder/README b/mosesdecoder/README
new file mode 100644
index 0000000000000000000000000000000000000000..644ce6c1cb973121a2721a596e237e2381286cbf
--- /dev/null
+++ b/mosesdecoder/README
@@ -0,0 +1,19 @@
+Instructions for building and installing Moses are online:
+  http://www.statmt.org/moses/?n=Development.GetStarted
+If you have g++ and Boost installed, and you want the default compilation with most of the things you need, then run:
+  ./bjam -j4
+
+Questions should be directed to the mailing list (don't forget to register before sending emails):
+  http://mailman.mit.edu/mailman/listinfo/moses-support
+  https://github.com/moses-smt/mosesdecoder/compare/master...hieu2
+
+Some of the code is not originally part of Moses, but is periodically copied
+into the source tree from elsewhere:
+
+  * "bjam-files" is taken from Boost.
+  * "util" and "lm" are taken from KenLM: https://github.com/kpu/kenlm
+
+=====================================================
+Running on Ubuntu 22.04 (March 2025)
+sudo apt install libcmph-dev libxmlrpc-c++8-dev
+./bjam -j11 --with-cmph=/usr --with-xmlrpc-c=/usr -a
diff --git a/mosesdecoder/cgmanifest.json b/mosesdecoder/cgmanifest.json
new file mode 100644
index 0000000000000000000000000000000000000000..36cc60cff27374f4a38769f6937fc8e3ad0cc290
--- /dev/null
+++ b/mosesdecoder/cgmanifest.json
@@ -0,0 +1,33 @@
+{
+  "Registrations":[
+    {
+      "component": {
+        "type": "git",
+        "git": {
+          "repositoryUrl": "https://github.com/moses-smt/mosesdecoder",
+          "commitHash": "78ca5f3cc5aa671a8a5d36c56452e217e6f00828"
+        }
+      }
+    },
+    {
+      "component": {
+        "type": "git",
+        "git": {
+          "repositoryUrl": "https://git.code.sf.net/p/cmph/git",
+          "commitHash": "a250982ade093f4eed0552bbdd22dd7b0432007f"
+        }
+      }
+    },
+    {
+      "Component": {
+        "Type": "other",
+        "Other": {
+          "Name": "xml-rpc-c",
+          "Version": "1.51.06",
+          "DownloadUrl": "https://sourceforge.net/projects/xmlrpc-c/files/Xmlrpc-c%20Super%20Stable/1.51.06/xmlrpc-c-1.51.06.tgz"
+        }
+      }
+    }
+  ]
+}
+
diff --git a/mosesdecoder/env-check.yml b/mosesdecoder/env-check.yml
new file mode 100644
index 0000000000000000000000000000000000000000..9292648fa7b83b05a2cf312c57dd2db0516bb5e5
--- /dev/null
+++ b/mosesdecoder/env-check.yml
@@ -0,0 +1,34 @@
+# Starter pipeline
+# Start with a minimal pipeline that you can customize to build and deploy your code.
+# Add steps that build, run tests, deploy, and more:
+# https://aka.ms/yaml
+
+trigger:
+- master
+
+pool:
+  #vmImage: 'ubuntu-latest'
+  vmImage: 'ubuntu-16.04'
+
+steps:
+
+- script: |
+    echo Printing some environment information
+    echo HOME: $HOME
+    echo
+    echo UBUNTU VERSION:
+    cat /etc/lsb-release
+    echo
+    echo CPU INFO
+    cat /proc/cpuinfo
+    echo
+    echo MEM INFO
+    cat /proc/meminfo
+    echo
+    echo DISK INFO
+    df -h
+    echo
+    echo PWD: $PWD
+    echo
+    ls
+  displayName: 'Printing some environment information'
\ No newline at end of file