soughtlin committed on
Commit
3d9ab29
·
verified ·
1 Parent(s): a0994a7

Create README.md

  1. README.md +77 -0
README.md ADDED
@@ -0,0 +1,77 @@
+ ## Training & Inference
+
+ ### Data Source
+
+ The dataset is provided in `.jsonl` format and consists of a Small Training set (10k), a Large Training set (100k), a Validation set (500), and a Test set (200).
+
+ * **Download Link:** [Baidu Netdisk](https://pan.baidu.com/s/1TuaGjNvTESt9ZdEQy1BogA?pwd=u9i2).
+ * **Note:** If resources are limited, you may train on the 10k-sample Small Training set, though using the Large Training set is encouraged.
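Because the splits ship as `.jsonl`, each line is expected to hold one JSON object. The following is a minimal loading sketch under that assumption; the helper names `parse_jsonl` and `load_jsonl` and the field names `zh` and `en` are hypothetical and are not taken from the repository.

```python
import json
from typing import Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


def load_jsonl(path: str) -> list:
    """Read a whole .jsonl file into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return list(parse_jsonl(f))


# Hypothetical records; the real field names may differ.
sample = ['{"zh": "你好", "en": "hello"}\n', "\n", '{"zh": "谢谢", "en": "thanks"}\n']
pairs = list(parse_jsonl(sample))
```

Parsing line by line keeps memory use flat, which can matter for the larger training split.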
+
+ Download the checkpoints and their respective `config.yaml` files, and place them under the `runs/train` directory.
+
+ * **Download Link:** https://huggingface.co/soughtlin/CN_EN_Translation_Model
+
+ Preprocess the data:
+
+ ```bash
+ python preprocess.py -c config.yaml
+ ```
+
+ ### Evaluation
+
+ Evaluate the model using **greedy decoding** or **beam search**. Performance is measured using **BLEU-4**.
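To illustrate how the two decoding strategies differ, here is a small sketch on a toy next-token table. The table `TOY_MODEL`, the function names, and the probabilities are invented for illustration; the repository's decoders operate on a trained model and may differ in details such as length normalization.

```python
import math

EOS = "</s>"

# Hypothetical toy "model": maps a prefix to next-token probabilities.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.5, "y": 0.5},
    ("b",): {"z": 0.9, "w": 0.1},
}


def next_token_probs(prefix):
    """Return next-token probabilities; unknown prefixes end the sequence."""
    return TOY_MODEL.get(tuple(prefix), {EOS: 1.0})


def greedy_decode(max_len=10):
    """Pick the single most probable token at every step."""
    seq, score = [], 0.0
    for _ in range(max_len):
        probs = next_token_probs(seq)
        token = max(probs, key=probs.get)
        score += math.log(probs[token])
        if token == EOS:
            break
        seq.append(token)
    return seq, score


def beam_search(beam_size=2, max_len=10):
    """Keep the `beam_size` best partial hypotheses by cumulative log-probability."""
    beams = [([], 0.0)]  # unfinished hypotheses: (tokens, log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, p in next_token_probs(seq).items():
                new_score = score + math.log(p)
                if token == EOS:
                    finished.append((seq, new_score))
                else:
                    candidates.append((seq + [token], new_score))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    # Fall back to unfinished beams if nothing reached EOS within max_len.
    return max(finished or beams, key=lambda c: c[1])


greedy_seq, greedy_score = greedy_decode()
beam_seq, beam_score = beam_search(beam_size=2)
```

In this toy table greedy decoding commits to `a` first and ends with probability 0.30, while a beam of size 2 keeps `b` alive and finds `b z` with probability 0.36. With `beam_size=1` the beam search reduces to greedy decoding.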
+
+ Evaluate the Transformer:
+
+ ```bash
+ python evaluate_transformer.py -c runs/train/transformer/MHA/config.yaml --ckpt runs/train/transformer/MHA/best_model.pt --save_path runs/evaluate --eval_method beam
+ ```
+
+ Evaluate the RNN:
+
+ ```bash
+ python evaluate_rnn.py -c runs/train/rnn/config.yaml --ckpt runs/train/rnn/best_model.pt --save_path runs/evaluate --eval_method beam
+ ```
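For readers unfamiliar with the BLEU-4 metric used in this section, the sketch below computes a corpus-level score from clipped 1- to 4-gram precisions and a brevity penalty. It assumes pre-tokenized input, one reference per hypothesis, and no smoothing; the function names are hypothetical, and the evaluation scripts may rely on a library with different tokenization or smoothing, so the numbers are not directly comparable.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(hypotheses, references):
    """Corpus-level BLEU-4 (0 to 100), one reference per hypothesis, no smoothing."""
    matched = [0] * 4
    total = [0] * 4
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, 5):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            total[n - 1] += sum(hyp_ngrams.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / 4
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


reference = "the cat sat on the mat".split()
perfect = corpus_bleu4([reference], [reference])
too_short = corpus_bleu4(["the cat sat on".split()], [reference])
```

Here `too_short` matches every n-gram it produces, so its score is reduced only by the brevity penalty.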
+
+ ### Training
+
+ Train the Transformer (MHA, MQA, or GQA):
+
+ ```bash
+ python train_transformer.py -c runs/train/transformer/MHA/config.yaml
+ ```
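The three attention variants differ in how many key/value heads the query heads share. The sketch below shows that mapping under the standard definitions of multi-head (MHA), multi-query (MQA), and grouped-query (GQA) attention. The function names and the sizes (8 heads, 2 KV heads) are illustrative assumptions, not values read from the repository's configs.

```python
def kv_head_index(query_head: int, num_heads: int, num_kv_heads: int) -> int:
    """Return the key/value head that a given query head attends with."""
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be divisible by num_kv_heads")
    group_size = num_heads // num_kv_heads  # query heads per shared KV head
    return query_head // group_size


def kv_assignment(num_heads: int, num_kv_heads: int) -> list:
    """List the KV head used by each query head."""
    return [kv_head_index(h, num_heads, num_kv_heads) for h in range(num_heads)]


def kv_projection_params(d_model: int, num_heads: int, num_kv_heads: int) -> int:
    """Weights in the K and V projections combined (biases ignored)."""
    head_dim = d_model // num_heads
    return 2 * d_model * num_kv_heads * head_dim


print(kv_assignment(8, 8))  # MHA: [0, 1, 2, 3, 4, 5, 6, 7]
print(kv_assignment(8, 1))  # MQA: [0, 0, 0, 0, 0, 0, 0, 0]
print(kv_assignment(8, 2))  # GQA: [0, 0, 0, 0, 1, 1, 1, 1]
```

Fewer KV heads mean smaller key/value projections and a smaller cache at inference time, at a possible cost in quality.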
+
+ Train the RNN (dot, general, or concat attention):
+
+ ```bash
+ # Script name assumed by analogy with evaluate_rnn.py
+ python train_rnn.py -c runs/train/rnn/config.yaml
+ ```
+
+ ### Main Results
+
+ **Table 1: Performance of Transformer Variants.**
+
+ | Model Variant         | Decoding Strategy | BLEU-4 Score |
+ | --------------------- | ----------------- | ------------ |
+ | **Transformer (MHA)** | Greedy Search     | 13.61        |
+ |                       | Beam Search       | **14.56**    |
+ | Transformer (MQA)     | Greedy Search     | 11.00        |
+ |                       | Beam Search       | 12.10        |
+ | Transformer (GQA)     | Greedy Search     | 9.57         |
+ |                       | Beam Search       | 10.80        |
+
+ **Table 2: Performance of RNN Variants.**
+
+ | Alignment Function       | Decoding Strategy | BLEU-4 Score |
+ | ------------------------ | ----------------- | ------------ |
+ | **Dot Product (dot)**    | Greedy Search     | 8.95         |
+ |                          | Beam Search       | 9.44         |
+ | Multiplicative (general) | Greedy Search     | 9.20         |
+ |                          | Beam Search       | 9.88         |
+ | Additive (concat)        | Greedy Search     | 10.44        |
+ |                          | Beam Search       | 10.09        |
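The labels `dot`, `general`, and `concat` in Table 2 match the names of the alignment score functions in Luong-style attention. Assuming that is what is meant, a minimal pure-Python sketch of the three scores follows. The weight shapes, variable names, and example vectors are illustrative and not taken from the repository.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in W]


def score_dot(h_t, h_s):
    """dot: h_t . h_s"""
    return dot(h_t, h_s)


def score_general(h_t, h_s, W_a):
    """general (multiplicative): h_t . (W_a h_s)"""
    return dot(h_t, matvec(W_a, h_s))


def score_concat(h_t, h_s, W_a, v_a):
    """concat (additive): v_a . tanh(W_a [h_t; h_s])"""
    hidden = [math.tanh(x) for x in matvec(W_a, h_t + h_s)]
    return dot(v_a, hidden)


def softmax(scores):
    """Turn alignment scores into attention weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Hypothetical decoder state and two encoder states.
h_t = [1.0, 0.0]
encoder_states = [[0.5, 2.0], [1.5, -1.0]]
weights = softmax([score_dot(h_t, h_s) for h_s in encoder_states])
```

The `dot` score adds no parameters, `general` adds one matrix, and `concat` adds a matrix and a vector.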