## Training & Inference
### Data Source
The dataset includes Small Training (100k), Large Training (10k), Validation (500), and Test (200) sets in `.jsonl` format.
* **Download Link:** [Baidu Netdisk](https://pan.baidu.com/s/1TuaGjNvTESt9ZdEQy1BogA?pwd=u9i2)
* **Note:** You can preprocess the data yourself with `preprocess.py`, or directly use the already processed data in `data/processed_nltk_100k`.
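To check a downloaded `.jsonl` split before preprocessing, a small reader like the one below may help. It is only a sketch: the field names inside each record are not documented here, so it returns whatever keys the file contains, and the helper names `read_jsonl` and `preview` are hypothetical (not part of this repository).

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a .jsonl file."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def preview(path, n=3):
    """Return the first n records so the field names can be inspected."""
    records = []
    for i, record in enumerate(read_jsonl(path)):
        if i >= n:
            break
        records.append(record)
    return records
```

Calling `preview(...)` on the path of one of the downloaded files returns the first few records as dictionaries.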
Download the checkpoints and their corresponding `config.yaml` files, and place them under the `runs/train` directory.
* **Download Link:** https://huggingface.co/soughtlin/CN_EN_Translation_Model
Preprocess the data
```bash
python preprocess.py -c config.yaml
```
### Evaluation
Evaluate the model using **greedy decoding** or **beam search**. Performance is measured with **BLEU-4**.
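The difference between the two decoding strategies can be illustrated on a toy model. This is a minimal sketch, not the decoding code used by the evaluation scripts: `step_fn`, `TOY`, and `toy_step` are hypothetical stand-ins for a real model that returns next-token log-probabilities, and no length normalization is applied.

```python
import math


def greedy_decode(step_fn, bos, eos, max_len=20):
    """Pick the single most likely next token at every step."""
    seq, score = [bos], 0.0
    while len(seq) < max_len and seq[-1] != eos:
        log_probs = step_fn(seq)
        tok = max(log_probs, key=log_probs.get)
        score += log_probs[tok]
        seq.append(tok)
    return seq, score


def beam_decode(step_fn, bos, eos, beam_size=3, max_len=20):
    """Keep the beam_size best partial hypotheses at every step."""
    beams, finished = [([bos], 0.0)], []
    for _ in range(max_len - 1):
        candidates = []
        for seq, score in beams:
            for tok, logp in step_fn(seq).items():
                candidates.append((seq + [tok], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            # Hypotheses ending in eos leave the beam and are stored.
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses if needed
    return max(finished, key=lambda c: c[1])


# Hypothetical toy model: the next-token distribution depends only on the last token.
TOY = {
    "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
    "a": {"x": math.log(0.5), "y": math.log(0.5)},
    "b": {"z": 0.0},
    "x": {"</s>": 0.0},
    "y": {"</s>": 0.0},
    "z": {"</s>": 0.0},
}


def toy_step(prefix):
    return TOY[prefix[-1]]
```

On this toy model, greedy decoding commits to `a` because it is the locally best first token, while beam search with `beam_size=2` keeps `b` alive and ends up with the higher-probability sequence `<s> b z </s>`.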
Evaluate Transformer
```bash
python evaluate_transformer.py -c runs/train/transformer/MHA/config.yaml --ckpt runs/train/transformer/MHA/best_model.pt --save_path runs/evaluate --eval_method beam
```
Evaluate RNN
```bash
python evaluate_rnn.py -c runs/train/rnn/config.yaml --ckpt runs/train/rnn/best_model.pt --save_path runs/evaluate --eval_method beam
```
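For reference, BLEU-4 combines clipped 1- to 4-gram precisions with a brevity penalty. The sketch below is a plain corpus-level version with one reference per hypothesis and no smoothing. It assumes pre-tokenized input and may differ from what the evaluation scripts compute (their tokenization and smoothing are not described here), so its numbers are not directly comparable to the reported scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(hypotheses, references):
    """Corpus-level BLEU-4 on a 0-100 scale, one reference per hypothesis."""
    matches, totals = [0] * 4, [0] * 4
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, 5):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            matches[n - 1] += sum((h & r).values())  # clipped n-gram matches
            totals[n - 1] += sum(h.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

The function takes two parallel lists of token lists, for example `corpus_bleu4([hyp_tokens], [ref_tokens])`.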
### Training
Training Transformer (MHA, MQA, GQA)
```bash
python train_tranformer.py -c runs/train/transformer/MHA/config.yaml
```
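The three attention variants differ in how many key/value heads the query heads share: MHA uses one per query head, MQA uses a single shared one, and GQA uses a few groups in between. The sketch below shows that mapping and its effect on the size of the K/V projections. The head counts and model width are illustrative assumptions, not values taken from the configs, and the function names are hypothetical.

```python
def kv_head_for_query_head(num_heads, num_kv_heads):
    """Map each query head index to the key/value head it shares."""
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be divisible by num_kv_heads")
    group_size = num_heads // num_kv_heads
    return [q // group_size for q in range(num_heads)]


def kv_projection_params(d_model, num_heads, num_kv_heads):
    """Number of weights in the K and V projections (biases ignored)."""
    head_dim = d_model // num_heads
    return 2 * d_model * num_kv_heads * head_dim


NUM_HEADS = 8  # assumed for illustration only
D_MODEL = 512  # assumed for illustration only
VARIANTS = {"MHA": NUM_HEADS, "GQA": 2, "MQA": 1}  # variant -> number of K/V heads

for name, num_kv_heads in VARIANTS.items():
    mapping = kv_head_for_query_head(NUM_HEADS, num_kv_heads)
    params = kv_projection_params(D_MODEL, NUM_HEADS, num_kv_heads)
    print(f"{name}: query->kv {mapping}, K/V weights {params}")
```

Fewer key/value heads shrink the K/V projections and the decoding-time cache, at some cost in capacity.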
Training RNN (dot, general, concat)
```bash
python train_rnn.py -c runs/train/rnn/config.yaml
```
### Main Results
**Table 1: Performance of Transformer Variants.**
| Model Variant | Decoding Strategy | BLEU Score |
| --------------------- | ----------------- | ---------- |
| Transformer (MHA) | Greedy Search | 13.61 |
| | Beam Search | **14.56** |
| Transformer (MQA) | Greedy Search | 11.00 |
| | Beam Search | 12.10 |
| Transformer (GQA) | Greedy Search | 9.57 |
| | Beam Search | 10.80 |
**Table 2: Performance of RNN Variants.**
| Alignment Function | Decoding Strategy | BLEU Score |
| ------------------------ | ----------------- | ---------- |
| Dot Product (dot) | Greedy Search | 8.95 |
| | Beam Search | 9.44 |
| Multiplicative (general) | Greedy Search | 9.20 |
| | Beam Search | 9.88 |
| Additive (concat) | Greedy Search | **10.44** |
| | Beam Search | 10.09 |
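The three alignment functions in Table 2 share their names with the score functions from Luong-style attention. The sketch below assumes the RNN model follows those standard formulations, which is not confirmed here. It uses plain Python lists in place of tensors; `W` and `v` stand for learned parameters and all function names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, x) for row in W]


def score_dot(h_t, h_s):
    """dot: h_t . h_s"""
    return dot(h_t, h_s)


def score_general(h_t, h_s, W):
    """general (multiplicative): h_t . (W h_s)"""
    return dot(h_t, matvec(W, h_s))


def score_concat(h_t, h_s, W, v):
    """concat (additive): v . tanh(W [h_t; h_s])"""
    hidden = [math.tanh(z) for z in matvec(W, h_t + h_s)]  # list concatenation
    return dot(v, hidden)


def softmax(scores):
    """Turn alignment scores over source positions into attention weights."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Here `h_t` is the current decoder state and `h_s` one encoder state. The scores over all source positions are passed through `softmax` to obtain the attention weights.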