## Training & Inference
### Data Source
The dataset consists of four splits in `.jsonl` format: small training (10k), large training (100k), validation (500), and test (200).
* **Download Link:** [Baidu Netdisk](https://pan.baidu.com/s/1TuaGjNvTESt9ZdEQy1BogA?pwd=u9i2)
* **Note:** Either preprocess the raw data yourself with `preprocess.py`, or use the already preprocessed data in `data/processed_nltk_100k` directly.
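For a quick look at a split before preprocessing, the sketch below reads a `.jsonl` file, assuming the usual JSON Lines layout of one JSON object per line. The file path and the `zh`/`en` field names in the usage comment are assumptions for illustration, not taken from this repository.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Hypothetical usage (path and field names are assumptions):
# pairs = read_jsonl("data/valid.jsonl")
# print(len(pairs), pairs[0].get("zh"), pairs[0].get("en"))
```

Check the keys of the first record to see which fields the files actually contain.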
Download the checkpoints and their corresponding `config.yaml` files, and place them under `runs/train`.
* **Download Link:** [Hugging Face](https://huggingface.co/soughtlin/CN_EN_Translation_Model)
Preprocess the data:
```bash
python preprocess.py -c config.yaml
```
### Evaluation
Evaluate the model with **greedy decoding** or **beam search**. Performance is measured with **BLEU-4**.
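As a rough illustration of what BLEU-4 measures, here is a minimal corpus-level sketch in plain Python: clipped 1- to 4-gram precisions combined by a geometric mean, multiplied by a brevity penalty. It assumes one tokenized reference per hypothesis and applies no smoothing. The evaluation scripts in this repository may tokenize or smooth differently, so do not expect identical numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(references, hypotheses):
    """Corpus-level BLEU-4, one reference per hypothesis, no smoothing.

    Returns a score in [0, 1].
    """
    matched = [0] * 4  # clipped n-gram matches for n = 1..4
    total = [0] * 4    # hypothesis n-gram counts for n = 1..4
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, 5):
            ref_counts = ngrams(ref, n)
            hyp_counts = ngrams(hyp, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / 4
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


reference = "the cat sat on the mat".split()
hypothesis = "the cat sat on a mat".split()
print(round(corpus_bleu4([reference], [hypothesis]), 4))
```

BLEU is often reported on a 0 to 100 scale, which corresponds to multiplying this value by 100.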
Evaluate the Transformer (here with beam search):
```bash
python evaluate_transformer.py -c runs/train/transformer/MHA/config.yaml --ckpt runs/train/transformer/MHA/best_model.pt --save_path runs/evaluate --eval_method beam
```
Evaluate the RNN (here with beam search):
```bash
python evaluate_rnn.py -c runs/train/rnn/config.yaml --ckpt runs/train/rnn/best_model.pt --save_path runs/evaluate --eval_method beam
```
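The two decoding strategies differ only in how many partial translations they keep at each step. The sketch below shows this on a toy next-token table. `TOY_MODEL`, `toy_step`, and the function names are hypothetical and are not the code in `evaluate_transformer.py` or `evaluate_rnn.py`. It also omits length normalization, which real implementations often add.

```python
import math

EOS = "</s>"

# Hypothetical next-token probabilities, keyed by the prefix generated so far.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"z": 0.9, "y": 0.1},
}


def toy_step(prefix):
    """Return log-probabilities of the next token; unknown prefixes end the sentence."""
    probs = TOY_MODEL.get(tuple(prefix), {EOS: 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


def greedy_decode(step_fn, max_len=10):
    """Pick the single most probable token at every step."""
    seq, score = [], 0.0
    for _ in range(max_len):
        log_probs = step_fn(tuple(seq))
        token = max(log_probs, key=log_probs.get)
        seq.append(token)
        score += log_probs[token]
        if token == EOS:
            break
    return seq, score


def beam_search(step_fn, beam_size=2, max_len=10):
    """Keep the beam_size best partial hypotheses at every step."""
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, lp in step_fn(seq).items():
                candidates.append((seq + (token,), score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            # Hypotheses that emitted EOS leave the beam.
            (finished if seq[-1] == EOS else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off by max_len
    best_seq, best_score = max(finished, key=lambda c: c[1])
    return list(best_seq), best_score


print(greedy_decode(toy_step))
print(beam_search(toy_step, beam_size=2))
```

On this toy table, greedy decoding commits to `a` first and ends with sequence probability 0.33, while a beam of width 2 also keeps `b` and finds `b z` with probability 0.36. With `beam_size=1` the beam search reduces to greedy decoding.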
### Training
Train the Transformer (MHA, MQA, or GQA):
```bash
python train_transformer.py -c runs/train/transformer/MHA/config.yaml
```
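The three attention variants differ in how many key/value heads the query heads share. Multi-head attention (MHA) gives every query head its own key/value head, multi-query attention (MQA) shares one key/value head across all query heads, and grouped-query attention (GQA) shares one per group. The sketch below only illustrates that mapping. The function names and the head count of 8 are assumptions, not values read from `config.yaml`.

```python
def kv_head_index(query_head, num_query_heads, num_kv_heads):
    """Return the key/value head that a given query head uses."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("num_query_heads must be divisible by num_kv_heads")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


def head_mapping(num_query_heads, num_kv_heads):
    """List the key/value head index for every query head."""
    return [
        kv_head_index(q, num_query_heads, num_kv_heads)
        for q in range(num_query_heads)
    ]


# 8 query heads in each case (a hypothetical size).
print(head_mapping(8, 8))  # MHA: [0, 1, 2, 3, 4, 5, 6, 7]
print(head_mapping(8, 1))  # MQA: [0, 0, 0, 0, 0, 0, 0, 0]
print(head_mapping(8, 2))  # GQA: [0, 0, 0, 0, 1, 1, 1, 1]
```

Fewer key/value heads mean fewer key/value projection parameters and a smaller cache at inference time, at some cost in modeling capacity.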
Train the RNN (dot, general, or concat attention):
```bash
python train_rnn.py -c runs/train/rnn/config.yaml
```
### Main Results
**Table 1: Performance of Transformer Variants.**
| Model Variant | Decoding Strategy | BLEU Score |
| --------------------- | ----------------- | ---------- |
| Transformer (MHA) | Greedy Search | 13.61 |
| | Beam Search | **14.56** |
| Transformer (MQA) | Greedy Search | 11.00 |
| | Beam Search | 12.10 |
| Transformer (GQA) | Greedy Search | 9.57 |
| | Beam Search | 10.80 |
**Table 2: Performance of RNN Variants.**
| Alignment Function | Decoding Strategy | BLEU Score |
| ------------------------ | ----------------- | ---------- |
| Dot Product (dot) | Greedy Search | 8.95 |
| | Beam Search | 9.44 |
| Multiplicative (general) | Greedy Search | 9.20 |
| | Beam Search | 9.88 |
| Additive (concat) | Greedy Search | **10.44** |
| | Beam Search | 10.09 |
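The alignment function names in Table 2 match the attention score functions of Luong et al. (2015). Assuming the RNN models follow those definitions, the three scores between a decoder state `h_t` and an encoder state `h_s` can be sketched in plain Python as below. The weights `W` and `v` are learned parameters in a real model. The values used here are made up for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, x) for row in W]


def score_dot(h_t, h_s):
    """dot: h_t . h_s"""
    return dot(h_t, h_s)


def score_general(h_t, h_s, W):
    """general (multiplicative): h_t . (W h_s)"""
    return dot(h_t, matvec(W, h_s))


def score_concat(h_t, h_s, W, v):
    """concat (additive): v . tanh(W [h_t; h_s])"""
    hidden = [math.tanh(x) for x in matvec(W, h_t + h_s)]
    return dot(v, hidden)


def softmax(scores):
    """Turn raw alignment scores into attention weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Made-up decoder state and two encoder states.
h_t = [1.0, 2.0]
encoder_states = [[3.0, 4.0], [0.5, -1.0]]
weights = softmax([score_dot(h_t, h_s) for h_s in encoder_states])
print(weights)
```

The attention weights are then used to form a weighted sum of the encoder states as the context vector for the current decoding step.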