
Training & Inference

Data Source

The dataset includes a small training set (10k), a large training set (100k), a validation set (500), and a test set (200), all in .jsonl format.

  • Download Link: Baidu Netdisk
  • Note: You can preprocess the data with preprocess.py, or directly use the preprocessed data in "data/processed_nltk_100k". A minimal sketch of reading the raw .jsonl files follows this list.
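
Each line of a .jsonl file is a single JSON object. Below is a minimal loading sketch, assuming hypothetical `src`/`tgt` field names; the actual keys in the released files may differ.

```python
import json

def load_jsonl(path):
    """Read one JSON object per line. "src"/"tgt" are assumed field names,
    not necessarily the keys used in the released files."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                example = json.loads(line)
                pairs.append((example["src"], example["tgt"]))
    return pairs

# Illustrative usage (the file name is hypothetical):
# pairs = load_jsonl("data/train_small.jsonl")
```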

Download the checkpoints and their respective config.yaml files, and put them under the directory "runs/train".

Preprocess the data

python preprocess.py -c config.yaml
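
The name of the preprocessed directory ("data/processed_nltk_100k") suggests NLTK word tokenization. The sketch below illustrates that kind of tokenization as an assumption about what preprocess.py does, not its actual code:

```python
import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)  # tokenizer models required by word_tokenize

sentence = "Attention is all you need."
tokens = [t.lower() for t in word_tokenize(sentence)]
print(tokens)  # ['attention', 'is', 'all', 'you', 'need', '.']
```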

Evaluation

Evaluate the models using greedy decoding or beam search. Performance is measured with BLEU-4.
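
For reference, BLEU-4 is the geometric mean of 1- to 4-gram precisions with a brevity penalty. A minimal sketch using NLTK's corpus_bleu (the evaluation scripts may use a different BLEU implementation or smoothing):

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Toy example: one reference per hypothesis, already tokenized.
references = [[["the", "cat", "sat", "on", "the", "mat"]]]
hypotheses = [["the", "cat", "is", "on", "the", "mat"]]

bleu4 = corpus_bleu(
    references,
    hypotheses,
    weights=(0.25, 0.25, 0.25, 0.25),                 # uniform 1- to 4-gram weights
    smoothing_function=SmoothingFunction().method1,   # avoids zero n-gram counts
)
print(f"BLEU-4: {bleu4:.4f}")
```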

Evaluate Transformer

python evaluate_transformer.py -c runs/train/transformer/MHA/config.yaml --ckpt runs/train/transformer/MHA/best_model.pt --save_path runs/evaluate --eval_method beam  

Evaluate RNN

python evaluate_rnn.py -c runs/train/rnn/config.yaml --ckpt runs/train/rnn/best_model.pt  --save_path runs/evaluate --eval_method beam  
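
For intuition on the --eval_method options: greedy search keeps only the single most probable token at each step, while beam search keeps the top-k partial hypotheses. A model-agnostic sketch of beam search (not the decoder implemented in the evaluation scripts):

```python
def beam_search(step_log_probs, bos, eos, beam_size=5, max_len=50):
    """step_log_probs(prefix) -> {next_token: log_prob}; a generic sketch only."""
    beams = [([bos], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:
                candidates.append((seq, score))  # keep finished hypotheses as-is
                continue
            for tok, lp in step_log_probs(seq).items():
                candidates.append((seq + [tok], score + lp))
        # retain only the top-k hypotheses by cumulative log-probability
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(seq[-1] == eos for seq, _ in beams):
            break
    return beams[0][0]
```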

Training

Training Transformer (MHA, MQA, GQA)

python train_transformer.py -c runs/train/transformer/MHA/config.yaml
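
The three attention variants differ only in the number of key/value heads: MHA has one K/V head per query head, MQA shares a single K/V head across all query heads, and GQA shares each K/V head within a group of query heads. A minimal PyTorch sketch of the projections (illustrative only, not the code in this repository):

```python
import torch.nn as nn

class GroupedQueryAttention(nn.Module):
    """num_kv_heads == num_heads -> MHA, == 1 -> MQA, in between -> GQA.
    Illustrative sketch, not the repository's implementation."""
    def __init__(self, d_model, num_heads, num_kv_heads):
        super().__init__()
        assert num_heads % num_kv_heads == 0
        self.h, self.kv, self.d = num_heads, num_kv_heads, d_model // num_heads
        self.q_proj = nn.Linear(d_model, num_heads * self.d)
        self.k_proj = nn.Linear(d_model, num_kv_heads * self.d)
        self.v_proj = nn.Linear(d_model, num_kv_heads * self.d)
        self.o_proj = nn.Linear(num_heads * self.d, d_model)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.h, self.d).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.kv, self.d).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.kv, self.d).transpose(1, 2)
        # repeat K/V so each group of query heads attends to its shared K/V head
        k = k.repeat_interleave(self.h // self.kv, dim=1)
        v = v.repeat_interleave(self.h // self.kv, dim=1)
        att = (q @ k.transpose(-2, -1)) / self.d ** 0.5
        out = att.softmax(dim=-1) @ v
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))
```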

Training RNN (dot, general, concat)

python train_rnn.py -c runs/train/rnn/config.yaml
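
The RNN attention variants correspond to the Luong-style alignment functions dot, general, and concat listed in Table 2. A hedged sketch of the three score functions (parameter names are assumptions, not the repository's code):

```python
import torch
import torch.nn as nn

class LuongScore(nn.Module):
    """score(h_t, h_s) for dot, general (multiplicative), and concat (additive)."""
    def __init__(self, hidden, method="dot"):
        super().__init__()
        self.method = method
        if method == "general":
            self.W = nn.Linear(hidden, hidden, bias=False)
        elif method == "concat":
            self.W = nn.Linear(2 * hidden, hidden, bias=False)
            self.v = nn.Linear(hidden, 1, bias=False)

    def forward(self, h_t, h_s):
        # h_t: (batch, hidden) decoder state; h_s: (batch, src_len, hidden) encoder states
        if self.method == "dot":        # score = h_s . h_t
            return torch.bmm(h_s, h_t.unsqueeze(2)).squeeze(2)
        if self.method == "general":    # score = (W h_s) . h_t
            return torch.bmm(self.W(h_s), h_t.unsqueeze(2)).squeeze(2)
        # concat: score = v . tanh(W [h_t; h_s])
        h_t_exp = h_t.unsqueeze(1).expand(-1, h_s.size(1), -1)
        return self.v(torch.tanh(self.W(torch.cat([h_t_exp, h_s], dim=-1)))).squeeze(2)
```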

Main Results

Table 1: Performance of Transformer Variants.

| Model Variant | Decoding Strategy | BLEU-4 |
|---|---|---|
| Transformer (MHA) | Greedy Search | 13.61 |
| Transformer (MHA) | Beam Search | 14.56 |
| Transformer (MQA) | Greedy Search | 11.00 |
| Transformer (MQA) | Beam Search | 12.10 |
| Transformer (GQA) | Greedy Search | 9.57 |
| Transformer (GQA) | Beam Search | 10.80 |

Table 2: Performance of RNN Variants

| Alignment Function | Decoding Strategy | BLEU-4 |
|---|---|---|
| Dot Product (dot) | Greedy Search | 8.95 |
| Dot Product (dot) | Beam Search | 9.44 |
| Multiplicative (general) | Greedy Search | 9.20 |
| Multiplicative (general) | Beam Search | 9.88 |
| Additive (concat) | Greedy Search | 10.44 |
| Additive (concat) | Beam Search | 10.09 |