# Bilingual Translation Evaluation Script (EN → KK)

This repository provides an evaluation pipeline for translation models built on the `Gemma3ForCausalLM` architecture from Hugging Face Transformers. It covers English↔Kazakh and Russian↔Kazakh translation in either direction.

## 🚀 Overview

The script:

- Loads a fine-tuned model and tokenizer
- Runs inference on a FLORES-style test set (`.jsonl`)
- Computes the BLEU score with NLTK
- Saves the predictions and evaluation results to a JSON file
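
The BLEU step can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper name `bleu_score`, the whitespace tokenization, and the choice of smoothing method are all assumptions. Only `corpus_bleu` and `SmoothingFunction` are real NLTK calls.

```python
from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu


def bleu_score(references, hypotheses):
    """Corpus-level BLEU over parallel lists of reference and predicted strings.

    Hypothetical helper: tokenization is a plain whitespace split (assumption).
    """
    # corpus_bleu expects, for each segment, a list of tokenized references
    # and a single tokenized hypothesis.
    refs = [[ref.split()] for ref in references]
    hyps = [hyp.split() for hyp in hypotheses]
    # Smoothing avoids a near-zero score when a higher-order n-gram never matches.
    smoothing = SmoothingFunction().method1
    return corpus_bleu(refs, hyps, smoothing_function=smoothing)


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    hyps = ["the cat sat on a mat"]
    print(f"BLEU: {bleu_score(refs, hyps):.4f}")
```

NLTK returns BLEU on a 0 to 1 scale, so multiply by 100 when comparing against scores reported on the 0 to 100 scale.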

## ⚙️ Configuration

Modify these lines at the top of the script as needed:

```python
SRC_LANG = "en"                                # source language code
TGT_LANG = "kk"                                # target language code
MODEL_PATH = "/path/to/your/model"             # fine-tuned model and tokenizer
TEST_FILE = "/path/to/test_file.jsonl"         # FLORES-style test set
OUTPUT_JSON = "/path/to/output_file.jsonl"     # predictions and evaluation results
MAX_NEW_TOKS = 64                              # maximum number of tokens generated per translation
DEVICE = "cuda"                                # or "cpu"
```
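
To illustrate how `TEST_FILE` and `OUTPUT_JSON` might be used, here is a rough sketch of the input and output handling. The record keys `src` and `tgt`, the helper names `load_pairs` and `save_results`, and the layout of the output report are hypothetical; check them against the actual test file and script before relying on them.

```python
import json


def load_pairs(path, src_key="src", tgt_key="tgt"):
    """Read a .jsonl file (one JSON object per line) into (source, reference) pairs.

    The key names are assumptions, not taken from the script.
    """
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            record = json.loads(line)
            pairs.append((record[src_key], record[tgt_key]))
    return pairs


def save_results(path, bleu, sources, references, predictions):
    """Write the score and per-sentence outputs to a single JSON file."""
    report = {
        "bleu": bleu,
        "samples": [
            {"source": s, "reference": r, "prediction": p}
            for s, r, p in zip(sources, references, predictions)
        ],
    }
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Kazakh and Russian text readable in the file.
        json.dump(report, f, ensure_ascii=False, indent=2)
```

Writing with `ensure_ascii=False` is a deliberate choice here: without it, Cyrillic text is stored as `\uXXXX` escapes, which is valid JSON but hard to inspect by eye.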
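
To restrict the script to specific GPUs, set `CUDA_VISIBLE_DEVICES` before running it. The selected GPUs are renumbered from 0 inside the process, so `cuda:0` then refers to physical GPU 2:

```bash
export CUDA_VISIBLE_DEVICES=2,3,4,5
```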
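
The same restriction can be applied from Python instead of the shell. The sketch below is an assumption about how one might do it, not something the script is known to do; `select_gpus` is a hypothetical helper. The variable has to be set before CUDA is initialized, so setting it before `import torch` is the safe choice.

```python
import os


def select_gpus(gpu_ids, env=os.environ):
    """Set CUDA_VISIBLE_DEVICES to the given physical GPU indices (hypothetical helper)."""
    value = ",".join(str(i) for i in gpu_ids)
    env["CUDA_VISIBLE_DEVICES"] = value
    return value


# Run this at the very top of the script, before torch or any other
# CUDA-using library is imported; later changes are not picked up.
select_gpus([2, 3, 4, 5])
print("Visible GPUs:", os.environ["CUDA_VISIBLE_DEVICES"])
```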

## ▶️ Run the Script

```bash
python eval_blue.py
```