# Bilingual Translation Evaluation Script (EN → KK)

This repository provides an evaluation pipeline for English-to-Kazakh/Russian-to-Kazakh (and vice versa) translation models based on the `Gemma3ForCausalLM` architecture from Hugging Face Transformers.

## 🚀 Overview

The script:

- Loads a fine-tuned model and tokenizer
- Performs inference on a FLORES-style test set (`.jsonl`)
- Computes BLEU score using NLTK
- Saves predictions and evaluation results into a JSON file

## ⚙️ Configuration

Modify these lines at the top of the script as needed:

```python
SRC_LANG = "en"
TGT_LANG = "kk"
MODEL_PATH = "/path/to/your/model"
TEST_FILE = "/path/to/test_file.jsonl"
OUTPUT_JSON = "/path/to/output_file.jsonl"
MAX_NEW_TOKS = 64
DEVICE = "cuda"  # or "cpu"
```

To specify GPU devices:

```bash
export CUDA_VISIBLE_DEVICES=2,3,4,5
```

## ▶️ Run the Script

```bash
python eval_blue.py
```
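
## Example: Test-Set and Output Handling

The loading and saving steps listed in the overview might look roughly like the sketch below. It is not taken from the script itself: the helper names `load_pairs` and `save_results`, the assumption that each JSONL record uses the language codes (`"en"`, `"kk"`) as keys, and the layout of the output JSON are all hypothetical and should be adjusted to match the actual test file and script.

```python
import json
from pathlib import Path


def load_pairs(test_file, src_key="en", tgt_key="kk"):
    """Read a JSONL file and return (source, reference) string pairs.

    Assumption: one JSON object per line, keyed by language code.
    The real test file may use different field names.
    """
    pairs = []
    with open(test_file, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            record = json.loads(line)
            pairs.append((record[src_key], record[tgt_key]))
    return pairs


def save_results(output_json, bleu, sources, references, predictions):
    """Write the score and per-sentence outputs to a single JSON file.

    Assumption: this output layout is illustrative, not the script's own.
    """
    payload = {
        "bleu": bleu,
        "samples": [
            {"source": s, "reference": r, "prediction": p}
            for s, r, p in zip(sources, references, predictions)
        ],
    }
    # ensure_ascii=False keeps Kazakh (Cyrillic) text readable in the file.
    Path(output_json).write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )
```

With these helpers, `load_pairs(TEST_FILE, SRC_LANG, TGT_LANG)` would supply the inputs for inference, and `save_results(OUTPUT_JSON, ...)` would store the score alongside each prediction once BLEU has been computed.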