Commit 0fd78cb by aimabai · verified · 1 parent: 9f9a082

Create README.md

Files changed (1)
  1. README.md +109 -0
README.md ADDED
@@ -0,0 +1,109 @@
+ # Bilingual Translation Evaluation Script (EN → KK)
+
+ This repository provides an evaluation pipeline for English↔Kazakh and Russian↔Kazakh translation models based on the `Gemma3ForCausalLM` architecture from Hugging Face Transformers.
+
+ ## 🚀 Overview
+
+ The script:
+ - Loads a fine-tuned model and tokenizer
+ - Performs inference on a FLORES-style test set (`.jsonl`)
+ - Computes the BLEU score using NLTK
+ - Saves predictions and evaluation results to a JSON file
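The BLEU step listed above can be sketched with NLTK's `corpus_bleu`. This is a minimal illustration, not the script's actual code: the helper name `bleu_percent`, whitespace tokenization, `method1` smoothing, and scaling to a 0-100 range are all assumptions.

```python
from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu


def bleu_percent(references, hypotheses):
    """Corpus-level BLEU on a 0-100 scale (hypothetical helper)."""
    # corpus_bleu expects a list of reference lists per hypothesis;
    # here each hypothesis has exactly one reference.
    refs = [[ref.split()] for ref in references]
    hyps = [hyp.split() for hyp in hypotheses]
    # Smoothing avoids a zero score when a higher-order n-gram has no match.
    smooth = SmoothingFunction().method1
    return 100.0 * corpus_bleu(refs, hyps, smoothing_function=smooth)


if __name__ == "__main__":
    score = bleu_percent(["a b c d e"], ["a b c d e"])
    print(f"BLEU = {score:.2f}")
```

Whitespace splitting keeps the sketch free of extra NLTK data downloads; a different tokenizer would change the absolute score.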
+
+ ## 📁 File Structure
+
+ ```
+ .
+ ├── eval_sync_KKEN.py # Main evaluation script
+ ├── eval_sync_KKEN_data_en_to_kk.json # Output file (generated)
+ ```
+
+ ## 📥 Input Format
+
+ The test file should be a `.jsonl` file where each line is a JSON object with the following fields (pretty-printed here for readability; in the file each object occupies a single line):
+
+ ```json
+ {
+   "system": "System prompt text",
+   "user": "<src=en><tgt=kk> Some English input",
+   "assistant": "Expected Kazakh translation"
+ }
+ ```
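Reading a file in this format could look like the following. It is a sketch under stated assumptions: `load_examples` is a hypothetical helper name, and the field check simply mirrors the three fields shown above rather than anything confirmed about the script.

```python
import json

REQUIRED_FIELDS = {"system", "user", "assistant"}


def load_examples(path):
    """Read a .jsonl test file into a list of dicts, skipping blank lines."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate empty lines between records
            record = json.loads(line)
            missing = REQUIRED_FIELDS - record.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
            examples.append(record)
    return examples
```

Failing early on a missing field makes a malformed test set visible before any model inference starts.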
+
+ ## 📤 Output
+
+ The script writes a file with a name such as `eval_sync_KKEN_data_en_to_kk.json`, which contains:
+ - Model path
+ - Final BLEU score
+ - A list of examples, each with the system prompt, cleaned user input, reference translation, and model prediction (hypothesis)
+
+ Example output file:
+ ```json
+ {
+   "model": "/path/to/model",
+   "bleu": 27.53,
+   "examples": [
+     {
+       "system": "Translate this.",
+       "user": "Hello, how are you?",
+       "reference": "Сәлем, қалайсың?",
+       "hypothesis": "Сәлеметсіз бе, жағдайыңыз қалай?"
+     }
+   ]
+ }
+ ```
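A file with this layout could be written roughly as follows. This is a sketch, not the script's own code: `save_results` is a hypothetical name, and rounding the score to two decimals is an assumption based on the example value above.

```python
import json


def save_results(path, model_path, bleu, examples):
    """Write the model path, BLEU score, and per-example records as JSON."""
    payload = {
        "model": model_path,
        "bleu": round(bleu, 2),  # assumed: two decimals, as in the example
        "examples": examples,
    }
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Kazakh text readable instead of \uXXXX escapes.
        json.dump(payload, f, ensure_ascii=False, indent=2)
```

Each entry in `examples` is expected to be a dict with `system`, `user`, `reference`, and `hypothesis` keys, matching the example output.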
+
+ ## ⚙️ Configuration
+
+ Modify these constants at the top of the script as needed:
+
+ ```python
+ SRC_LANG = "en"
+ TGT_LANG = "kk"
+ MODEL_PATH = "/path/to/your/model"
+ TEST_FILE = "/path/to/test_file.jsonl"
+ MAX_NEW_TOKS = 64
+ DEVICE = "cuda"  # or "cpu"
+ ```
+
+ To restrict the script to specific GPUs, set `CUDA_VISIBLE_DEVICES` before running it:
+ ```bash
+ export CUDA_VISIBLE_DEVICES=2,3,4,5
+ ```
+
+ ## 📦 Requirements
+
+ Install the required packages:
+
+ ```bash
+ pip install transformers torch nltk tqdm
+ ```
+
+ Also download the NLTK `punkt` data if it is not already installed:
+
+ ```python
+ import nltk
+ nltk.download('punkt')
+ ```
+
+ ## ▶️ Run the Script
+
+ ```bash
+ python eval_sync_KKEN.py
+ ```
+
+ This will:
+ - Load the model
+ - Run translation inference
+ - Compute BLEU score
+ - Save evaluation results to a `.json` file
+
+ ## 📝 Notes
+
+ - Make sure your model and tokenizer directory follows the Hugging Face format.
+ - The script uses `<start_of_turn>` and `<end_of_turn>` tokens to structure prompts for inference.
+ - Input strings are automatically cleaned of tags like `<src=..><tgt=..>` before generating output.
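The tag cleaning and prompt structure mentioned in the notes above might look like this. It is a sketch under assumptions: the helper names, the regular expression, and the exact turn layout (system text placed at the start of the user turn) are illustrative guesses, since the README does not show the script's template or say at which point the tags are stripped.

```python
import re

# Matches direction tags such as <src=en> or <tgt=kk>.
TAG_RE = re.compile(r"<(?:src|tgt)=[^>]*>")


def clean_user(text):
    """Remove <src=..>/<tgt=..> tags and surrounding whitespace."""
    return TAG_RE.sub("", text).strip()


def build_prompt(system, user):
    """Wrap one request in Gemma-style turn markers (assumed layout)."""
    return (
        f"<start_of_turn>user\n{system}\n\n{user}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


if __name__ == "__main__":
    raw = "<src=en><tgt=kk> Some English input"
    print(clean_user(raw))
    print(build_prompt("System prompt text", raw))
```

Ending the prompt with an open `model` turn leaves the translation as the text the model generates next.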
+
+ ## 📧 Contact
+
+ For questions or feedback, please contact [Your Name or GitHub Profile].