MaxLSB committed on
Commit 7270c49 · verified · 1 Parent(s): c78f916

Update README.md

Files changed (1)
  1. README.md +14 -26
README.md CHANGED
@@ -30,37 +30,25 @@ Luth was trained using full fine-tuning on the Luth-SFT dataset with [Axolotl](h

  We used LightEval for evaluation, with custom tasks for the French benchmarks. The models were evaluated with a `temperature=0`.

- ### Evaluation Visualizations
-
- **French Evaluation:**
-
- ![French Evaluation](media/french_evaluation.png)
-
- **English Evaluation:**
-
- ![English Evaluation](media/english_evaluation.png)
-
  ### French Benchmark Scores

- | Benchmark         | Qwen3-1.7B   | SmolLM2-1.7B-Instruct | Qwen2.5-1.5B-Instruct | Luth-1.7B-Instruct |
- |-------------------|--------------|-----------------------|-----------------------|--------------------|
- | ifeval-fr         | 54.53        | 31.24                 | 32.90                 | <u>57.67</u>       |
- | gpqa-diamond-fr   | 26.90        | 21.83                 | 28.93                 | <u>38.58</u>       |
- | mmlu-fr           | 28.46        | 33.73                 | 46.25                 | <u>49.66</u>       |
- | math-500-fr       | 60.80        | 11.20                 | 32.20                 | <u>64.00</u>       |
- | arc-chall-fr      | 33.28        | 28.57                 | 32.68                 | <u>35.16</u>       |
- | hellaswag-fr      | 24.86        | <u>49.58</u>          | 34.34                 | 31.93              |
+ | Model                  | IFEval<br>French | GPQA-Diamond<br>French | MMLU<br>French | Math500<br>French | Arc-Challenge<br>French | Hellaswag<br>French |
+ |------------------------|------------------|------------------------|----------------|-------------------|-------------------------|---------------------|
+ | **Luth-1.7B-Instruct** | <u>58.53</u>     | <u>36.55</u>           | <u>49.75</u>   | <u>62.60</u>      | 35.16                   | 31.88               |
+ | Qwen3-1.7B             | 54.71            | 31.98                  | 28.49          | 60.40             | 33.28                   | 24.86               |
+ | SmolLM2-1.7B-Instruct  | 30.93            | 20.30                  | 33.73          | 10.20             | 28.57                   | <u>49.58</u>        |
+ | Qwen2.5-1.5B-Instruct  | 31.30            | 27.41                  | 46.25          | 33.20             | 32.68                   | 34.33               |
+ | LFM2-1.2B              | 54.41            | 22.84                  | 47.59          | 36.80             | <u>39.44</u>            | 33.05               |

  ### English Benchmark Scores

- | Benchmark         | Qwen3-1.7B   | SmolLM2-1.7B-Instruct | Qwen2.5-1.5B-Instruct | Luth-1.7B-Instruct |
- |-------------------|--------------|-----------------------|-----------------------|--------------------|
- | ifeval-en         | <u>68.39</u> | 48.24                 | 39.93                 | 65.80              |
- | gpqa-diamond-en   | <u>31.82</u> | 24.75                 | 30.30                 | 31.82              |
- | mmlu-en           | 52.74        | 50.27                 | 59.81                 | <u>60.19</u>       |
- | math-500-en       | 69.20        | 22.40                 | 56.00                 | <u>70.00</u>       |
- | arc-chall-en      | 36.09        | 42.32                 | 41.04                 | <u>42.24</u>       |
- | hellaswag-en      | 46.96        | <u>66.94</u>          | 64.48                 | 58.55              |
+ | Model                  | IFEval<br>English | GPQA-Diamond<br>English | MMLU<br>English | Math500<br>English | Arc-Challenge<br>English | Hellaswag<br>English |
+ |------------------------|-------------------|-------------------------|-----------------|--------------------|--------------------------|----------------------|
+ | **Luth-1.7B-Instruct** | 65.80             | 29.80                   | <u>60.28</u>    | 70.40              | 42.24                    | 58.53                |
+ | Qwen3-1.7B             | <u>68.88</u>      | <u>31.82</u>            | 52.82           | <u>71.20</u>       | 36.18                    | 46.98                |
+ | SmolLM2-1.7B-Instruct  | 49.04             | 25.08                   | 50.27           | 22.67              | 42.32                    | <u>66.94</u>         |
+ | Qwen2.5-1.5B-Instruct  | 39.99             | 25.76                   | 59.81           | 57.20              | 41.04                    | 64.48                |
+ | LFM2-1.2B              | 68.52             | 24.24                   | 55.22           | 45.80              | <u>42.58</u>             | 57.61                |

  ## Code Example
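
The diff ends at the README's `## Code Example` section, whose body is not shown in this hunk. As a point of reference for the `temperature=0` setting used in the evaluation above, here is a minimal sketch (not the README's own example) of loading the model with 🤗 Transformers and generating greedily; the hub id `MaxLSB/Luth-1.7B-Instruct` and the prompt are assumptions for illustration:

```python
# A minimal sketch, not the README's own example. do_sample=False gives greedy
# decoding, the equivalent of the temperature=0 evaluation setting above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MaxLSB/Luth-1.7B-Instruct"  # assumed hub id, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A French prompt, since the model targets French benchmarks.
messages = [{"role": "user", "content": "Explique brièvement le théorème de Pythagore."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```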