Update README.md

We used LightEval for evaluation, with custom tasks for the French benchmarks. The models were evaluated with `temperature=0`.
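
Since `temperature=0` amounts to greedy decoding, the reported scores are deterministic for a given prompt. As a minimal sketch of that setting (illustrative only, not the actual LightEval harness configuration), the equivalent in `transformers` is `do_sample=False`:

```python
# Minimal sketch of the decoding setup used for evaluation (illustrative,
# not the actual LightEval configuration): temperature=0 means the most
# probable token is always chosen, i.e. greedy decoding.
from transformers import GenerationConfig

greedy_config = GenerationConfig(
    do_sample=False,     # greedy decoding, equivalent to temperature=0
    max_new_tokens=512,  # assumed generation budget, not from the source
)
```

Passing this config to `model.generate(..., generation_config=greedy_config)` yields the same output on every run.
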
### French Benchmark Scores

| Model                  | IFEval<br>French | GPQA-Diamond<br>French | MMLU<br>French | Math500<br>French | Arc-Challenge<br>French | Hellaswag<br>French |
|------------------------|------------------|------------------------|----------------|-------------------|-------------------------|---------------------|
| **Luth-1.7B-Instruct** | <u>58.53</u>     | <u>36.55</u>           | <u>49.75</u>   | <u>62.60</u>      | 35.16                   | 31.88               |
| Qwen3-1.7B             | 54.71            | 31.98                  | 28.49          | 60.40             | 33.28                   | 24.86               |
| SmolLM2-1.7B-Instruct  | 30.93            | 20.30                  | 33.73          | 10.20             | 28.57                   | <u>49.58</u>        |
| Qwen2.5-1.5B-Instruct  | 31.30            | 27.41                  | 46.25          | 33.20             | 32.68                   | 34.33               |
| LFM2-1.2B              | 54.41            | 22.84                  | 47.59          | 36.80             | <u>39.44</u>            | 33.05               |

### English Benchmark Scores

| Model                  | IFEval<br>English | GPQA-Diamond<br>English | MMLU<br>English | Math500<br>English | Arc-Challenge<br>English | Hellaswag<br>English |
|------------------------|-------------------|-------------------------|-----------------|--------------------|--------------------------|----------------------|
| **Luth-1.7B-Instruct** | 65.80             | 29.80                   | <u>60.28</u>    | 70.40              | 42.24                    | 58.53                |
| Qwen3-1.7B             | <u>68.88</u>      | <u>31.82</u>            | 52.82           | <u>71.20</u>       | 36.18                    | 46.98                |
| SmolLM2-1.7B-Instruct  | 49.04             | 25.08                   | 50.27           | 22.67              | 42.32                    | <u>66.94</u>         |
| Qwen2.5-1.5B-Instruct  | 39.99             | 25.76                   | 59.81           | 57.20              | 41.04                    | 64.48                |
| LFM2-1.2B              | 68.52             | 24.24                   | 55.22           | 45.80              | <u>42.58</u>             | 57.61                |

## Code Example
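
The snippet below is a minimal usage sketch with `transformers`. The repository id `kurakurai/Luth-1.7B-Instruct` is an assumption based on the model name above; substitute the actual Hugging Face id if it differs.

```python
# Minimal usage sketch (assumed repo id: kurakurai/Luth-1.7B-Instruct).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kurakurai/Luth-1.7B-Instruct"  # assumed repo id, adjust as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Luth is evaluated on French benchmarks, so a French prompt is a natural smoke test.
messages = [{"role": "user", "content": "Explique brièvement la photosynthèse."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# do_sample=False mirrors the deterministic temperature=0 setting used in evaluation.
output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```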