---
base_model: ethicalabs/xLSTM-7b-Instruct
library_name: transformers
model_name: xlstm-7b-instruct-phase-2
tags:
- sft
- transformers
- trl
licence: license
pipeline_tag: text-generation
---

# Model Card for xlstm-7b-instruct-phase-2

This model is a fine-tuned version of [ethicalabs/xLSTM-7b-Instruct](https://huggingface.co/ethicalabs/xLSTM-7b-Instruct) for task alignment.

It was trained with [TRL](https://github.com/huggingface/trl) using supervised fine-tuning (SFT) on assistant-only tokens.

The `k_proj` and `v_proj` matrices have been frozen to isolate and preserve the model's pre-trained knowledge base.

This fine-tuning focused only on the `q_proj` (query) and FFN matrices, adapting the model's reasoning and query-retrieval mechanisms without overwriting its core, frozen knowledge.

This experiment was designed to test the hypothesis that the model's reasoning capabilities (`q_proj`) could be specialized for math/code while its knowledge (`k_proj`, `v_proj`) remained intact.

## Quick start

Work in Progress!
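Pending an official snippet, the sketch below shows how the checkpoint could be loaded with the standard `transformers` text-generation pipeline. The repo id and generation settings are assumptions inferred from this card's metadata, not a verified recipe.

```python
# Hypothetical quick-start sketch; the repo id below is assumed from this
# card's metadata and has not been verified against the Hub.
from transformers import pipeline

def build_messages(prompt: str) -> list:
    """Wrap a user prompt in the chat-message format that
    text-generation pipelines accept when a chat template is set."""
    return [{"role": "user", "content": prompt}]

if __name__ == "__main__":
    generator = pipeline(
        "text-generation",
        model="ethicalabs/xlstm-7b-instruct-phase-2",  # assumed repo id
        device_map="auto",
    )
    out = generator(
        build_messages("Explain the xLSTM architecture briefly."),
        max_new_tokens=128,
    )
    print(out[0]["generated_text"])
```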

## Training procedure

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/ethicalabs-ai/xlstm-finetuning-ultrafeedback/runs/zxpd9xeh) 


This model was trained with SFT.
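The selective-freezing setup described above can be sketched as a small helper that disables gradients on the key/value projections before training. The module names (`q_proj`, `k_proj`, `v_proj`) mirror the card's description; the actual xLSTM-7b parameter names may differ, and the toy block below is only a stand-in for demonstration.

```python
# Illustrative sketch of selective freezing: k_proj/v_proj are frozen,
# q_proj and the FFN stay trainable. Names are assumptions from this card.
import torch.nn as nn

FROZEN_SUBSTRINGS = ("k_proj", "v_proj")

def freeze_kv_projections(model: nn.Module) -> list:
    """Disable gradients for every parameter whose name contains
    'k_proj' or 'v_proj'; return the names that were frozen."""
    frozen = []
    for name, param in model.named_parameters():
        if any(s in name for s in FROZEN_SUBSTRINGS):
            param.requires_grad = False
            frozen.append(name)
    return frozen

class ToyBlock(nn.Module):
    """Stand-in for one attention block, just to demonstrate the helper."""
    def __init__(self, d: int = 8):
        super().__init__()
        self.q_proj = nn.Linear(d, d)
        self.k_proj = nn.Linear(d, d)
        self.v_proj = nn.Linear(d, d)
        self.ffn = nn.Linear(d, d)

block = ToyBlock()
frozen = freeze_kv_projections(block)
# k_proj/v_proj are now frozen; q_proj and the FFN remain trainable.
```

In a real run the helper would be applied to the loaded model before handing it to the trainer, so the optimizer only sees the query and FFN parameters.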

## Evaluation

This model was loaded in 4-bit and evaluated with [lighteval](https://github.com/huggingface/lighteval). Note the large standard errors: each task was run on a small number of samples, so the scores below are indicative rather than definitive.

|                         Task                         |Version|                                                           Metric                                                           |Value |   |Stderr|
|------------------------------------------------------|-------|----------------------------------------------------------------------------------------------------------------------------|-----:|---|-----:|
|all                                                   |       |acc                                                                                                                         |0.5383|±  |0.1476|
|                                                      |       |acc:logprob_normalization=LogProbCharNorm(name='norm', ignore_first_space=True)                                             |0.7000|±  |0.1528|
|                                                      |       |acc:logprob_normalization=LogProbCharNorm(name='norm', ignore_first_space=False)                                            |0.8000|±  |0.1333|
|                                                      |       |truthfulqa_mc1                                                                                                              |0.6000|±  |0.1633|
|                                                      |       |truthfulqa_mc2                                                                                                              |0.7066|±  |0.1481|
|                                                      |       |em:normalize_gold=gsm8k_normalizer&normalize_pred=gsm8k_normalizer                                                          |0.6000|±  |0.1633|
|leaderboard:arc:challenge:25                          |       |acc                                                                                                                         |0.8000|±  |0.1333|
|                                                      |       |acc:logprob_normalization=LogProbCharNorm(name='norm', ignore_first_space=True)                                             |0.7000|±  |0.1528|
|leaderboard:gsm8k:5                                   |       |em:normalize_gold=gsm8k_normalizer&normalize_pred=gsm8k_normalizer                                                          |0.6000|±  |0.1633|
|leaderboard:hellaswag:10                              |       |acc                                                                                                                         |0.5000|±  |0.1667|
|                                                      |       |acc:logprob_normalization=LogProbCharNorm(name='norm', ignore_first_space=False)                                            |0.8000|±  |0.1333|
|leaderboard:mmlu:_average:5                           |       |acc                                                                                                                         |0.5316|±  |0.1474|
|leaderboard:mmlu:abstract_algebra:5                   |       |acc                                                                                                                         |0.3000|±  |0.1528|
|leaderboard:mmlu:anatomy:5                            |       |acc                                                                                                                         |0.3000|±  |0.1528|
|leaderboard:mmlu:astronomy:5                          |       |acc                                                                                                                         |0.7000|±  |0.1528|
|leaderboard:mmlu:business_ethics:5                    |       |acc                                                                                                                         |0.4000|±  |0.1633|
|leaderboard:mmlu:clinical_knowledge:5                 |       |acc                                                                                                                         |0.7000|±  |0.1528|
|leaderboard:mmlu:college_biology:5                    |       |acc                                                                                                                         |0.5000|±  |0.1667|
|leaderboard:mmlu:college_chemistry:5                  |       |acc                                                                                                                         |0.4000|±  |0.1633|
|leaderboard:mmlu:college_computer_science:5           |       |acc                                                                                                                         |0.4000|±  |0.1633|
|leaderboard:mmlu:college_mathematics:5                |       |acc                                                                                                                         |0.2000|±  |0.1333|
|leaderboard:mmlu:college_medicine:5                   |       |acc                                                                                                                         |0.5000|±  |0.1667|
|leaderboard:mmlu:college_physics:5                    |       |acc                                                                                                                         |0.5000|±  |0.1667|
|leaderboard:mmlu:computer_security:5                  |       |acc                                                                                                                         |0.9000|±  |0.1000|
|leaderboard:mmlu:conceptual_physics:5                 |       |acc                                                                                                                         |0.4000|±  |0.1633|
|leaderboard:mmlu:econometrics:5                       |       |acc                                                                                                                         |0.4000|±  |0.1633|
|leaderboard:mmlu:electrical_engineering:5             |       |acc                                                                                                                         |0.7000|±  |0.1528|
|leaderboard:mmlu:elementary_mathematics:5             |       |acc                                                                                                                         |0.3000|±  |0.1528|
|leaderboard:mmlu:formal_logic:5                       |       |acc                                                                                                                         |0.3000|±  |0.1528|
|leaderboard:mmlu:global_facts:5                       |       |acc                                                                                                                         |0.3000|±  |0.1528|
|leaderboard:mmlu:high_school_biology:5                |       |acc                                                                                                                         |0.9000|±  |0.1000|
|leaderboard:mmlu:high_school_chemistry:5              |       |acc                                                                                                                         |0.5000|±  |0.1667|
|leaderboard:mmlu:high_school_computer_science:5       |       |acc                                                                                                                         |0.6000|±  |0.1633|
|leaderboard:mmlu:high_school_european_history:5       |       |acc                                                                                                                         |0.7000|±  |0.1528|
|leaderboard:mmlu:high_school_geography:5              |       |acc                                                                                                                         |1.0000|±  |0.0000|
|leaderboard:mmlu:high_school_government_and_politics:5|       |acc                                                                                                                         |0.8000|±  |0.1333|
|leaderboard:mmlu:high_school_macroeconomics:5         |       |acc                                                                                                                         |0.6000|±  |0.1633|
|leaderboard:mmlu:high_school_mathematics:5            |       |acc                                                                                                                         |0.3000|±  |0.1528|
|leaderboard:mmlu:high_school_microeconomics:5         |       |acc                                                                                                                         |0.7000|±  |0.1528|
|leaderboard:mmlu:high_school_physics:5                |       |acc                                                                                                                         |0.3000|±  |0.1528|
|leaderboard:mmlu:high_school_psychology:5             |       |acc                                                                                                                         |0.9000|±  |0.1000|
|leaderboard:mmlu:high_school_statistics:5             |       |acc                                                                                                                         |0.5000|±  |0.1667|
|leaderboard:mmlu:high_school_us_history:5             |       |acc                                                                                                                         |0.8000|±  |0.1333|
|leaderboard:mmlu:high_school_world_history:5          |       |acc                                                                                                                         |0.9000|±  |0.1000|
|leaderboard:mmlu:human_aging:5                        |       |acc                                                                                                                         |0.5000|±  |0.1667|
|leaderboard:mmlu:human_sexuality:5                    |       |acc                                                                                                                         |0.4000|±  |0.1633|
|leaderboard:mmlu:international_law:5                  |       |acc                                                                                                                         |0.6000|±  |0.1633|
|leaderboard:mmlu:jurisprudence:5                      |       |acc                                                                                                                         |0.6000|±  |0.1633|
|leaderboard:mmlu:logical_fallacies:5                  |       |acc                                                                                                                         |0.4000|±  |0.1633|
|leaderboard:mmlu:machine_learning:5                   |       |acc                                                                                                                         |0.5000|±  |0.1667|
|leaderboard:mmlu:management:5                         |       |acc                                                                                                                         |0.5000|±  |0.1667|
|leaderboard:mmlu:marketing:5                          |       |acc                                                                                                                         |0.8000|±  |0.1333|
|leaderboard:mmlu:medical_genetics:5                   |       |acc                                                                                                                         |0.9000|±  |0.1000|
|leaderboard:mmlu:miscellaneous:5                      |       |acc                                                                                                                         |0.5000|±  |0.1667|
|leaderboard:mmlu:moral_disputes:5                     |       |acc                                                                                                                         |0.7000|±  |0.1528|
|leaderboard:mmlu:moral_scenarios:5                    |       |acc                                                                                                                         |0.1000|±  |0.1000|
|leaderboard:mmlu:nutrition:5                          |       |acc                                                                                                                         |0.6000|±  |0.1633|
|leaderboard:mmlu:philosophy:5                         |       |acc                                                                                                                         |0.5000|±  |0.1667|
|leaderboard:mmlu:prehistory:5                         |       |acc                                                                                                                         |0.4000|±  |0.1633|
|leaderboard:mmlu:professional_accounting:5            |       |acc                                                                                                                         |0.3000|±  |0.1528|
|leaderboard:mmlu:professional_law:5                   |       |acc                                                                                                                         |0.4000|±  |0.1633|
|leaderboard:mmlu:professional_medicine:5              |       |acc                                                                                                                         |0.2000|±  |0.1333|
|leaderboard:mmlu:professional_psychology:5            |       |acc                                                                                                                         |0.3000|±  |0.1528|
|leaderboard:mmlu:public_relations:5                   |       |acc                                                                                                                         |0.3000|±  |0.1528|
|leaderboard:mmlu:security_studies:5                   |       |acc                                                                                                                         |0.3000|±  |0.1528|
|leaderboard:mmlu:sociology:5                          |       |acc                                                                                                                         |0.8000|±  |0.1333|
|leaderboard:mmlu:us_foreign_policy:5                  |       |acc                                                                                                                         |0.7000|±  |0.1528|
|leaderboard:mmlu:virology:5                           |       |acc                                                                                                                         |0.5000|±  |0.1667|
|leaderboard:mmlu:world_religions:5                    |       |acc                                                                                                                         |0.8000|±  |0.1333|
|leaderboard:truthfulqa:mc:0                           |       |truthfulqa_mc1                                                                                                              |0.6000|±  |0.1633|
|                                                      |       |truthfulqa_mc2                                                                                                              |0.7066|±  |0.1481|
|leaderboard:winogrande:5                              |       |acc                                                                                                                         |0.7000|±  |0.1528|



### Framework versions

- PEFT: 0.17.1
- TRL: 0.24.0
- Transformers: 4.57.1
- PyTorch: 2.8.0+cu126
- Datasets: 4.2.0
- Tokenizers: 0.22.1

## Citations

Cite TRL as:
    
```bibtex
@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}
```