---
base_model: ethicalabs/xLSTM-7b-Instruct
library_name: transformers
model_name: xlstm-7b-instruct-phase-2
tags:
- sft
- transformers
- trl
licence: license
pipeline_tag: text-generation
---
# Model Card for xlstm-7b-instruct-phase-2
This model is a fine-tuned version of [ethicalabs/xLSTM-7b-Instruct](https://huggingface.co/ethicalabs/xLSTM-7b-Instruct) for task alignment.
It was trained with [TRL](https://github.com/huggingface/trl) using supervised fine-tuning (SFT), with the loss computed on assistant tokens only.
The `k_proj` and `v_proj` matrices have been frozen to isolate and preserve the model's pre-trained knowledge base.
Fine-tuning updated only the `q_proj` (query) and FFN matrices, adapting the model's reasoning and query-retrieval mechanisms without overwriting its core, frozen knowledge.
This experiment was designed to test the hypothesis that the model's reasoning capabilities (`q_proj`) could be specialized for math/code while its knowledge (`k_proj`, `v_proj`) remained intact.
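The freezing scheme described above can be sketched as a name-based filter over the model's parameters. This is a minimal illustration, not the training script used for this model: the parameter names below are hypothetical, and `SimpleNamespace` objects stand in for real tensors so the snippet runs without any dependencies. With PyTorch, the same function could be called on `model.named_parameters()`, assuming the real parameter names contain `k_proj` and `v_proj`.

```python
from types import SimpleNamespace

# Substrings identifying the projections that the card says were frozen.
FROZEN_KEYS = ("k_proj", "v_proj")


def freeze_by_name(named_parameters, frozen_keys=FROZEN_KEYS):
    """Disable gradients for parameters whose name contains a frozen key.

    Accepts any iterable of (name, param) pairs where param has a
    `requires_grad` attribute. Returns the names that remain trainable.
    """
    trainable = []
    for name, param in named_parameters:
        if any(key in name for key in frozen_keys):
            param.requires_grad = False
        elif param.requires_grad:
            trainable.append(name)
    return trainable


# Stand-in parameters with hypothetical names; a real model's names will differ.
toy_params = {
    "blocks.0.q_proj.weight": SimpleNamespace(requires_grad=True),
    "blocks.0.k_proj.weight": SimpleNamespace(requires_grad=True),
    "blocks.0.v_proj.weight": SimpleNamespace(requires_grad=True),
    "blocks.0.ffn.proj_up.weight": SimpleNamespace(requires_grad=True),
}

trainable_names = freeze_by_name(toy_params.items())
print(trainable_names)
```

Matching on substrings keeps the filter independent of layer indices, at the cost of also freezing any unrelated parameter whose name happens to contain one of the keys.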
## Quick start
Work in Progress!
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/ethicalabs-ai/xlstm-finetuning-ultrafeedback/runs/zxpd9xeh)
This model was trained with SFT.
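Training on assistant-only tokens means that prompt tokens (system and user turns) still feed the model as context but contribute nothing to the loss. A common way to express this is to set their labels to `-100`, the default ignore index of PyTorch's cross-entropy loss. The sketch below illustrates the idea with hypothetical token ids and a hand-written mask; it is not the exact code path used for this run. TRL exposes an `assistant_only_loss` option on `SFTConfig` for this purpose, though this card does not state which settings were used.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss by default


def mask_non_assistant(input_ids, assistant_mask):
    """Return labels that keep assistant tokens and ignore all other positions."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [
        token if keep else IGNORE_INDEX
        for token, keep in zip(input_ids, assistant_mask)
    ]


# Hypothetical token ids: four prompt tokens followed by three assistant tokens.
input_ids = [11, 12, 13, 14, 21, 22, 23]
assistant_mask = [0, 0, 0, 0, 1, 1, 1]

labels = mask_non_assistant(input_ids, assistant_mask)
print(labels)  # [-100, -100, -100, -100, 21, 22, 23]
```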
## Evaluation
The model was loaded in 4-bit precision and evaluated with [lighteval](https://github.com/huggingface/lighteval):
| Task |Version| Metric |Value | |Stderr|
|------------------------------------------------------|-------|----------------------------------------------------------------------------------------------------------------------------|-----:|---|-----:|
|all | |acc |0.5383|± |0.1476|
| | |acc:logprob_normalization=LogProbCharNorm(name='norm', ignore_first_space=True) |0.7000|± |0.1528|
| | |acc:logprob_normalization=LogProbCharNorm(name='norm', ignore_first_space=False) |0.8000|± |0.1333|
| | |truthfulqa_mc1 |0.6000|± |0.1633|
| | |truthfulqa_mc2 |0.7066|± |0.1481|
| | |em:normalize_gold=gsm8k_normalizer&normalize_pred=gsm8k_normalizer|0.6000|± |0.1633|
|leaderboard:arc:challenge:25 | |acc |0.8000|± |0.1333|
| | |acc:logprob_normalization=LogProbCharNorm(name='norm', ignore_first_space=True) |0.7000|± |0.1528|
|leaderboard:gsm8k:5 | |em:normalize_gold=gsm8k_normalizer&normalize_pred=gsm8k_normalizer|0.6000|± |0.1633|
|leaderboard:hellaswag:10 | |acc |0.5000|± |0.1667|
| | |acc:logprob_normalization=LogProbCharNorm(name='norm', ignore_first_space=False) |0.8000|± |0.1333|
|leaderboard:mmlu:_average:5 | |acc |0.5316|± |0.1474|
|leaderboard:mmlu:abstract_algebra:5 | |acc |0.3000|± |0.1528|
|leaderboard:mmlu:anatomy:5 | |acc |0.3000|± |0.1528|
|leaderboard:mmlu:astronomy:5 | |acc |0.7000|± |0.1528|
|leaderboard:mmlu:business_ethics:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:clinical_knowledge:5 | |acc |0.7000|± |0.1528|
|leaderboard:mmlu:college_biology:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:college_chemistry:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:college_computer_science:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:college_mathematics:5 | |acc |0.2000|± |0.1333|
|leaderboard:mmlu:college_medicine:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:college_physics:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:computer_security:5 | |acc |0.9000|± |0.1000|
|leaderboard:mmlu:conceptual_physics:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:econometrics:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:electrical_engineering:5 | |acc |0.7000|± |0.1528|
|leaderboard:mmlu:elementary_mathematics:5 | |acc |0.3000|± |0.1528|
|leaderboard:mmlu:formal_logic:5 | |acc |0.3000|± |0.1528|
|leaderboard:mmlu:global_facts:5 | |acc |0.3000|± |0.1528|
|leaderboard:mmlu:high_school_biology:5 | |acc |0.9000|± |0.1000|
|leaderboard:mmlu:high_school_chemistry:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:high_school_computer_science:5 | |acc |0.6000|± |0.1633|
|leaderboard:mmlu:high_school_european_history:5 | |acc |0.7000|± |0.1528|
|leaderboard:mmlu:high_school_geography:5 | |acc |1.0000|± |0.0000|
|leaderboard:mmlu:high_school_government_and_politics:5| |acc |0.8000|± |0.1333|
|leaderboard:mmlu:high_school_macroeconomics:5 | |acc |0.6000|± |0.1633|
|leaderboard:mmlu:high_school_mathematics:5 | |acc |0.3000|± |0.1528|
|leaderboard:mmlu:high_school_microeconomics:5 | |acc |0.7000|± |0.1528|
|leaderboard:mmlu:high_school_physics:5 | |acc |0.3000|± |0.1528|
|leaderboard:mmlu:high_school_psychology:5 | |acc |0.9000|± |0.1000|
|leaderboard:mmlu:high_school_statistics:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:high_school_us_history:5 | |acc |0.8000|± |0.1333|
|leaderboard:mmlu:high_school_world_history:5 | |acc |0.9000|± |0.1000|
|leaderboard:mmlu:human_aging:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:human_sexuality:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:international_law:5 | |acc |0.6000|± |0.1633|
|leaderboard:mmlu:jurisprudence:5 | |acc |0.6000|± |0.1633|
|leaderboard:mmlu:logical_fallacies:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:machine_learning:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:management:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:marketing:5 | |acc |0.8000|± |0.1333|
|leaderboard:mmlu:medical_genetics:5 | |acc |0.9000|± |0.1000|
|leaderboard:mmlu:miscellaneous:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:moral_disputes:5 | |acc |0.7000|± |0.1528|
|leaderboard:mmlu:moral_scenarios:5 | |acc |0.1000|± |0.1000|
|leaderboard:mmlu:nutrition:5 | |acc |0.6000|± |0.1633|
|leaderboard:mmlu:philosophy:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:prehistory:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:professional_accounting:5 | |acc |0.3000|± |0.1528|
|leaderboard:mmlu:professional_law:5 | |acc |0.4000|± |0.1633|
|leaderboard:mmlu:professional_medicine:5 | |acc |0.2000|± |0.1333|
|leaderboard:mmlu:professional_psychology:5 | |acc |0.3000|± |0.1528|
|leaderboard:mmlu:public_relations:5 | |acc |0.3000|± |0.1528|
|leaderboard:mmlu:security_studies:5 | |acc |0.3000|± |0.1528|
|leaderboard:mmlu:sociology:5 | |acc |0.8000|± |0.1333|
|leaderboard:mmlu:us_foreign_policy:5 | |acc |0.7000|± |0.1528|
|leaderboard:mmlu:virology:5 | |acc |0.5000|± |0.1667|
|leaderboard:mmlu:world_religions:5 | |acc |0.8000|± |0.1333|
|leaderboard:truthfulqa:mc:0 | |truthfulqa_mc1 |0.6000|± |0.1633|
| | |truthfulqa_mc2 |0.7066|± |0.1481|
|leaderboard:winogrande:5 | |acc |0.7000|± |0.1528|
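
When reading the table, note that the per-task accuracies are multiples of 0.1 and the reported standard errors are large. Both are consistent with each task being scored on about 10 examples, although the card does not state the sample count. The check below is a sketch under that assumption (`N = 10` is inferred, not documented): it computes the standard error of the mean of binary scores using the sample standard deviation and reproduces the `Stderr` values in the table.

```python
import math


def sample_stderr(acc, n):
    """Standard error of the mean of n binary (0/1) scores with mean `acc`.

    Uses the sample variance (n - 1 in the denominator), which for binary
    scores equals acc * (1 - acc) * n / (n - 1).
    """
    sample_variance = acc * (1 - acc) * n / (n - 1)
    return math.sqrt(sample_variance / n)


N = 10  # assumption: inferred from the table, not stated in the card

for acc in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    print(f"acc={acc:.4f}  stderr={sample_stderr(acc, N):.4f}")
```

With samples this small, differences of one or two tenths between tasks fall well within the reported uncertainty.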
### Framework versions
- PEFT: 0.17.1
- TRL: 0.24.0
- Transformers: 4.57.1
- PyTorch: 2.8.0+cu126
- Datasets: 4.2.0
- Tokenizers: 0.22.1
## Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
```