	---
	base_model: ethicalabs/xLSTM-7b-Instruct
	library_name: transformers
	model_name: xlstm-7b-instruct-phase-2
	tags:
	- sft
	- transformers
	- trl
	licence: license
	pipeline_tag: text-generation
	---

	# Model Card for xlstm-7b-instruct-phase-2

	This model is a fine-tuned version of [ethicalabs/xLSTM-7b-Instruct](https://huggingface.co/ethicalabs/xLSTM-7b-Instruct) for task alignment.

	It was trained with [TRL](https://github.com/huggingface/trl) using supervised fine-tuning (SFT) on assistant-only tokens.
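Training on assistant-only tokens means the loss is computed over the assistant's replies only, and prompt and user tokens are excluded. The following is a minimal sketch of that idea, not the training script itself. It assumes a per-token role mask is already available; the helper name `mask_non_assistant` and the toy token IDs are illustrative. The value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_non_assistant(input_ids, assistant_mask):
    """Return labels in which every non-assistant position is ignored by the loss."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [
        token if is_assistant else IGNORE_INDEX
        for token, is_assistant in zip(input_ids, assistant_mask)
    ]


# Toy example (illustrative IDs): three prompt tokens, then two assistant tokens.
input_ids = [101, 7592, 102, 2023, 2003]
assistant_mask = [0, 0, 0, 1, 1]
labels = mask_non_assistant(input_ids, assistant_mask)
print(labels)
```

Only the last two positions keep their token IDs as labels, so only they would contribute to the loss.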

	The `k_proj` and `v_proj` matrices have been frozen to isolate and preserve the model's pre-trained knowledge base.

	Fine-tuning updated only the `q_proj` (query) and FFN matrices, adapting the model's reasoning and query-retrieval mechanisms without overwriting the knowledge held in the frozen matrices.

	This experiment was designed to test the hypothesis that the model's reasoning capabilities (`q_proj`) could be specialized for math/code while its knowledge (`k_proj`, `v_proj`) remained intact.
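Selective freezing of this kind can be sketched by disabling gradients on parameters whose names contain `k_proj` or `v_proj`. This is an illustration under assumptions, not the code used for this run: the helper `freeze_by_name`, the stand-in `FakeParam` class, and the parameter names are all hypothetical, and the card does not say whether other parameters (embeddings, norms) were frozen as well.

```python
from dataclasses import dataclass

FROZEN_KEYS = ("k_proj", "v_proj")


def freeze_by_name(named_parameters, frozen_keys=FROZEN_KEYS):
    """Disable gradients for parameters whose name contains a frozen key.

    Returns the names of the parameters that remain trainable.
    """
    trainable = []
    for name, param in named_parameters:
        if any(key in name for key in frozen_keys):
            param.requires_grad = False
        else:
            trainable.append(name)
    return trainable


@dataclass
class FakeParam:
    """Stand-in for torch.nn.Parameter so the sketch runs without PyTorch."""
    requires_grad: bool = True


# Hypothetical parameter names; the real xLSTM module names may differ.
params = {
    "blocks.0.q_proj.weight": FakeParam(),
    "blocks.0.k_proj.weight": FakeParam(),
    "blocks.0.v_proj.weight": FakeParam(),
    "blocks.0.ffn.proj_up.weight": FakeParam(),
}
trainable = freeze_by_name(params.items())
print(trainable)
```

With a real PyTorch model, the same function should accept `model.named_parameters()`, since `requires_grad` is a settable attribute on each parameter.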

	## Quick start

	Work in Progress!

	## Training procedure

	[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/ethicalabs-ai/xlstm-finetuning-ultrafeedback/runs/zxpd9xeh)


	This model was trained with SFT.

	## Evaluation

	This model was loaded in 4-bit precision and evaluated with [lighteval](https://github.com/huggingface/lighteval).

	| Task | Version | Metric | Value | | Stderr |
	|---|---|---|---:|---|---:|
	| all | | acc | 0.5383 | ± | 0.1476 |
	| | | acc:logprob_normalization=LogProbCharNorm(name='norm', ignore_first_space=True) | 0.7000 | ± | 0.1528 |
	| | | acc:logprob_normalization=LogProbCharNorm(name='norm', ignore_first_space=False) | 0.8000 | ± | 0.1333 |
	| | | truthfulqa_mc1 | 0.6000 | ± | 0.1633 |
	| | | truthfulqa_mc2 | 0.7066 | ± | 0.1481 |
	| | | em:normalize_gold=gsm8k_normalizer&normalize_pred=gsm8k_normalizer | 0.6000 | ± | 0.1633 |
	| leaderboard:arc:challenge:25 | | acc | 0.8000 | ± | 0.1333 |
	| | | acc:logprob_normalization=LogProbCharNorm(name='norm', ignore_first_space=True) | 0.7000 | ± | 0.1528 |
	| leaderboard:gsm8k:5 | | em:normalize_gold=gsm8k_normalizer&normalize_pred=gsm8k_normalizer | 0.6000 | ± | 0.1633 |
	| leaderboard:hellaswag:10 | | acc | 0.5000 | ± | 0.1667 |
	| | | acc:logprob_normalization=LogProbCharNorm(name='norm', ignore_first_space=False) | 0.8000 | ± | 0.1333 |
	| leaderboard:mmlu:_average:5 | | acc | 0.5316 | ± | 0.1474 |
	| leaderboard:mmlu:abstract_algebra:5 | | acc | 0.3000 | ± | 0.1528 |
	| leaderboard:mmlu:anatomy:5 | | acc | 0.3000 | ± | 0.1528 |
	| leaderboard:mmlu:astronomy:5 | | acc | 0.7000 | ± | 0.1528 |
	| leaderboard:mmlu:business_ethics:5 | | acc | 0.4000 | ± | 0.1633 |
	| leaderboard:mmlu:clinical_knowledge:5 | | acc | 0.7000 | ± | 0.1528 |
	| leaderboard:mmlu:college_biology:5 | | acc | 0.5000 | ± | 0.1667 |
	| leaderboard:mmlu:college_chemistry:5 | | acc | 0.4000 | ± | 0.1633 |
	| leaderboard:mmlu:college_computer_science:5 | | acc | 0.4000 | ± | 0.1633 |
	| leaderboard:mmlu:college_mathematics:5 | | acc | 0.2000 | ± | 0.1333 |
	| leaderboard:mmlu:college_medicine:5 | | acc | 0.5000 | ± | 0.1667 |
	| leaderboard:mmlu:college_physics:5 | | acc | 0.5000 | ± | 0.1667 |
	| leaderboard:mmlu:computer_security:5 | | acc | 0.9000 | ± | 0.1000 |
	| leaderboard:mmlu:conceptual_physics:5 | | acc | 0.4000 | ± | 0.1633 |
	| leaderboard:mmlu:econometrics:5 | | acc | 0.4000 | ± | 0.1633 |
	| leaderboard:mmlu:electrical_engineering:5 | | acc | 0.7000 | ± | 0.1528 |
	| leaderboard:mmlu:elementary_mathematics:5 | | acc | 0.3000 | ± | 0.1528 |
	| leaderboard:mmlu:formal_logic:5 | | acc | 0.3000 | ± | 0.1528 |
	| leaderboard:mmlu:global_facts:5 | | acc | 0.3000 | ± | 0.1528 |
	| leaderboard:mmlu:high_school_biology:5 | | acc | 0.9000 | ± | 0.1000 |
	| leaderboard:mmlu:high_school_chemistry:5 | | acc | 0.5000 | ± | 0.1667 |
	| leaderboard:mmlu:high_school_computer_science:5 | | acc | 0.6000 | ± | 0.1633 |
	| leaderboard:mmlu:high_school_european_history:5 | | acc | 0.7000 | ± | 0.1528 |
	| leaderboard:mmlu:high_school_geography:5 | | acc | 1.0000 | ± | 0.0000 |
	| leaderboard:mmlu:high_school_government_and_politics:5 | | acc | 0.8000 | ± | 0.1333 |
	| leaderboard:mmlu:high_school_macroeconomics:5 | | acc | 0.6000 | ± | 0.1633 |
	| leaderboard:mmlu:high_school_mathematics:5 | | acc | 0.3000 | ± | 0.1528 |
	| leaderboard:mmlu:high_school_microeconomics:5 | | acc | 0.7000 | ± | 0.1528 |
	| leaderboard:mmlu:high_school_physics:5 | | acc | 0.3000 | ± | 0.1528 |
	| leaderboard:mmlu:high_school_psychology:5 | | acc | 0.9000 | ± | 0.1000 |
	| leaderboard:mmlu:high_school_statistics:5 | | acc | 0.5000 | ± | 0.1667 |
	| leaderboard:mmlu:high_school_us_history:5 | | acc | 0.8000 | ± | 0.1333 |
	| leaderboard:mmlu:high_school_world_history:5 | | acc | 0.9000 | ± | 0.1000 |
	| leaderboard:mmlu:human_aging:5 | | acc | 0.5000 | ± | 0.1667 |
	| leaderboard:mmlu:human_sexuality:5 | | acc | 0.4000 | ± | 0.1633 |
	| leaderboard:mmlu:international_law:5 | | acc | 0.6000 | ± | 0.1633 |
	| leaderboard:mmlu:jurisprudence:5 | | acc | 0.6000 | ± | 0.1633 |
	| leaderboard:mmlu:logical_fallacies:5 | | acc | 0.4000 | ± | 0.1633 |
	| leaderboard:mmlu:machine_learning:5 | | acc | 0.5000 | ± | 0.1667 |
	| leaderboard:mmlu:management:5 | | acc | 0.5000 | ± | 0.1667 |
	| leaderboard:mmlu:marketing:5 | | acc | 0.8000 | ± | 0.1333 |
	| leaderboard:mmlu:medical_genetics:5 | | acc | 0.9000 | ± | 0.1000 |
	| leaderboard:mmlu:miscellaneous:5 | | acc | 0.5000 | ± | 0.1667 |
	| leaderboard:mmlu:moral_disputes:5 | | acc | 0.7000 | ± | 0.1528 |
	| leaderboard:mmlu:moral_scenarios:5 | | acc | 0.1000 | ± | 0.1000 |
	| leaderboard:mmlu:nutrition:5 | | acc | 0.6000 | ± | 0.1633 |
	| leaderboard:mmlu:philosophy:5 | | acc | 0.5000 | ± | 0.1667 |
	| leaderboard:mmlu:prehistory:5 | | acc | 0.4000 | ± | 0.1633 |
	| leaderboard:mmlu:professional_accounting:5 | | acc | 0.3000 | ± | 0.1528 |
	| leaderboard:mmlu:professional_law:5 | | acc | 0.4000 | ± | 0.1633 |
	| leaderboard:mmlu:professional_medicine:5 | | acc | 0.2000 | ± | 0.1333 |
	| leaderboard:mmlu:professional_psychology:5 | | acc | 0.3000 | ± | 0.1528 |
	| leaderboard:mmlu:public_relations:5 | | acc | 0.3000 | ± | 0.1528 |
	| leaderboard:mmlu:security_studies:5 | | acc | 0.3000 | ± | 0.1528 |
	| leaderboard:mmlu:sociology:5 | | acc | 0.8000 | ± | 0.1333 |
	| leaderboard:mmlu:us_foreign_policy:5 | | acc | 0.7000 | ± | 0.1528 |
	| leaderboard:mmlu:virology:5 | | acc | 0.5000 | ± | 0.1667 |
	| leaderboard:mmlu:world_religions:5 | | acc | 0.8000 | ± | 0.1333 |
	| leaderboard:truthfulqa:mc:0 | | truthfulqa_mc1 | 0.6000 | ± | 0.1633 |
	| | | truthfulqa_mc2 | 0.7066 | ± | 0.1481 |
	| leaderboard:winogrande:5 | | acc | 0.7000 | ± | 0.1528 |
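The standard errors in the table above are large relative to the scores. They are consistent with each task having been scored on 10 examples, but the card does not state the sample count, so treat n = 10 as an inference. The short check below assumes the standard error is the sample standard deviation of per-example 0/1 scores (with the n - 1 correction) divided by the square root of n, which simplifies to sqrt(p(1 - p) / (n - 1)) for accuracy p. The function name `binary_stderr` is illustrative.

```python
import math


def binary_stderr(accuracy, n):
    """Standard error of the mean of n binary scores, using the sample (n - 1) variance."""
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# (accuracy, reported stderr) pairs copied from the evaluation table.
reported = [
    (0.8, 0.1333),
    (0.7, 0.1528),
    (0.6, 0.1633),
    (0.5, 0.1667),
    (0.9, 0.1000),
    (1.0, 0.0000),
]

for acc, stderr in reported:
    computed = binary_stderr(acc, n=10)
    print(f"acc={acc:.1f} reported={stderr:.4f} computed={computed:.4f}")
```

If that reading is right, a 95% interval spans roughly ±0.3 around each per-task score, so differences between individual MMLU subjects should not be over-interpreted.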



	### Framework versions

	- PEFT: 0.17.1
	- TRL: 0.24.0
	- Transformers: 4.57.1
	- PyTorch: 2.8.0+cu126
	- Datasets: 4.2.0
	- Tokenizers: 0.22.1

	## Citations

	Cite TRL as:

	```bibtex
	@misc{vonwerra2022trl,
	title = {{TRL: Transformer Reinforcement Learning}},
	author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
	year = 2020,
	journal = {GitHub repository},
	publisher = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
	}
	```
