|
|
--- |
|
|
tags: |
|
|
- sketchtune |
|
|
- sketch to adapt |
|
|
library_name: transformers |
|
|
--- |
|
|
|
|
|
# Fine-Tuned Model Checkpoints for *(ICML 2025) Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation* |
|
|
|
|
|
This repository contains the fine-tuned model checkpoints used in our ICML 2025 paper: **Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation**. |
|
|
|
|
|
The table below lists the available models along with their fine-tuning datasets, bit widths, groups per row, and training epochs. |
|
|
|
|
|
| Model | Dataset | Bits | Groups Per Row (GPR) | Epochs | |
|
|
| ---------- | ----------- | --------- | -------------------- | ------ | |
|
|
| Llama-3-8B | Commonsense | INT4 | 4 | 1,2 | |
|
|
| Llama-3-8B | Math | INT4 | 1,2,4,8 | 1,2,3,4 | |
|
|
| Llama-2-7B | Commonsense | INT4 | 4 | 1,2 | |
|
|
| Llama-2-7B | Math | INT4 | 1,2,4,8 | 1,2,3,4 | |
|
|
| Llama-7B | Commonsense | INT4 | 4 | 1,2 | |
|
|
| Llama-7B | Math | INT4 | 1,2,4,8 | 1,2,3,4 | |
|
|
| Llama-13B | Commonsense | INT4 | 4 | 1,2 | |
|
|
| Llama-13B | Math | INT4 | 1,2,4,8 | 1,2,3,4 | |
|
|
|
|
|
For full details on how to reproduce the experiments, please refer to our GitHub repository: |
|
|
|
|
|
🔗 [https://github.com/LeanModels/SketchTune](https://github.com/LeanModels/SketchTune)
|
|
|
|
|
### What is SketchTune? |
|
|
|
|
|
SketchTune is a novel method for adapting large language models (LLMs) that reduces memory usage and improves speed during fine-tuning. Instead of adding low-rank adapters as LoRA or DoRA do, it compresses the model's weights into compact, trainable "sketches" for downstream adaptation.
|
|
|
|
|
**Key benefits:** |
|
|
|
|
|
* **Combines compression and adaptation** - SketchTune trains directly on compressed representations, removing the need for separate adapters. This saves memory while improving model performance and speed.
|
|
* **Avoids low-rank limits** - Low-rank adapters assume weight updates follow a low-rank structure. SketchTune drops this assumption, using sketching to better capture complex changes in model weights.
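As a toy illustration of the general idea (our own NumPy sketch, not the paper's actual algorithm or this repository's code), one can picture weight sketching as hashing each weight of a matrix into a small set of shared, trainable bucket values; fine-tuning then updates only the buckets instead of the full weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))  # a toy weight matrix (4096 parameters)

# "Sketch" the matrix: hash every entry to one of K shared buckets.
K = 16
idx = np.abs(W * 1e6).astype(np.int64) % K  # toy hash of each entry

# Trainable bucket values, initialized to the mean of the weights they cover.
buckets = np.array(
    [W[idx == k].mean() if (idx == k).any() else 0.0 for k in range(K)]
)

# Reconstruct the weights from the sketch: 4096 weights are now
# represented by only K = 16 trainable values.
W_hat = buckets[idx]
mse = np.mean((W - W_hat) ** 2)  # reconstruction error of the sketch
```

In this picture, adaptation means taking gradient steps on `buckets` (16 values here) rather than on all 4096 entries of `W`, which is what makes the representation both compressed and fine-tunable. The real method's hashing scheme, bit widths, and group structure are described in the paper.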
|
|
|
|
|
**Performance highlights:** |
|
|
|
|
|
* Even with base models that are **2.6–3.5× smaller**, SketchTune **outperforms LoRA, DoRA, and S2FT** on commonsense and math reasoning benchmarks.
|
|
* On the GSM8K math dataset, SketchTune achieves **14.48% higher accuracy than LoftQ** while training **7.3× fewer parameters**.
|
|
|
|
|
For a deep dive into how sketching works, including math details and extensive test results, check out our full paper: [https://arxiv.org/abs/2410.06364](https://arxiv.org/abs/2410.06364). |
|
|
|
|
|
### Citation |
|
|
|
|
|
If you find this work helpful, please consider citing our paper: |
|
|
```bibtex |
|
|
@inproceedings{zhang2025sketch,
|
|
title={Sketch to Adapt: Fine-Tunable Sketches for Efficient {LLM} Adaptation}, |
|
|
author={Tianyi Zhang and Junda Su and Aditya Desai and Oscar Wu and Zhaozhuo Xu and Anshumali Shrivastava}, |
|
|
booktitle={Forty-second International Conference on Machine Learning}, |
|
|
year={2025}, |
|
|
url={https://openreview.net/forum?id=zZXOXhxO6I} |
|
|
} |
|
|
``` |