|
|
--- |
|
|
tags: |
|
|
- sketchtune |
|
|
- sketch to adapt |
|
|
library_name: transformers |
|
|
--- |
|
|
|
|
|
# Fine-Tuned Model Checkpoints for *(ICML 2025) Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation* |
|
|
|
|
|
This repository contains the fine-tuned model checkpoints used in our ICML 2025 paper: **Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation**. |
|
|
|
|
|
The table below lists the available models along with their fine-tuning datasets, bit widths, groups per row, and training epochs. |
|
|
|
|
|
| Model | Dataset | Bits | Groups Per Row (GPR) | Epochs | |
|
|
| ---------- | ----------- | --------- | -------------------- | ------ | |
|
|
| Llama-3-8B | Commonsense | INT4 | 4 | 1,2 | |
|
|
| Llama-3-8B | Math | INT4 | 1,2,4,8 | 1,2,3,4 | |
|
|
| Llama-2-7B | Commonsense | INT4 | 4 | 1,2 | |
|
|
| Llama-2-7B | Math | INT4 | 1,2,4,8 | 1,2,3,4 | |
|
|
| Llama-7B | Commonsense | INT4 | 4 | 1,2 | |
|
|
| Llama-7B | Math | INT4 | 1,2,4,8 | 1,2,3,4 | |
|
|
| Llama-13B | Commonsense | INT4 | 4 | 1,2 | |
|
|
| Llama-13B | Math | INT4 | 1,2,4,8 | 1,2,3,4 | |
|
|
|
|
|
For full details on how to reproduce the experiments, please refer to our GitHub repository: |
|
|
|
|
|
🔗 [https://github.com/LeanModels/SketchTune](https://github.com/LeanModels/SketchTune)
|
|
|
|
|
### What is SketchTune? |
|
|
|
|
|
SketchTune is a novel method for adapting large language models (LLMs) that reduces memory usage and improves speed during fine-tuning. Instead of adding low-rank adapters as LoRA or DoRA do, it compresses the model's weights into compact, trainable "sketches" for downstream adaptation.
|
|
|
|
|
**Key benefits:** |
|
|
|
|
|
* **Combines compression and adaptation** - SketchTune trains directly on compressed representations, removing the need for separate adapters. This saves memory while improving model performance and speed.
|
|
* **Avoids low-rank limits** - Low-rank adapters assume weight updates follow a low-rank structure. SketchTune drops this assumption, using sketching to better capture complex changes in model weights.
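As a toy illustration of the general idea (our own NumPy sketch, not the paper's actual algorithm or this repository's code), one can picture weight sketching as hashing each weight of a matrix into a small set of shared, trainable bucket values; fine-tuning then updates only the buckets instead of the full weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))  # a toy weight matrix (4096 parameters)

# "Sketch" the matrix: hash every entry to one of K shared buckets.
K = 16
idx = np.abs(W * 1e6).astype(np.int64) % K  # toy hash of each entry

# Trainable bucket values, initialized to the mean of the weights they cover.
buckets = np.array(
    [W[idx == k].mean() if (idx == k).any() else 0.0 for k in range(K)]
)

# Reconstruct the weights from the sketch: 4096 weights are now
# represented by only K = 16 trainable values.
W_hat = buckets[idx]
mse = np.mean((W - W_hat) ** 2)  # reconstruction error of the sketch
```

In this picture, adaptation means taking gradient steps on `buckets` (16 values here) rather than on all 4096 entries of `W`, which is what makes the representation both compressed and fine-tunable. The real method's hashing scheme, bit widths, and group structure are described in the paper.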
|
|
|
|
|
**Performance highlights:** |
|
|
|
|
|
* Even with base models that are **2.6–3.5× smaller**, SketchTune **outperforms LoRA, DoRA, and S2FT** on commonsense and math reasoning benchmarks.
|
|
* On the GSM8K math dataset, SketchTune achieves **14.48% higher accuracy than LoftQ** while training **7.3× fewer parameters**.
|
|
|
|
|
For a deep dive into how sketching works, including math details and extensive test results, check out our full paper: [https://arxiv.org/abs/2410.06364](https://arxiv.org/abs/2410.06364). |
|
|
|
|
|
### Citation |
|
|
|
|
|
If you find this work helpful, please consider citing our paper: |
|
|
```bibtex |
|
|
@inproceedings{zhang2025sketch,
|
|
title={Sketch to Adapt: Fine-Tunable Sketches for Efficient {LLM} Adaptation}, |
|
|
author={Tianyi Zhang and Junda Su and Aditya Desai and Oscar Wu and Zhaozhuo Xu and Anshumali Shrivastava}, |
|
|
booktitle={Forty-second International Conference on Machine Learning}, |
|
|
year={2025}, |
|
|
url={https://openreview.net/forum?id=zZXOXhxO6I} |
|
|
} |
|
|
``` |