Upload folder using huggingface_hub
- README.md +57 -0
- commonsense/llama-13b_gpr4/epoch_0/sketched_params.pkl +3 -0
- commonsense/llama-13b_gpr4/epoch_1/sketched_params.pkl +3 -0
- commonsense/llama-2-7b_gpr4/epoch_0/sketched_params.pkl +3 -0
- commonsense/llama-2-7b_gpr4/epoch_1/sketched_params.pkl +3 -0
- commonsense/llama-3-8b_gpr4/epoch_0/sketched_params.pkl +3 -0
- commonsense/llama-3-8b_gpr4/epoch_1/sketched_params.pkl +3 -0
- commonsense/llama-7b_gpr4/epoch_0/sketched_params.pkl +3 -0
- commonsense/llama-7b_gpr4/epoch_1/sketched_params.pkl +3 -0
- config.json +3 -0
README.md
ADDED
@@ -0,0 +1,57 @@
---
tags:
- sketchtune
- sketch to adapt
library_name: transformers
---

# Fine-Tuned Model Checkpoints for *(ICML 2025) Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation*

This repository contains the fine-tuned model checkpoints used in our ICML 2025 paper: **Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation**.

The table below lists the available models along with their fine-tuning datasets, bit widths, groups per row, and training epochs.

| Model | Dataset | Bits | Groups Per Row (GPR) | Epochs |
| ---------- | ----------- | --------- | -------------------- | ------- |
| Llama-3-8B | Commonsense | INT4 | 4 | 1,2 |
| Llama-3-8B | Math | INT4 | 1,2,4,8 | 1,2,3,4 |
| Llama-2-7B | Commonsense | INT4 | 4 | 1,2 |
| Llama-2-7B | Math | INT4 | 1,2,4,8 | 1,2,3,4 |
| Llama-7B | Commonsense | INT4 | 4 | 1,2 |
| Llama-7B | Math | INT4 | 1,2,4,8 | 1,2,3,4 |
| Llama-13B | Commonsense | INT4 | 4 | 1,2 |
| Llama-13B | Math | INT4 | 1,2,4,8 | 1,2,3,4 |

For full details on how to reproduce the experiments, please refer to our GitHub repository:

👉 [https://github.com/LeanModels/SketchTune](https://github.com/LeanModels/SketchTune) 👈
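
Each checkpoint is stored as a `sketched_params.pkl` file under `<dataset>/<model>_gpr<gpr>/epoch_<n>/`. As a minimal sketch of how one might fetch and inspect a checkpoint with `huggingface_hub` (the repo id below is a placeholder for this repository, and the pickle's internal layout is defined by the SketchTune codebase rather than documented here):

```python
import pickle

from huggingface_hub import hf_hub_download

# Placeholder: substitute the id of this repository.
repo_id = "<this-repo-id>"

# Paths follow the layout <dataset>/<model>_gpr<gpr>/epoch_<n>/sketched_params.pkl
local_path = hf_hub_download(
    repo_id=repo_id,
    filename="commonsense/llama-3-8b_gpr4/epoch_1/sketched_params.pkl",
)

with open(local_path, "rb") as f:
    sketched_params = pickle.load(f)

# Inspect the loaded object; see the GitHub repository above for the code that
# actually consumes these files during evaluation.
print(type(sketched_params))
```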

### What is SketchTune?

SketchTune is a novel method for adapting large language models (LLMs) that reduces memory usage and improves speed during fine-tuning. Instead of adding low-rank adapters such as LoRA or DoRA, it compresses the model's weights into compact, trainable "sketches" that are updated directly for downstream adaptation.
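
For intuition only, the snippet below is a hypothetical illustration of training on a compressed weight representation: frozen INT4 codes plus small trainable per-group lookup tables (here with 4 groups per row, matching the GPR=4 checkpoints above). It is not the exact SketchTune parameterization; see the paper and GitHub repository for the real method.

```python
import torch

# Hypothetical shapes for illustration only.
out_features, in_features, gpr = 4, 16, 4        # 4 groups per row, as in the table above
group_size = in_features // gpr

codes = torch.randint(0, 16, (out_features, in_features))         # frozen INT4 codes
tables = torch.randn(out_features, gpr, 16, requires_grad=True)   # small trainable tables

# Reconstruct a dense weight by looking up each code in its group's table;
# only `tables` would receive gradients during fine-tuning.
grouped_codes = codes.view(out_features, gpr, group_size)
weight = torch.gather(tables, dim=2, index=grouped_codes).view(out_features, in_features)
```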

**Key benefits:**

* **Combines compression and adaptation** - SketchTune trains directly on compressed representations, removing the need for separate adapters. This saves memory while improving model performance and speed.
* **Avoids low-rank limits** - Low-rank adapters assume weight updates follow a low-rank structure. SketchTune does not make this assumption, using sketching to better capture complex changes in model weights.

**Performance highlights:**

* Even with base models that are **2.6–3.5× smaller**, SketchTune **outperforms LoRA, DoRA, and S2FT** on commonsense and math reasoning benchmarks.
* On the GSM8K math dataset, SketchTune achieves **14.48% higher accuracy than LoftQ** while training **7.3× fewer parameters**.

For a deep dive into how sketching works, including mathematical details and extensive experimental results, check out our full paper: [https://arxiv.org/abs/2410.06364](https://arxiv.org/abs/2410.06364).

### Citation

If you find this work helpful, please consider citing our paper:

```bibtex
@inproceedings{zhang2025sketch,
  title={Sketch to Adapt: Fine-Tunable Sketches for Efficient {LLM} Adaptation},
  author={Tianyi Zhang and Junda Su and Aditya Desai and Oscar Wu and Zhaozhuo Xu and Anshumali Shrivastava},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=zZXOXhxO6I}
}
```
commonsense/llama-13b_gpr4/epoch_0/sketched_params.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bd2914b27aa416d37a9566176599676d60ed801929d533d6b2cc9653cd1bd327
size 272723337
commonsense/llama-13b_gpr4/epoch_1/sketched_params.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:49922fd7928ceeca7d412fceb058b32feec6795249349fdd982ea08cd2e63ab7
size 272723351
commonsense/llama-2-7b_gpr4/epoch_0/sketched_params.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:640451787ea6fc22e0104da712cda1e0e23fd901bf5f5573827e01dcdf0059dc
size 174138797
commonsense/llama-2-7b_gpr4/epoch_1/sketched_params.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b1f5b34baf35bdfc67cf615f0ab9ca72c16fc13a3730905e2f8d5998cc142f7f
size 174138797
commonsense/llama-3-8b_gpr4/epoch_0/sketched_params.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:25f63c1c7c0cb97b74400d0ef60783e412f258fca2d96f7046725cfccd3df18f
size 176235895
commonsense/llama-3-8b_gpr4/epoch_1/sketched_params.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9a65d96b0c5181d1534e335c22e63a6c187436e16103b87ff85e995737a772bc
size 176235909
commonsense/llama-7b_gpr4/epoch_0/sketched_params.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e519c145a4e0c436581368e0a44c02a1796bd5954bf38b60810771f2ec449df8
size 174138695
commonsense/llama-7b_gpr4/epoch_1/sketched_params.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c5d93ba18d23d4a215449bb053aad69ceb9a3a4daddeeb0ad9fa34a1a399f483
size 174138661
config.json
ADDED
@@ -0,0 +1,3 @@
{
  "model_type": "llama"
}