Felix Fischer
FlipTip
3 followers · 7 following
AI & ML interests
None yet
Recent Activity
upvoted an article 24 days ago: Qwen3.5: Nobody Agrees on Attention Anymore
liked a model 4 months ago: maya-research/maya1
reacted to codelion's post with 🔥 6 months ago
I wanted to share a technique that's been working really well for recovering performance after INT4 quantization. Quantizing an LLM to INT4 (unlike, say, INT8) for inference typically incurs some accuracy loss. Instead of accepting the quality loss, we used the FP16 model as a teacher to train a tiny LoRA adapter (rank=16) for the quantized model.

The cool part: the model generates its own training data using the Magpie technique, so no external datasets are needed. This is critical because we want to stay as close as possible to the distribution of the model's natural responses.

Last year, Apple's foundation models paper (https://arxiv.org/pdf/2407.21075) proposed a similar technique and found: "By using accuracy-recovery LoRA adapters with only rank 16, Alpaca win rate can be improved by 7-18%, GMS8K accuracy is boosted by 5-10%." (page 47).

We saw similar results on Qwen3-0.6B:
- Perplexity: 2.40 → 2.09 (only 5.7% degradation from the FP16 baseline)
- Memory: only 0.28GB vs 1.0GB for FP16 (75% reduction)
- Speed: 3.0x faster inference than FP16
- Quality: generates correct, optimized code solutions

- Pre-trained adapter: https://huggingface.co/codelion/Qwen3-0.6B-accuracy-recovery-lora
- GitHub repo: https://github.com/codelion/ellora

Happy to answer questions about the implementation or to help anyone trying to replicate this. The key insight is that quantization errors are systematic and learnable: a small adapter can bridge the gap without negating the benefits of quantization.

Has anyone else experimented with self-distillation for quantization recovery? Would love to hear about different approaches!
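The idea above can be illustrated in miniature. This is a toy numpy sketch, not the actual ellora implementation: it stands in for the real recipe (which fine-tunes a rank-16 LoRA on a bitsandbytes-quantized model against KL divergence to the FP16 teacher's logits, on Magpie-generated prompts). Here, a single linear layer plays the model, symmetric round-to-nearest plays INT4 quantization, and the adapter is trained by plain gradient descent to match the FP16 teacher's outputs. All names and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 64, 16  # toy sizes; rank=16 mirrors the post

# "Teacher": the full-precision weight of one layer.
W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)

# Symmetric per-tensor INT4 quantization (integer grid [-8, 7]).
scale = np.abs(W).max() / 7.0
W_q = np.clip(np.round(W / scale), -8, 7) * scale

# Rank-16 LoRA adapter. B starts at zero, so training begins exactly
# at the quantized model (standard LoRA initialization).
A = rng.standard_normal((rank, d_in)) / np.sqrt(d_in)
B = np.zeros((d_out, rank))

def student(X):
    # Quantized weight plus low-rank correction.
    return X @ (W_q + B @ A).T

# Self-distillation stand-in: these inputs play the role of the model's
# own Magpie-generated data; the target is the FP16 teacher's output.
# (The real recipe distills logits with a KL loss, not MSE.)
X = rng.standard_normal((1024, d_in))
Y = X @ W.T

def mse():
    return float(np.mean((student(X) - Y) ** 2))

err_before = mse()

lr = 0.1
for _ in range(500):
    R = student(X) - Y          # residual vs. the FP16 teacher
    gW = R.T @ X / len(X)       # gradient w.r.t. the effective weight delta
    B -= lr * (gW @ A.T)        # chain rule through B @ A
    A -= lr * (B.T @ gW)
    # note: gA uses the already-updated B; fine for a toy sketch

err_after = mse()
print(f"MSE vs teacher before: {err_before:.6f}, after: {err_after:.6f}")
```

The adapter cannot remove all of the quantization error (the error matrix generally has rank higher than 16), but because the error is systematic rather than random per-input, a low-rank correction trained on the teacher's own outputs recovers a large share of it, which is the "systematic and learnable" point the post makes.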
Organizations
None yet
FlipTip's activity
New activity in ibm-granite/granite-4.0-tiny-preview · 10 months ago
Suggestion: publishing (parts of the) training data · 2 · #5 opened 10 months ago by FlipTip
New activity in huggingchat/chat-ui · 10 months ago
Problem with web search · 2 · 3 · #716 opened 11 months ago by DaniFera
New activity in huggingchat/chat-ui · almost 2 years ago
[TOOLS] Community Discussion · 🔥 3 · 27 · #455 opened almost 2 years ago by victor
Llama 3 120B? · 2 · 1 · #439 opened almost 2 years ago by Tommy84
Add MPT-7B-Chat to the models to choose from · 3 · #180 opened almost 3 years ago by FlipTip
System prompts that can be saved to a list (custom characters) · 1 · #316 opened over 2 years ago by FlipTip