---
license: apache-2.0
datasets:
- mindchain/wikitext2
- allenai/c4
language:
- en
metrics:
- perplexity
base_model:
- TinyLlama/TinyLlama-1.1B-Chat-v1.0
pipeline_tag: text-generation
library_name: transformers
tags:
- performer
- linear-attention
- favor+
- knowledge-distillation
- research
---
- **Model description**: TinyLlama 1.1B in which K of the 32 softmax attention heads are replaced by FAVOR+ linear attention, then fine-tuned via knowledge distillation
- **Intended use**: Research: evaluating the trade-offs between quality, speed, and approximation error when substituting linear attention for softmax attention
- **How to load**: code snippet showing how to reconstruct MixedPerformerAttention and load the checkpoint
- **Training details**: WikiText-103, 20k samples, SEQ_LEN=256, distillation loss, 4-phase curriculum
- **Results table**: same as the README (ppl per phase)
- **Limitations**: The Phase 4 checkpoint (32/32 heads replaced) collapsed and is not suitable for inference. Phase 2 is the recommended checkpoint.
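
To make the substitution described above concrete, here is a minimal sketch of the positive random-feature attention at the core of FAVOR+, compared against exact softmax attention. It is not this repository's `MixedPerformerAttention`. The function names (`positive_random_features`, `favor_attention`, `softmax_attention`) and the defaults are hypothetical. For brevity it assumes non-causal attention on a single head, and it draws i.i.d. Gaussian projections instead of the orthogonal ones used by full FAVOR+.

```python
import math
import random


def positive_random_features(x, omega):
    """Map a vector x to positive features exp(w.x - |x|^2 / 2) / sqrt(m)."""
    half_sq_norm = 0.5 * sum(v * v for v in x)
    m = len(omega)
    return [
        math.exp(sum(w_i * x_i for w_i, x_i in zip(w, x)) - half_sq_norm) / math.sqrt(m)
        for w in omega
    ]


def favor_attention(queries, keys, values, num_features=1024, seed=0):
    """Approximate softmax(Q K^T / sqrt(d)) V in time linear in sequence length.

    Assumptions of this sketch: non-causal, single head, i.i.d. Gaussian
    projections (no orthogonalization).
    """
    d = len(queries[0])
    d_v = len(values[0])
    rng = random.Random(seed)
    omega = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(num_features)]

    # Scaling q and k by d^(-1/4) reproduces the usual 1/sqrt(d) on q.k.
    scale = d ** -0.25
    phi_q = [positive_random_features([v * scale for v in q], omega) for q in queries]
    phi_k = [positive_random_features([v * scale for v in k], omega) for k in keys]

    # Summaries over all keys, computed once: phi(K)^T V and phi(K)^T 1.
    kv = [
        [sum(pk[j] * v[c] for pk, v in zip(phi_k, values)) for c in range(d_v)]
        for j in range(num_features)
    ]
    k_sum = [sum(pk[j] for pk in phi_k) for j in range(num_features)]

    out = []
    for pq in phi_q:
        norm = sum(a * b for a, b in zip(pq, k_sum))
        out.append(
            [sum(pq[j] * kv[j][c] for j in range(num_features)) / norm for c in range(d_v)]
        )
    return out


def softmax_attention(queries, keys, values):
    """Exact softmax attention, used as the reference."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(
            [sum(w * v[c] for w, v in zip(weights, values)) / total for c in range(len(values[0]))]
        )
    return out
```

Because every feature is positive, each output row is a convex combination of the value rows, as in softmax attention. The approximation error shrinks as `num_features` grows and widens as the query and key norms grow.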