# Hybrid Tiny LM POC on WikiText-2
A proof-of-concept causal language model that combines multiple token-mixing approaches:
- local/full causal self-attention for exact token lookup
- gated causal depthwise convolution for local pattern mixing
- input-gated diagonal recurrent mixer as a tiny SSM/RWKV-like compressed memory (see the sketch after this list)
- hybrid blocks that run attention + conv + recurrence in parallel and fuse them
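For intuition, the recurrent branch behaves like a per-channel, input-gated exponential moving average over the sequence. The snippet below is a minimal sketch under assumptions: the class name, gating, and parameterization are illustrative and not the exact code in hybrid_lm_poc.py.

```python
import torch
import torch.nn as nn


class DiagonalRecurrentMixer(nn.Module):
    """Input-gated diagonal recurrence, roughly h_t = a * h_{t-1} + g_t * x_t
    with a learned per-channel decay a in (0, 1)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.decay = nn.Parameter(torch.zeros(d_model))  # sigmoid -> per-channel decay
        self.gate = nn.Linear(d_model, d_model)          # produces the input gate g_t
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        a = torch.sigmoid(self.decay)                     # (d_model,)
        g = torch.sigmoid(self.gate(x))                   # (batch, seq, d_model)
        h = x.new_zeros(x.size(0), x.size(2))             # running compressed state
        states = []
        for t in range(x.size(1)):                        # causal scan over time steps
            h = a * h + g[:, t] * x[:, t]
            states.append(h)
        return self.out(torch.stack(states, dim=1))
```

In the hybrid blocks, the output of a branch like this is fused with the attention and convolution outputs (e.g. by summation or a learned gate) before the feed-forward sublayer.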
- Dataset: `Salesforce/wikitext`, config `wikitext-2-raw-v1`
- Tokenizer: `openai-community/gpt2`
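For reference, they can be loaded as follows. This is a minimal sketch; tokenization and chunking into `block_size` windows happen in hybrid_lm_poc.py and may differ in detail.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# WikiText-2 raw splits and the GPT-2 tokenizer used by the POC
dataset = load_dataset("Salesforce/wikitext", "wikitext-2-raw-v1")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token
```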
## POC metrics
```json
{
  "eval_loss": 7.388123512268066,
  "eval_runtime": 9.5676,
  "eval_samples_per_second": 26.757,
  "eval_steps_per_second": 3.345,
  "epoch": 0.78125,
  "perplexity": 1616.6696041601817,
  "train_loss": 7.8487934923172,
  "params": 10403040
}
```
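The reported perplexity is simply the exponential of the evaluation loss:

```python
import math

math.exp(7.388123512268066)  # ≈ 1616.67, the "perplexity" value above
```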
This is intentionally tiny and trained briefly as an architecture POC, not a competitive LM.
## Reproduce
```bash
pip install "transformers>=4.54.0" datasets torch accelerate trackio

python hybrid_lm_poc.py \
  --max_steps 200 \
  --max_train_samples 2048 \
  --max_eval_samples 256 \
  --batch_size 8 \
  --d_model 96 \
  --n_layer 6 \
  --block_size 64 \
  --learning_rate 8e-4
```
Sample generation after the short run is in `sample_generation.txt`.
## Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'rahulshetty/hybrid-tiny-wikitext-poc'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.
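If the checkpoint ships custom modeling code for the hybrid architecture, `from_pretrained` may additionally require `trust_remote_code=True`. A short generation example follows; the prompt and decoding settings are arbitrary placeholders:

```python
# Sample a short continuation from the loaded POC model
inputs = tokenizer("The history of", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```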