---
language:
- en
- code
license: apache-2.0
tags:
- smol
- pretraining
- instruct
- 50M
- causal-lm
- gqa
- swiglu
- rmsnorm
datasets:
- HuggingFaceTB/smollm-corpus
metrics:
- perplexity
model-index:
- name: Quark-50m-Instruct
results: []
pipeline_tag: text-generation
---
# Quark-50m-Instruct
**Quark-50m-Instruct** is a small (≈56 M-parameter) decoder-only language model fine-tuned for instruction following.
It shares its architecture with the SmolLM family and was pretrained from scratch on 5 billion tokens from
[HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus).
- **Model type:** Causal Language Model (LLaMA‑style decoder)
- **Architecture:** GQA · SwiGLU · RMSNorm · RoPE · Weight‑tying
- **Pretraining tokens:** 5 B
- **Fine‑tuning:** Instruction‑tuned (details below)
- **Creators:** [OvercastLab](https://huggingface.co/OvercastLab) (research & development lab for ML/AI)
- **Release date:** 22 April 2026
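The grouped-query attention layout listed above pairs 6 query heads with 2 shared KV heads, so each KV head is broadcast across 3 query heads. A minimal NumPy sketch of that sharing (shapes assume a head dimension of 64, i.e. 384 hidden / 6 heads; this illustrates the idea, not the model's actual implementation):

```python
import numpy as np

# Hypothetical shapes matching the card: 6 Q heads, 2 KV heads, head dim 64
batch, seq, n_q, n_kv, d = 1, 8, 6, 2, 64
rng = np.random.default_rng(0)
q = rng.standard_normal((batch, n_q, seq, d))
k = rng.standard_normal((batch, n_kv, seq, d))
v = rng.standard_normal((batch, n_kv, seq, d))

# Each KV head serves n_q // n_kv = 3 query heads: repeat along the head axis
k_rep = np.repeat(k, n_q // n_kv, axis=1)   # (1, 6, 8, 64)
v_rep = np.repeat(v, n_q // n_kv, axis=1)

# Standard scaled dot-product attention over the repeated KV heads
scores = q @ k_rep.transpose(0, 1, 3, 2) / np.sqrt(d)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
out = weights @ v_rep                        # (1, 6, 8, 64)
```

The KV cache only stores 2 heads instead of 6, which is what makes GQA attractive for small-memory inference.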
## Model Summary
Quark-50m-Instruct is designed to be an efficient assistant that can run on consumer GPUs (e.g., RTX 3070 with 8 GB VRAM)
and even on CPU for light workloads. It is **not** competitive with large models on knowledge‑intensive tasks,
but it excels at:
- Simple conversational tasks
- Code generation and explanation (Python)
- Short text rewriting and summarisation
- On‑device / edge inference
The architecture closely follows the efficient‑small‑LM blueprint popularised by SmolLM:
| Component | Details |
|-------------|-------------------------------|
| Vocab size | 49,152 |
| Hidden size | 384 |
| Layers | 24 |
| Attention | Grouped Query (6 Q heads, 2 KV heads) |
| FFN | SwiGLU with 1,024 intermediate |
| Position | RoPE (θ = 10,000) |
| Normalisation | RMSNorm (pre‑block) |
Total trainable parameters: **≈56 M** (with weight tying).
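As a sanity check, the table's hyperparameters can be plugged into a rough back-of-the-envelope count (head dimension 64 is assumed from 384 hidden / 6 heads, and projection biases are assumed absent, as is typical for LLaMA-style blocks):

```python
# Rough parameter count from the table's hyperparameters
vocab, hidden, layers, inter = 49_152, 384, 24, 1_024
n_q_heads, n_kv_heads, head_dim = 6, 2, 64

embed = vocab * hidden                        # tied input/output embedding, counted once
attn = (hidden * n_q_heads * head_dim         # Q projection
        + 2 * hidden * n_kv_heads * head_dim  # K and V projections (only 2 KV heads)
        + n_q_heads * head_dim * hidden)      # output projection
ffn = 3 * hidden * inter                      # SwiGLU: gate, up, down
norms = 2 * hidden                            # two RMSNorms per block
total = embed + layers * (attn + ffn + norms) + hidden  # + final norm
print(f"{total / 1e6:.1f}M")  # 56.6M
```

This lands at roughly 56.6 M, consistent with the ≈56 M figure above; about a third of the budget sits in the tied embedding.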
## Uses
### Direct Use
The model can be used via the 🤗 Transformers library for standard text generation.
It expects chat‑formatted input (see example below).
### Downstream Use
Under the permissive Apache-2.0 license, you may fine-tune Quark-50m-Instruct on your own data for
domain-specific tasks, for instance a customer-support bot, a code reviewer, or a story writer.
### Limitations
- Limited world knowledge (pretraining data ends mid-2025).
- Short context window (2,048 tokens).
- Small size means it can make more factual mistakes than larger models.
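Because of the 2,048-token context window, long conversations need to be truncated before being passed to the model. A minimal sketch of one way to do this, assuming per-message token counts have already been computed (e.g. via the tokenizer); `truncate_history` is a hypothetical helper, not part of the model's API:

```python
def truncate_history(messages, token_counts, max_tokens=2048):
    """Keep the system message plus the most recent contiguous turns that fit.

    messages: list of {"role": ..., "content": ...} dicts
    token_counts: parallel list of token counts, one per message
    """
    # Always keep a leading system message, if present
    keep = [0] if messages and messages[0]["role"] == "system" else []
    budget = max_tokens - sum(token_counts[i] for i in keep)

    chosen = []
    # Walk newest-first so the most recent turns survive; stop at the first
    # message that no longer fits, to keep the kept window contiguous.
    for i in range(len(messages) - 1, -1, -1):
        if i in keep:
            continue
        if token_counts[i] <= budget:
            chosen.append(i)
            budget -= token_counts[i]
        else:
            break
    return [messages[i] for i in sorted(keep + chosen)]
```

For example, with a 2,000-token user turn in the middle of a chat, the helper drops that turn and everything before it while preserving the system prompt and the latest exchange.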
## How to Get Started
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name = "OvercastLab/Quark-50m-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
messages = [
{"role": "system", "content": "You are Quark, a helpful assistant."},
{"role": "user", "content": "Explain group query attention in one sentence."}
]
inputs = tokenizer.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```