---
language:
- en
- code
license: apache-2.0
tags:
- smol
- pretraining
- instruct
- 50M
- causal-lm
- gqa
- swiglu
- rmsnorm
datasets:
- HuggingFaceTB/smollm-corpus
metrics:
- perplexity
model-index:
- name: Quark-50m-Instruct
  results: []
pipeline_tag: text-generation
---

# Quark-50m-Instruct
|
|
**Quark-50m-Instruct** is a small (≈56M-parameter) decoder-only language model fine-tuned for instruction following.
It uses the same architecture as the SmolLM family and was pretrained from scratch on 5 billion tokens from
[HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus).
|
|
- **Model type:** Causal Language Model (LLaMA-style decoder)
- **Architecture:** GQA · SwiGLU · RMSNorm · RoPE · weight tying
- **Pretraining tokens:** 5 B
- **Fine-tuning:** Instruction-tuned (details below)
- **Creators:** [OvercastLab](https://huggingface.co/OvercastLab) (research & development lab for ML/AI)
- **Release date:** 22 April 2026
|
|
## Model Summary
|
|
Quark-50m-Instruct is designed to be an efficient assistant that can run on consumer GPUs (e.g., an RTX 3070 with 8 GB VRAM)
and even on CPU for light workloads. It is **not** competitive with large models on knowledge-intensive tasks,
but it excels at:
|
|
- Simple conversational tasks
- Code generation and explanation (Python)
- Short text rewriting and summarisation
- On-device / edge inference
|
|
The architecture closely follows the efficient-small-LM blueprint popularised by SmolLM:
|
|
| Component     | Details                               |
|---------------|---------------------------------------|
| Vocab size    | 49,152                                |
| Hidden size   | 384                                   |
| Layers        | 24                                    |
| Attention     | Grouped-query (6 query heads, 2 KV heads) |
| FFN           | SwiGLU, intermediate size 1,024       |
| Position      | RoPE (θ = 10,000)                     |
| Normalisation | RMSNorm (pre-block)                   |
|
|
Total trainable parameters: **≈56 M** (with the embedding weights tied to the output head).
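
As a sanity check on that figure, the table maps onto a standard 🤗 `LlamaConfig` roughly as follows. This is a sketch reconstructed from the table above, not a copy of the released `config.json`:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Approximate configuration reconstructed from the table above;
# the model's released config.json is authoritative.
config = LlamaConfig(
    vocab_size=49_152,
    hidden_size=384,
    num_hidden_layers=24,
    num_attention_heads=6,        # query heads
    num_key_value_heads=2,        # GQA: 3 query heads share each KV head
    intermediate_size=1_024,      # SwiGLU FFN width
    max_position_embeddings=2_048,
    rope_theta=10_000.0,
    tie_word_embeddings=True,     # input embeddings shared with the LM head
)

model = LlamaForCausalLM(config)
# Tied parameters are counted once by .parameters()
total = sum(p.numel() for p in model.parameters())
print(f"{total / 1e6:.1f}M parameters")  # ≈ 56.6M
```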
|
|
## Uses
|
|
### Direct Use
The model can be used via the 🤗 Transformers library for standard text generation.
It expects chat-formatted input (see the example below).
|
|
### Downstream Use
Because of the open Apache-2.0 license, you may fine-tune Quark-50m-Instruct on your own data for
domain-specific tasks, for instance a customer-support bot, a code reviewer, or a story writer (a minimal sketch follows).
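
A minimal supervised fine-tuning loop with the 🤗 `Trainer` might look like the sketch below. The dataset file, its `"text"` column, and all hyperparameters are placeholders, not recommended values:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "OvercastLab/Quark-50m-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

if tokenizer.pad_token is None:        # reuse EOS for padding if unset
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder file: one JSON object per line with a "text" field
# containing an already chat-formatted training example.
dataset = load_dataset("json", data_files="my_domain_data.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="quark-finetuned",
        per_device_train_batch_size=8,  # placeholder hyperparameters
        num_train_epochs=3,
        learning_rate=2e-5,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```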
|
|
### Limitations
- Limited world knowledge (pretraining data ends in mid-2025).
- Short context window (2,048 tokens).
- Its small size means it makes factual mistakes more often than larger models.
|
|
## How to Get Started
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "OvercastLab/Quark-50m-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "system", "content": "You are Quark, a helpful assistant."},
    {"role": "user", "content": "Explain grouped-query attention in one sentence."},
]

# Render the chat into the model's prompt format and tokenize it.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # append the assistant header so the model replies
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
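
The call above decodes greedily. For more varied replies you can enable sampling; the values here are illustrative starting points, not tuned settings for this model:

```python
# Sampling instead of greedy decoding; temperature/top_p are illustrative.
outputs = model.generate(
    inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```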