---
language:
- en
- code
license: apache-2.0
tags:
- smol
- pretraining
- instruct
- 50M
- causal-lm
- gqa
- swiglu
- rmsnorm
datasets:
- HuggingFaceTB/smollm-corpus
metrics:
- perplexity
model-index:
- name: Quark-50m-Instruct
  results: []
pipeline_tag: text-generation
---

# Quark-50m-Instruct

**Quark-50m-Instruct** is a small (≈56M-parameter) decoder-only language model, fine-tuned for instruction following. It is built on the same architecture as the SmolLM family and was pretrained from scratch on 5 billion tokens from [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus).

- **Model type:** Causal Language Model (LLaMA-style decoder)
- **Architecture:** GQA · SwiGLU · RMSNorm · RoPE · Weight tying
- **Pretraining tokens:** 5 B
- **Fine-tuning:** Instruction-tuned (details below)
- **Creators:** [OvercastLab](https://huggingface.co/OvercastLab) (research & development lab for ML/AI)
- **Release date:** 22 April 2026

## Model Summary

Quark-50m-Instruct is designed to be an efficient assistant that can run on consumer GPUs (e.g., an RTX 3070 with 8 GB VRAM) and even on CPU for light workloads. It is **not** competitive with large models on knowledge-intensive tasks, but it excels at:

- Simple conversational tasks
- Code generation and explanation (Python)
- Short text rewriting and summarisation
- On-device / edge inference

The architecture closely follows the efficient-small-LM blueprint popularised by SmolLM:

| Component | Details |
|---------------|---------------------------------------|
| Vocab size | 49,152 |
| Hidden size | 384 |
| Layers | 24 |
| Attention | Grouped-query (6 query heads, 2 KV heads) |
| FFN | SwiGLU, intermediate size 1,024 |
| Position | RoPE (θ = 10,000) |
| Normalisation | RMSNorm (pre-block) |

Total trainable parameters: **≈56 M** (with weight tying).

## Uses

### Direct Use

The model can be used via the 🤗 Transformers library for standard text generation. It expects chat-formatted input (see the example below).
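The headline parameter count follows from the table above. As a rough sanity check (a sketch: it assumes bias-free projections, a head dimension of 64, and ignores the tiny RMSNorm weights), the components add up as follows:

```python
# Rough parameter-count estimate from the architecture table (a sketch,
# assuming bias-free linear layers; RMSNorm weights are negligible).
vocab, hidden, layers, ffn = 49_152, 384, 24, 1_024
n_q_heads, n_kv_heads = 6, 2
head_dim = hidden // n_q_heads  # 64

embed = vocab * hidden                         # token embeddings, tied with the LM head
q_proj = hidden * n_q_heads * head_dim         # query projection
kv_proj = 2 * hidden * n_kv_heads * head_dim   # key and value projections (only 2 heads)
o_proj = n_q_heads * head_dim * hidden         # output projection
swiglu = 3 * hidden * ffn                      # gate, up, and down projections
per_layer = q_proj + kv_proj + o_proj + swiglu

total = embed + layers * per_layer
print(f"≈{total / 1e6:.1f}M parameters")  # ≈56.6M parameters
```

With weight tying the embedding matrix is counted once, which is why the total lands near 56 M rather than 75 M.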
### Downstream Use

Because of the permissive Apache-2.0 license, you may fine-tune Quark-50m-Instruct on your own data for domain-specific tasks – for instance, a customer-support bot, a code reviewer, or a story writer.

### Limitations

- Limited world knowledge (pretraining data has a mid-2025 cut-off).
- Short context window (2,048 tokens).
- Its small size means it makes factual mistakes more often than larger models.

## How to Get Started

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "OvercastLab/Quark-50m-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "system", "content": "You are Quark, a helpful assistant."},
    {"role": "user", "content": "Explain group query attention in one sentence."},
]

# Build the chat-formatted prompt and generate a reply
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
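Grouped-query attention, which the example prompt asks the model to explain, shares each key/value head across a group of query heads — here 6 query heads over 2 KV heads, so groups of 3. A minimal NumPy sketch of the mechanism (an illustration only, not the model's actual implementation; the causal mask is omitted for brevity):

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, seq, head_dim); k, v: (n_kv_heads, seq, head_dim),
    where n_q_heads is a multiple of n_kv_heads."""
    group = q.shape[0] // k.shape[0]  # query heads per shared KV head (3 here)
    k = np.repeat(k, group, axis=0)   # broadcast each KV head to its query group
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (n_q_heads, seq, head_dim)

rng = np.random.default_rng(0)
q = rng.standard_normal((6, 8, 64))  # 6 query heads, as in the table above
k = rng.standard_normal((2, 8, 64))  # only 2 KV heads are computed and cached
v = rng.standard_normal((2, 8, 64))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (6, 8, 64)
```

Because only the 2 KV heads are cached, the KV cache shrinks by the group factor (3× here) compared with full multi-head attention, which is part of what makes GQA attractive for small on-device models.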