---
language:
  - en
  - code
license: apache-2.0
tags:
  - smol
  - pretraining
  - instruct
  - 50M
  - causal-lm
  - gqa
  - swiglu
  - rmsnorm
datasets:
  - HuggingFaceTB/smollm-corpus
metrics:
  - perplexity
model-index:
  - name: Quark-50m-Instruct
    results: []
pipeline_tag: text-generation
---

# Quark-50m-Instruct

Quark-50m-Instruct is a small (≈56M-parameter) decoder-only language model fine-tuned for instruction following. It uses the same architecture as the SmolLM family and was pretrained from scratch on 5 billion tokens from HuggingFaceTB/smollm-corpus.

- **Model type:** Causal Language Model (LLaMA-style decoder)
- **Architecture:** GQA · SwiGLU · RMSNorm · RoPE · Weight tying
- **Pretraining tokens:** 5 B
- **Fine-tuning:** Instruction-tuned (details below)
- **Creators:** OvercastLab (research & development lab for ML/AI)
- **Release date:** 22 April 2026

## Model Summary

Quark-50m-Instruct is designed to be an efficient assistant that can run on consumer GPUs (e.g., RTX 3070 with 8 GB VRAM) and even on CPU for light workloads. It is not competitive with large models on knowledge‑intensive tasks, but it excels at:

- Simple conversational tasks
- Code generation and explanation (Python)
- Short text rewriting and summarisation
- On-device / edge inference

The architecture closely follows the efficient‑small‑LM blueprint popularised by SmolLM:

| Component     | Details                                |
|---------------|----------------------------------------|
| Vocab size    | 49,152                                 |
| Hidden size   | 384                                    |
| Layers        | 24                                     |
| Attention     | Grouped Query (6 Q heads, 2 KV heads)  |
| FFN           | SwiGLU, intermediate size 1,024        |
| Position      | RoPE (θ = 10,000)                      |
| Normalisation | RMSNorm (pre-block)                    |

Total trainable parameters: ≈56 M (with weight tying).
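
The parameter count can be sanity-checked from the table above with some back-of-the-envelope arithmetic (illustrative only; the head dimension of 64 is inferred from 384 hidden / 6 query heads):

```python
# Approximate parameter count for Quark-50m-Instruct,
# derived from the architecture table above.
vocab, hidden, layers = 49_152, 384, 24
q_heads, kv_heads, head_dim = 6, 2, 384 // 6  # head_dim = 64
ffn = 1_024

embedding = vocab * hidden                   # tied with the LM head
attn = (hidden * q_heads * head_dim          # Q projection
        + 2 * hidden * kv_heads * head_dim   # K and V projections (GQA)
        + q_heads * head_dim * hidden)       # output projection
mlp = 3 * hidden * ffn                       # SwiGLU: gate, up, down
norms = 2 * hidden                           # two RMSNorms per block

total = embedding + layers * (attn + mlp + norms) + hidden  # + final norm
print(f"{total / 1e6:.1f}M")  # 56.6M with weight tying
```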

## Uses

### Direct Use

The model can be used via the 🤗 Transformers library for standard text generation. It expects chat‑formatted input (see example below).

### Downstream Use

Because of the permissive Apache-2.0 license, you may fine-tune Quark-50m-Instruct on your own data for domain-specific tasks, for instance a customer-support bot, a code reviewer, or a story writer.
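
As a starting point, instruction/response pairs can be packed into the chat-message format the model expects. The sketch below is illustrative (the raw pairs and file name are hypothetical); the resulting JSONL layout is one that common SFT trainers accept:

```python
import json

# Hypothetical raw examples; replace with your own domain data.
raw_pairs = [
    ("Summarise this ticket: printer offline.",
     "The customer's printer is offline."),
    ("Explain what a decorator does in Python.",
     "A decorator wraps a function to modify its behaviour."),
]

def to_chat_example(instruction: str, response: str) -> dict:
    """Wrap one pair in the messages format used by apply_chat_template."""
    return {
        "messages": [
            {"role": "user", "content": instruction},
            {"role": "assistant", "content": response},
        ]
    }

# One JSON object per line, ready for supervised fine-tuning.
with open("sft_data.jsonl", "w") as f:
    for instruction, response in raw_pairs:
        f.write(json.dumps(to_chat_example(instruction, response)) + "\n")
```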

## Limitations

- Limited world knowledge (pretraining data cutoff: mid-2025).
- Short context window (2,048 tokens).
- Its small size means it makes more factual mistakes than larger models.
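
When a running conversation grows past the 2,048-token window, one simple mitigation is to keep the system prompt and drop the oldest turns. A minimal sketch over plain token-id lists (real code would operate on the tokenizer's output):

```python
MAX_CONTEXT = 2_048

def fit_to_window(system_ids, history_ids, max_len=MAX_CONTEXT):
    """Keep the system prompt, then as many of the most recent
    history tokens as still fit in the context window."""
    budget = max_len - len(system_ids)
    return system_ids + history_ids[-budget:]

# Example: a 100-token system prompt plus a 3,000-token history
trimmed = fit_to_window(list(range(100)), list(range(3_000)))
print(len(trimmed))  # 2048
```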

## How to Get Started

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "OvercastLab/Quark-50m-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "system", "content": "You are Quark, a helpful assistant."},
    {"role": "user", "content": "Explain group query attention in one sentence."},
]

# Render the chat template into input ids on the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```