---
language:
  - en
  - code
license: apache-2.0
tags:
  - smol
  - pretraining
  - instruct
  - 50M
  - causal-lm
  - gqa
  - swiglu
  - rmsnorm
datasets:
  - HuggingFaceTB/smollm-corpus
metrics:
  - perplexity
model-index:
  - name: Quark-50m-Instruct
    results: []
pipeline_tag: text-generation
---

# Quark-50m-Instruct

Quark-50m-Instruct is a small (≈56M-parameter) decoder-only language model fine-tuned for instruction following. It uses the same architecture as the SmolLM family and was pretrained from scratch on 5 billion tokens from HuggingFaceTB/smollm-corpus.

- **Model type:** Causal Language Model (LLaMA-style decoder)
- **Architecture:** GQA · SwiGLU · RMSNorm · RoPE · Weight tying
- **Pretraining tokens:** 5 B
- **Fine-tuning:** Instruction-tuned (details below)
- **Creators:** OvercastLab (research & development lab for ML/AI)
- **Release date:** 22 April 2026

## Model Summary

Quark-50m-Instruct is designed to be an efficient assistant that can run on consumer GPUs (e.g., RTX 3070 with 8 GB VRAM) and even on CPU for light workloads. It is not competitive with large models on knowledge‑intensive tasks, but it excels at:

- Simple conversational tasks
- Code generation and explanation (Python)
- Short text rewriting and summarisation
- On-device / edge inference

The architecture closely follows the efficient‑small‑LM blueprint popularised by SmolLM:

| Component     | Details                                |
|---------------|----------------------------------------|
| Vocab size    | 49,152                                 |
| Hidden size   | 384                                    |
| Layers        | 24                                     |
| Attention     | Grouped Query (6 Q heads, 2 KV heads)  |
| FFN           | SwiGLU, intermediate size 1,024        |
| Position      | RoPE (θ = 10,000)                      |
| Normalisation | RMSNorm (pre-block)                    |

Total trainable parameters: ≈56 M (with weight tying).
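
The parameter count can be sanity-checked from the table above with some back-of-the-envelope arithmetic (illustrative only; the head dimension of 64 is inferred from 384 hidden / 6 query heads):

```python
# Approximate parameter count for Quark-50m-Instruct,
# derived from the architecture table above.
vocab, hidden, layers = 49_152, 384, 24
q_heads, kv_heads, head_dim = 6, 2, 384 // 6  # head_dim = 64
ffn = 1_024

embedding = vocab * hidden                   # tied with the LM head
attn = (hidden * q_heads * head_dim          # Q projection
        + 2 * hidden * kv_heads * head_dim   # K and V projections (GQA)
        + q_heads * head_dim * hidden)       # output projection
mlp = 3 * hidden * ffn                       # SwiGLU: gate, up, down
norms = 2 * hidden                           # two RMSNorms per block

total = embedding + layers * (attn + mlp + norms) + hidden  # + final norm
print(f"{total / 1e6:.1f}M")  # 56.6M with weight tying
```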

## Uses

### Direct Use

The model can be used via the 🤗 Transformers library for standard text generation. It expects chat‑formatted input (see example below).

### Downstream Use

Because of the permissive Apache-2.0 license, you may fine-tune Quark-50m-Instruct on your own data for domain-specific tasks, for instance a customer-support bot, a code reviewer, or a story writer.
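
As a starting point, instruction/response pairs can be packed into the chat-message format the model expects. The sketch below is illustrative (the raw pairs and file name are hypothetical); the resulting JSONL layout is one that common SFT trainers accept:

```python
import json

# Hypothetical raw examples; replace with your own domain data.
raw_pairs = [
    ("Summarise this ticket: printer offline.",
     "The customer's printer is offline."),
    ("Explain what a decorator does in Python.",
     "A decorator wraps a function to modify its behaviour."),
]

def to_chat_example(instruction: str, response: str) -> dict:
    """Wrap one pair in the messages format used by apply_chat_template."""
    return {
        "messages": [
            {"role": "user", "content": instruction},
            {"role": "assistant", "content": response},
        ]
    }

# One JSON object per line, ready for supervised fine-tuning.
with open("sft_data.jsonl", "w") as f:
    for instruction, response in raw_pairs:
        f.write(json.dumps(to_chat_example(instruction, response)) + "\n")
```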

## Limitations

- Limited world knowledge (pretraining data cutoff: mid-2025).
- Short context window (2,048 tokens).
- Its small size means it makes more factual mistakes than larger models.
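
When a running conversation grows past the 2,048-token window, one simple mitigation is to keep the system prompt and drop the oldest turns. A minimal sketch over plain token-id lists (real code would operate on the tokenizer's output):

```python
MAX_CONTEXT = 2_048

def fit_to_window(system_ids, history_ids, max_len=MAX_CONTEXT):
    """Keep the system prompt, then as many of the most recent
    history tokens as still fit in the context window."""
    budget = max_len - len(system_ids)
    return system_ids + history_ids[-budget:]

# Example: a 100-token system prompt plus a 3,000-token history
trimmed = fit_to_window(list(range(100)), list(range(3_000)))
print(len(trimmed))  # 2048
```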

## How to Get Started

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "OvercastLab/Quark-50m-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "system", "content": "You are Quark, a helpful assistant."},
    {"role": "user", "content": "Explain group query attention in one sentence."},
]

# Render the chat template into input ids on the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```