---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- chat
- instruct
- small-model
- 135m
- quark
---
# Quark‑135M‑Instruct

Quark‑135M is a 135M‑parameter conversational AI assistant, trained from scratch and then fine‑tuned to be helpful, respectful, and honest, and to maintain a clear identity.
- Base model: Quark‑135M (pretrained on 15 B tokens of general‑purpose and mathematical text)
- Instruction tuning: supervised fine‑tuning on a small, curated dataset of identity‑aware conversations
- Developers: OvercastLab and ThingsAI
- License: Apache‑2.0
## Model Architecture
The model follows a Llama‑style decoder‑only transformer (similar to SmolLM) with the following components:
| Component | Value |
|---|---|
| Vocab size | 49 152 |
| Hidden size (d_model) | 576 |
| Number of layers | 30 |
| Attention heads | 9 |
| KV heads (GQA) | 3 |
| Head dim | 64 |
| FFN dimension | 1 536 |
| Activation | SwiGLU |
| Normalization | RMSNorm |
| Positional encoding | Rotary Embeddings (RoPE, θ=10 000) |
| Max sequence length | 2 048 |
| Weight tying | Embedding / LM head |
Total trainable parameters: ~135 M
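The ~135 M figure can be cross‑checked from the table above. The following back‑of‑the‑envelope count is a sketch that assumes standard Llama‑style weight shapes (no biases, two RMSNorms per layer, tied embedding/LM head); the shapes are derived from the table, not read from the checkpoint:

```python
# Rough parameter count for a Llama-style decoder with GQA and tied embeddings.
vocab, d, layers, ffn = 49_152, 576, 30, 1_536
kv_dim = 3 * 64  # KV heads * head dim (GQA)

embed = vocab * d                       # token embeddings (LM head is tied, so counted once)
attn = d * d + 2 * d * kv_dim + d * d   # Q, K, V, O projections per layer
mlp = 3 * d * ffn                       # SwiGLU: gate, up, down projections
norms = 2 * d                           # two RMSNorm weight vectors per layer

total = embed + layers * (attn + mlp + norms) + d  # + final RMSNorm
print(f"{total / 1e6:.1f}M")  # → 134.5M, consistent with the reported ~135M
```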
## Evaluation Results
The table below reports zero‑shot performance on several common benchmarks, evaluated using lm‑eval‑harness with apply_chat_template=True. All scores are shown as percentages.
| Benchmark | Metric | Score |
|---|---|---|
| HellaSwag | acc_norm | 31.37% |
| ARC-Easy | acc_norm | 41.46% |
| ARC-Challenge | acc_norm | 25.09% |
| PIQA | acc_norm | 61.26% |
| MMLU (avg) | acc | 23.17% |
| MMLU Humanities | acc | 24.23% |
| MMLU Social Sciences | acc | 22.59% |
| MMLU STEM | acc | 22.04% |
| MMLU Other | acc | 23.27% |
| CommonsenseQA | acc | 20.56% |
| OpenBookQA | acc_norm | 27.20% |
| Winogrande | acc | 50.20% |
| TriviaQA | exact_match | 0.07% |
Key takeaways:
- HellaSwag (31.37%) is above random chance (25%) but far below models pre‑trained on hundreds of billions of tokens. This reflects the modest 15 B token pre‑training budget.
- PIQA (61.26%) shows the model has basic physical reasoning, benefiting from the pre‑training mix.
- TriviaQA (0.07%) confirms the model has almost no factual recall – it was not exposed to a large enough knowledge corpus.
- MMLU (23.17%) is near random for a 4‑option task, indicating very limited academic knowledge.
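For reproducibility, an lm‑eval‑harness invocation along the following lines should approximate the setup above. The flags are the harness's standard ones and the task names are its built‑in identifiers; they are assumptions, not copied from the original evaluation run:

```shell
lm_eval --model hf \
    --model_args pretrained=OvercastLab/Quark-135m-Instruct,dtype=auto \
    --tasks hellaswag,arc_easy,arc_challenge,piqa,mmlu,commonsense_qa,openbookqa,winogrande,triviaqa \
    --apply_chat_template \
    --batch_size auto
```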
## Intended Use

Quark‑135M‑Instruct is a small conversational assistant best suited to:
- Polite, identity‑aware small talk
- Refusing gracefully when it doesn’t know something
- Following simple instructions (e.g., greetings, name recall, basic Q&A)
It is not suitable for tasks requiring factual accuracy, deep reasoning, or reliable knowledge retrieval.
## Limitations

- Small model size – at 135M parameters, Quark is orders of magnitude smaller than current frontier models.
- Limited world knowledge – pre‑trained on only 15 B tokens; it lacks the broad coverage of larger models.
- Hallucinates frequently – when asked questions beyond simple greetings or self‑description, it may invent plausible‑sounding but incorrect answers.
- Repetitive loops – may occasionally repeat phrases or get stuck in loops, especially with low temperature sampling.
- Instruction coverage – fine‑tuned on only 1 500 identity examples; it may not handle out‑of‑domain requests gracefully.
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OvercastLab/Quark-135m-Instruct"  # (replace with actual HF repo)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Quark, a helpful, respectful and honest AI assistant created by OvercastLab and ThingsAI together with Mich. Always answer as helpfully and accurately as possible."},
    {"role": "user", "content": "Hi, what's your name?"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.2,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.3,
    # Stop when the model starts a new turn, as well as on the regular EOS token.
    eos_token_id=tokenizer.convert_tokens_to_ids(["<|user|>", "<|system|>"]) + [tokenizer.eos_token_id],
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```