---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- chat
- instruct
- small-model
- 135m
- quark
---

![logo](icon1.png)

Quark‑135M is a **135M parameter** conversational AI assistant, trained from scratch and then fine‑tuned to be **helpful, respectful, and honest** and to maintain a clear identity.

* **Base model:** Quark‑135M (pretrained on 15B tokens of general‑purpose and mathematical text)
* **Instruction tuning:** supervised fine‑tuning on a small, curated dataset of identity‑aware conversations
* **Developers:** OvercastLab and ThingsAI
* **License:** Apache‑2.0

---

## Model Architecture

The model follows a **Llama‑style decoder‑only transformer** (similar to SmolLM) with the following components (a configuration sketch that reproduces these hyperparameters appears just before the usage example below):

| Component            | Value                                |
|----------------------|--------------------------------------|
| Vocab size           | 49,152                               |
| Hidden size (`d_model`) | 576                               |
| Number of layers     | 30                                   |
| Attention heads      | 9                                    |
| KV heads (GQA)       | 3                                    |
| Head dim             | 64                                   |
| FFN dimension        | 1,536                                |
| Activation           | SwiGLU                               |
| Normalization        | RMSNorm                              |
| Positional encoding  | Rotary Embeddings (RoPE, θ = 10,000) |
| Max sequence length  | 2,048                                |
| Weight tying         | Embedding / LM head                  |

**Total trainable parameters:** ~135M

---

## Evaluation Results

The table below reports zero‑shot performance on several common benchmarks, evaluated with `lm-eval-harness` using `apply_chat_template=True` (a reproduction sketch also appears before the usage example). All scores are shown as percentages.

| Benchmark            | Metric      |   Score |
|----------------------|-------------|--------:|
| **HellaSwag**        | acc_norm    |  31.37% |
| **ARC-Easy**         | acc_norm    |  41.46% |
| **ARC-Challenge**    | acc_norm    |  25.09% |
| **PIQA**             | acc_norm    |  61.26% |
| **MMLU** (avg)       | acc         |  23.17% |
| MMLU Humanities      | acc         |  24.23% |
| MMLU Social Sciences | acc         |  22.59% |
| MMLU STEM            | acc         |  22.04% |
| MMLU Other           | acc         |  23.27% |
| **CommonsenseQA**    | acc         |  20.56% |
| **OpenBookQA**       | acc_norm    |  27.20% |
| **Winogrande**       | acc         |  50.20% |
| **TriviaQA**         | exact_match |   0.07% |

**Key takeaways:**

* **HellaSwag (31.37%)** is above random chance (25%) but far below models pre‑trained on hundreds of billions of tokens, reflecting the modest 15B‑token pre‑training budget.
* **PIQA (61.26%)** shows the model has basic physical reasoning, benefiting from the pre‑training mix.
* **TriviaQA (0.07%)** confirms the model has **almost no factual recall**; it was not exposed to a large enough knowledge corpus.
* **MMLU (23.17%)** is near random for a 4‑option task, indicating very limited academic knowledge.

---

## Intended Use

Quark‑135M‑Instruct is a **small conversational assistant** that excels at:

- Polite, identity‑aware small talk
- Refusing gracefully when it doesn’t know something
- Following simple instructions (e.g., greetings, name recall, basic Q&A)

It is **not suitable** for tasks requiring factual accuracy, deep reasoning, or reliable knowledge retrieval.

---

## Limitations

* **Small model size** – at 135M parameters, the model is several orders of magnitude smaller than current frontier models.
* **Limited world knowledge** – pre‑trained on only 15B tokens; it lacks the broad coverage of larger models.
* **Hallucinates frequently** – when asked questions beyond simple greetings or self‑description, it may invent plausible‑sounding but incorrect answers.
* **Repetitive loops** – may occasionally repeat phrases or get stuck in loops, especially with low‑temperature sampling.
* **Instruction coverage** – fine‑tuned on only 1,500 identity examples; it may not handle out‑of‑domain requests gracefully.
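---

As referenced in the Model Architecture section, the following is a minimal configuration sketch, not the official modeling code: it assumes the stock Llama implementation in `transformers` as a stand‑in and checks that the hyperparameters in the table add up to roughly 135M parameters.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Hyperparameters copied from the architecture table; LlamaConfig is an
# assumed stand-in for the actual modeling code, which is not published here.
config = LlamaConfig(
    vocab_size=49_152,
    hidden_size=576,
    num_hidden_layers=30,
    num_attention_heads=9,
    num_key_value_heads=3,      # GQA: 3 KV heads shared across 9 query heads
    intermediate_size=1_536,    # SwiGLU FFN width
    hidden_act="silu",
    max_position_embeddings=2_048,
    rope_theta=10_000.0,
    tie_word_embeddings=True,   # embedding / LM head weight tying
)

model = LlamaForCausalLM(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"~{n_params / 1e6:.0f}M parameters")  # ≈135M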
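To reproduce the evaluation scores, a sketch assuming `lm-evaluation-harness` v0.4.3+, where `simple_evaluate` exposes an `apply_chat_template` flag; the task names follow the harness's registry:

```python
from lm_eval import simple_evaluate
from lm_eval.models.huggingface import HFLM

lm = HFLM(pretrained="OvercastLab/Quark-135m-Instruct", dtype="auto")

results = simple_evaluate(
    model=lm,
    tasks=["hellaswag", "arc_easy", "arc_challenge", "piqa", "winogrande"],
    apply_chat_template=True,  # the scores above were computed with the chat template
)
print(results["results"])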
---

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OvercastLab/Quark-135m-Instruct"  # (replace with actual HF repo)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are Quark, a helpful, respectful and honest AI assistant created by OvercastLab and ThingsAI together with Mich. Always answer as helpfully and accurately as possible."},
    {"role": "user", "content": "Hi, what's your name?"}
]

# Render the conversation with the model's chat template and append the
# assistant header so generation starts a new assistant turn.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.2,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.3,  # helps suppress the repetition loops noted above
    # Stop when the model starts a new user/system turn or emits EOS.
    eos_token_id=tokenizer.convert_tokens_to_ids(["<|user|>", "<|system|>"]) + [tokenizer.eos_token_id],
)

# Decode only the newly generated tokens (skip the prompt).
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
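
For quick experiments, the same generation can be driven through the `text-generation` pipeline, which applies the chat template automatically. A minimal sketch, assuming a recent `transformers` release (v4.40+) whose pipeline accepts chat‑style message lists directly:

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="OvercastLab/Quark-135m-Instruct",  # same placeholder repo id as above
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Hi, what's your name?"}]
out = pipe(messages, max_new_tokens=150, do_sample=True, temperature=0.2, repetition_penalty=1.3)

# The pipeline returns the whole conversation; the last message is the reply.
print(out[0]["generated_text"][-1]["content"])
```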