---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- chat
- instruct
- small-model
- 135m
- quark
---

# Quark‑135M‑Instruct

Quark‑135M‑Instruct is a **135M‑parameter** conversational AI assistant, trained from scratch and then fine‑tuned to be **helpful, respectful, and honest** and to maintain a clear identity.

* **Base model:** Quark‑135M (pretrained on 15 B tokens of general‑purpose and mathematical text)
* **Instruction tuning:** supervised fine‑tuning on a small, curated dataset of identity‑aware conversations
* **Developers:** OvercastLab and ThingsAI
* **License:** Apache‑2.0

---

## Model Architecture

The model follows a **Llama‑style decoder‑only transformer** (similar to SmolLM) with the following components:

| Component               | Value                                 |
|-------------------------|---------------------------------------|
| Vocab size              | 49,152                                |
| Hidden size (`d_model`) | 576                                   |
| Number of layers        | 30                                    |
| Attention heads         | 9                                     |
| KV heads (GQA)          | 3                                     |
| Head dim                | 64                                    |
| FFN dimension           | 1,536                                 |
| Activation              | SwiGLU                                |
| Normalization           | RMSNorm                               |
| Positional encoding     | Rotary embeddings (RoPE, θ = 10,000)  |
| Max sequence length     | 2,048                                 |
| Weight tying            | Input embedding ↔ LM head             |

**Total trainable parameters:** ~135M
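
As a rough sanity check, the table maps onto a Llama‑style configuration. The sketch below is only an illustration that assumes `transformers`' `LlamaConfig` (the released checkpoint may ship its own config class or field names); with tied embeddings it lands at roughly the ~135M figure quoted above:

```python
# Hypothetical sketch: the architecture table expressed as a transformers LlamaConfig.
# The released checkpoint may use a different config class or defaults.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=49_152,
    hidden_size=576,
    intermediate_size=1_536,
    num_hidden_layers=30,
    num_attention_heads=9,
    num_key_value_heads=3,          # grouped-query attention (GQA)
    hidden_act="silu",              # SwiGLU-gated MLP
    max_position_embeddings=2_048,
    rope_theta=10_000.0,
    tie_word_embeddings=True,       # embedding / LM-head weight tying
)

model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
# ≈ 135M with tied embeddings
```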

---

## Evaluation Results

The table below reports zero‑shot performance on several common benchmarks, evaluated with `lm-evaluation-harness` using `apply_chat_template=True`; a reproduction sketch follows the key takeaways. All scores are shown as percentages.

| Benchmark           | Metric    | Score   |
|---------------------|-----------|--------:|
| **HellaSwag**       | acc_norm  | 31.37%  |
| **ARC-Easy**        | acc_norm  | 41.46%  |
| **ARC-Challenge**   | acc_norm  | 25.09%  |
| **PIQA**            | acc_norm  | 61.26%  |
| **MMLU** (avg)      | acc       | 23.17%  |
| MMLU Humanities     | acc       | 24.23%  |
| MMLU Social Sciences| acc       | 22.59%  |
| MMLU STEM           | acc       | 22.04%  |
| MMLU Other          | acc       | 23.27%  |
| **CommonsenseQA**   | acc       | 20.56%  |
| **OpenBookQA**      | acc_norm  | 27.20%  |
| **Winogrande**      | acc       | 50.20%  |
| **TriviaQA**        | exact_match | 0.07% |

**Key takeaways:**

* **HellaSwag (31.37%)** is above random chance (25%) but far below models pre‑trained on hundreds of billions of tokens. This reflects the modest 15 B token pre‑training budget.
* **PIQA (61.26%)** shows the model has basic physical reasoning, benefiting from the pre‑training mix.
* **TriviaQA (0.07%)** confirms the model has **almost no factual recall** – it was not exposed to a large enough knowledge corpus.
* **MMLU (23.17%)** is near random for a 4‑option task, indicating very limited academic knowledge.
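
For context, a hedged sketch of how scores like these could be reproduced with `lm-evaluation-harness` is shown below. The exact task names, the `apply_chat_template` argument, and the harness version are assumptions and may need adjusting for your installation:

```python
# Hedged sketch: reproducing the zero-shot scores with lm-evaluation-harness.
# Task names and keyword arguments are assumptions; check your installed version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=OvercastLab/Quark-135m-Instruct,dtype=auto",
    tasks=["hellaswag", "arc_easy", "arc_challenge", "piqa", "mmlu",
           "commonsense_qa", "openbookqa", "winogrande", "triviaqa"],
    num_fewshot=0,
    apply_chat_template=True,       # evaluate through the chat template
)

# Print the per-task metric dictionaries (acc, acc_norm, exact_match, ...)
for task, metrics in results["results"].items():
    print(task, metrics)
```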

---

## Intended Use

Quark‑135M‑Instruct is a **small conversational assistant** best suited to:

- Polite, identity‑aware small talk
- Refusing gracefully when it doesn’t know something
- Following simple instructions (e.g., greetings, name recall, basic Q&A)

It is **not suitable** for tasks requiring factual accuracy, deep reasoning, or reliable knowledge retrieval.

---

## Limitations

* **Small model size** – at 135M parameters, it is several orders of magnitude smaller than current frontier models.
* **Limited world knowledge** – pre‑trained on only 15 B tokens; it lacks the broad coverage of larger models.
* **Hallucinates frequently** – when asked questions beyond simple greetings or self‑description, it may invent plausible‑sounding but incorrect answers.
* **Repetitive loops** – may occasionally repeat phrases or get stuck in loops, especially with low temperature sampling.
* **Instruction coverage** – fine‑tuned on only 1,500 identity examples; it may not handle out‑of‑domain requests gracefully.

---

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OvercastLab/Quark-135m-Instruct"   # (replace with actual HF repo)

# Load the tokenizer and model (device_map="auto" requires `accelerate`)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are Quark, a helpful, respectful and honest AI assistant created by OvercastLab and ThingsAI together with Mich. Always answer as helpfully and accurately as possible."},
    {"role": "user", "content": "Hi, what's your name?"}
]

# Render the conversation with the model's chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.2,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.3,
    # Stop when the model starts a new <|user|> or <|system|> turn
    eos_token_id=tokenizer.convert_tokens_to_ids(["<|user|>", "<|system|>"]) + [tokenizer.eos_token_id],
)

# Decode only the newly generated tokens
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)