---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- chat
- instruct
- small-model
- 135m
- quark
---
# Quark‑135M‑Instruct

Quark‑135M is a **135M parameter** conversational AI assistant, trained from scratch and then fine‑tuned to be **helpful, respectful, and honest**, and to maintain a clear identity.
* **Base model:** Quark‑135M (pretrained on 15 B tokens of general‑purpose and mathematical text)
* **Instruction tuning:** supervised fine‑tuning on a small, curated dataset of identity‑aware conversations
* **Developers:** OvercastLab and ThingsAI
* **License:** Apache‑2.0
---
## Model Architecture
The model follows a **Llama‑style decoder‑only transformer** (similar to SmolLM) with the following components:
| Component | Value |
|-------------------|----------------------|
| Vocab size | 49 152 |
| Hidden size (`d_model`) | 576 |
| Number of layers | 30 |
| Attention heads | 9 |
| KV heads (GQA) | 3 |
| Head dim | 64 |
| FFN dimension | 1 536 |
| Activation | SwiGLU |
| Normalization | RMSNorm |
| Positional encoding| Rotary Embeddings (RoPE, θ=10 000) |
| Max sequence length | 2 048 |
| Weight tying | Embedding / LM head |
**Total trainable parameters:** ~135 M
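As a sanity check, the ~135 M figure can be reproduced from the table above with simple arithmetic. The sketch below assumes the standard Llama‑style weight layout (Q/K/V/O attention projections, three SwiGLU matrices, two RMSNorms per layer); the variable names are this sketch's own, not the model's config keys:

```python
# Back-of-the-envelope parameter count from the architecture table.
# Weight tying means the LM head reuses the embedding matrix (counted once).
vocab, d_model, n_layers, n_heads, n_kv_heads, head_dim, d_ffn = (
    49_152, 576, 30, 9, 3, 64, 1_536,
)

embed = vocab * d_model                        # token embeddings (tied with LM head)
attn = (
    d_model * n_heads * head_dim               # Q projection
    + 2 * d_model * n_kv_heads * head_dim      # K and V (GQA: only 3 KV heads)
    + n_heads * head_dim * d_model             # output projection
)
mlp = 3 * d_model * d_ffn                      # SwiGLU: gate, up, and down projections
norms = 2 * d_model                            # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms

total = embed + n_layers * per_layer + d_model  # + final RMSNorm
print(f"{total:,}")                             # 134,515,008  (~135 M)
```

Note how grouped‑query attention keeps the K/V projections at a third of the Q projection's size (3 KV heads vs 9 query heads).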
---
## Evaluation Results
The table below reports zero‑shot performance on several common benchmarks, evaluated using `lm‑eval‑harness` with `apply_chat_template=True`. All scores are shown as percentages.
| Benchmark | Metric | Score |
|---------------------|-----------|--------:|
| **HellaSwag** | acc_norm | 31.37% |
| **ARC-Easy** | acc_norm | 41.46% |
| **ARC-Challenge** | acc_norm | 25.09% |
| **PIQA** | acc_norm | 61.26% |
| **MMLU** (avg) | acc | 23.17% |
| MMLU Humanities | acc | 24.23% |
| MMLU Social Sciences| acc | 22.59% |
| MMLU STEM | acc | 22.04% |
| MMLU Other | acc | 23.27% |
| **CommonsenseQA** | acc | 20.56% |
| **OpenBookQA** | acc_norm | 27.20% |
| **Winogrande** | acc | 50.20% |
| **TriviaQA** | exact_match | 0.07% |
**Key takeaways:**
* **HellaSwag (31.37%)** is above random chance (25%) but far below models pre‑trained on hundreds of billions of tokens. This reflects the modest 15 B token pre‑training budget.
* **PIQA (61.26%)** shows the model has basic physical reasoning, benefiting from the pre‑training mix.
* **TriviaQA (0.07%)** confirms the model has **almost no factual recall** – it was not exposed to a large enough knowledge corpus.
* **MMLU (23.17%)** is near random for a 4‑option task, indicating very limited academic knowledge.
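The `acc` vs `acc_norm` distinction in the table matters at this scale: `acc` picks the answer choice with the highest raw summed log‑likelihood, while `acc_norm` normalizes by the choice's byte length so longer answers are not penalized. A toy sketch with made‑up numbers (illustrative only, not `lm‑eval‑harness` internals):

```python
# Two answer candidates with hypothetical total log-likelihoods.
candidates = {
    "a short answer": -12.0,
    "a much longer but more specific answer": -18.0,
}

# acc: pick the candidate with the highest raw log-likelihood.
acc_pick = max(candidates, key=candidates.get)

# acc_norm: divide by byte length before comparing, so length doesn't dominate.
acc_norm_pick = max(candidates, key=lambda c: candidates[c] / len(c.encode("utf-8")))

print(acc_pick)       # the short candidate wins on raw likelihood
print(acc_norm_pick)  # the long candidate wins after length normalization
```

Under raw likelihood the short answer wins (-12.0 > -18.0); after normalization the longer answer wins (-18/38 ≈ -0.47 vs -12/14 ≈ -0.86).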
---
## Intended Use
Quark‑135M‑Instruct is a **small conversational assistant** that excels at:
- Polite, identity‑aware small talk
- Refusing gracefully when it doesn’t know something
- Following simple instructions (e.g., greetings, name recall, basic Q&A)
It is **not suitable** for tasks requiring factual accuracy, deep reasoning, or reliable knowledge retrieval.
---
## Limitations
* **Small model size** – at 135M parameters, it is several orders of magnitude smaller than current frontier models.
* **Limited world knowledge** – pre‑trained on only 15 B tokens; it lacks the broad coverage of larger models.
* **Hallucinates frequently** – when asked questions beyond simple greetings or self‑description, it may invent plausible‑sounding but incorrect answers.
* **Repetitive loops** – may occasionally repeat phrases or get stuck in loops, especially with low temperature sampling.
* **Instruction coverage** – fine‑tuned on only 1 500 identity examples; it may not handle out‑of‑domain requests gracefully.
---
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OvercastLab/Quark-135m-Instruct"  # (replace with actual HF repo)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Quark, a helpful, respectful and honest AI assistant created by OvercastLab and ThingsAI together with Mich. Always answer as helpfully and accurately as possible."},
    {"role": "user", "content": "Hi, what's your name?"},
]

# Build the prompt with the model's chat template.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.2,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.3,  # mitigates the repetition loops noted above
    # Stop when the model starts a new turn, as well as at the regular EOS token.
    eos_token_id=tokenizer.convert_tokens_to_ids(["<|user|>", "<|system|>"]) + [tokenizer.eos_token_id],
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```