---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- chat
- instruct
- small-model
- 135m
- quark
---

# Quark-135M-Instruct

Quark-135M is a **135M parameter** conversational AI assistant, trained from scratch and then fine-tuned to be **helpful, respectful, honest** and to remember a clear identity.
|
|
* **Base model:** Quark-135M (pretrained on 15B tokens of general-purpose and mathematical text)
* **Instruction tuning:** supervised fine-tuning on a small, curated dataset of identity-aware conversations
* **Developers:** OvercastLab and ThingsAI
* **License:** Apache-2.0
|
|
---

## Model Architecture

The model follows a **Llama-style decoder-only transformer** (similar to SmolLM) with the following components:
|
|
| Component | Value |
|--------------------|----------------------|
| Vocab size | 49,152 |
| Hidden size (`d_model`) | 576 |
| Number of layers | 30 |
| Attention heads | 9 |
| KV heads (GQA) | 3 |
| Head dim | 64 |
| FFN dimension | 1,536 |
| Activation | SwiGLU |
| Normalization | RMSNorm |
| Positional encoding | Rotary embeddings (RoPE, θ = 10,000) |
| Max sequence length | 2,048 |
| Weight tying | Embedding / LM head |
|
|
**Total trainable parameters:** ~135M
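The ~135M figure can be reproduced from the table above. Below is a minimal sketch of that arithmetic, assuming standard Llama-style projection shapes with no biases and the tied embedding / LM head noted in the table:

```python
# Sanity-check the parameter count from the architecture table.
vocab, d_model, n_layers = 49_152, 576, 30
n_heads, n_kv_heads, head_dim, d_ffn = 9, 3, 64, 1_536

embed = vocab * d_model                          # embedding, tied with the LM head
attn = d_model * (n_heads * head_dim)            # q_proj
attn += 2 * d_model * (n_kv_heads * head_dim)    # k_proj + v_proj (GQA: 3 KV heads)
attn += (n_heads * head_dim) * d_model           # o_proj
mlp = 3 * d_model * d_ffn                        # gate, up, down (SwiGLU)
norms = 2 * d_model                              # two RMSNorms per layer
per_layer = attn + mlp + norms

total = embed + n_layers * per_layer + d_model   # + final RMSNorm
print(f"{total / 1e6:.1f}M parameters")          # → 134.5M parameters
```

The result, about 134.5M, matches the rounded ~135M headline figure.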
|
|
---

## Evaluation Results

The table below reports zero-shot performance on several common benchmarks, evaluated using `lm-eval-harness` with `apply_chat_template=True`. All scores are shown as percentages.
|
|
| Benchmark | Metric | Score |
|----------------------|-------------|--------:|
| **HellaSwag** | acc_norm | 31.37% |
| **ARC-Easy** | acc_norm | 41.46% |
| **ARC-Challenge** | acc_norm | 25.09% |
| **PIQA** | acc_norm | 61.26% |
| **MMLU** (avg) | acc | 23.17% |
| MMLU Humanities | acc | 24.23% |
| MMLU Social Sciences | acc | 22.59% |
| MMLU STEM | acc | 22.04% |
| MMLU Other | acc | 23.27% |
| **CommonsenseQA** | acc | 20.56% |
| **OpenBookQA** | acc_norm | 27.20% |
| **Winogrande** | acc | 50.20% |
| **TriviaQA** | exact_match | 0.07% |
|
|
**Key takeaways:**
|
|
* **HellaSwag (31.37%)** is above random chance (25%) but far below models pre-trained on hundreds of billions of tokens. This reflects the modest 15B-token pre-training budget.
* **PIQA (61.26%)** shows the model has basic physical reasoning, benefiting from the pre-training mix.
* **TriviaQA (0.07%)** confirms the model has **almost no factual recall** – it was not exposed to a large enough knowledge corpus.
* **MMLU (23.17%)** is near random for a 4-option task, indicating very limited academic knowledge.
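Several of these takeaways compare scores against random chance, and the chance baseline differs per benchmark: MMLU, ARC, HellaSwag, and OpenBookQA are 4-way multiple choice, CommonsenseQA is 5-way, and PIQA and Winogrande are binary. A small sketch computing the margin over chance for each (the open-ended TriviaQA has no meaningful chance baseline and is omitted):

```python
# Reported zero-shot score (%) paired with the benchmark's number of answer choices.
scores = {
    "HellaSwag":     (31.37, 4),
    "ARC-Easy":      (41.46, 4),
    "ARC-Challenge": (25.09, 4),
    "PIQA":          (61.26, 2),
    "MMLU":          (23.17, 4),
    "CommonsenseQA": (20.56, 5),
    "OpenBookQA":    (27.20, 4),
    "Winogrande":    (50.20, 2),
}

for name, (score, n_choices) in scores.items():
    chance = 100.0 / n_choices  # random-guessing accuracy for an n-way task
    print(f"{name:14s} {score:6.2f}%  (chance {chance:5.2f}%, delta {score - chance:+6.2f})")
```

The margins make the pattern explicit: PIQA and ARC-Easy sit clearly above chance, while MMLU and ARC-Challenge hover at or just around it.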
|
|
---

## Intended Use

Quark-135M-Instruct is a **small conversational assistant** best suited for:
|
|
- Polite, identity-aware small talk
- Refusing gracefully when it doesn't know something
- Following simple instructions (e.g., greetings, name recall, basic Q&A)

It is **not suitable** for tasks requiring factual accuracy, deep reasoning, or reliable knowledge retrieval.
|
|
---

## Limitations
|
|
* **Small model size** – at 135M parameters, it is several orders of magnitude smaller than current frontier models.
* **Limited world knowledge** – pre-trained on only 15B tokens; it lacks the broad coverage of larger models.
* **Hallucinates frequently** – when asked questions beyond simple greetings or self-description, it may invent plausible-sounding but incorrect answers.
* **Repetitive loops** – may occasionally repeat phrases or get stuck in loops, especially with low-temperature sampling.
* **Instruction coverage** – fine-tuned on only 1,500 identity examples; it may not handle out-of-domain requests gracefully.
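The repetition issue noted above can often be mitigated at decoding time. A hedged starting point – these are standard `transformers` `generate()` keyword arguments, but the specific values are untuned suggestions, not settings validated on this model:

```python
# Untested suggestion: decoding settings that commonly reduce repetition
# loops in small models (standard Hugging Face `generate()` kwargs).
gen_kwargs = dict(
    do_sample=True,
    temperature=0.7,          # more randomness than near-greedy decoding helps break loops
    top_p=0.95,
    repetition_penalty=1.3,   # down-weight tokens that were already generated
    no_repeat_ngram_size=3,   # hard-block any repeated trigram
)
```

These can be passed as `model.generate(**inputs, **gen_kwargs)`; raising `temperature` trades a little coherence for fewer loops.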
|
|
---

## How to Use
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OvercastLab/Quark-135m-Instruct"  # (replace with actual HF repo)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Quark, a helpful, respectful and honest AI assistant created by OvercastLab and ThingsAI together with Mich. Always answer as helpfully and accurately as possible."},
    {"role": "user", "content": "Hi, what's your name?"},
]

# Build the prompt with the model's chat template and tokenize it.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.2,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.3,
    # Stop when the model starts a new turn, in addition to the normal EOS token.
    eos_token_id=tokenizer.convert_tokens_to_ids(["<|user|>", "<|system|>"]) + [tokenizer.eos_token_id],
)
# Decode only the newly generated tokens, not the echoed prompt.
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
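The `eos_token_id` list above assumes `<|user|>` and `<|system|>` exist as single tokens in the vocabulary; if they do not, `convert_tokens_to_ids` returns the unknown-token id and generation may run past the turn boundary. A string-level fallback that truncates the decoded text at the first turn marker (a sketch; the marker strings are assumptions about the chat template):

```python
def truncate_at_markers(text: str, markers=("<|user|>", "<|system|>")) -> str:
    """Cut `text` at the earliest occurrence of any turn marker."""
    cut = len(text)
    for marker in markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)  # keep only the text before the first marker
    return text[:cut].rstrip()

print(truncate_at_markers("I'm Quark!<|user|>And you?"))  # → I'm Quark!
```

Applying this to `response` before printing keeps output to a single assistant turn even when token-level stopping fails.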