---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- chat
- instruct
- small-model
- 135m
- quark
---

![logo](icon1.png)

Quark‑135M is a **135M parameter** conversational AI assistant, trained from scratch and then fine‑tuned to be **helpful, respectful, and honest** and to maintain a clear identity.

* **Base model:** Quark‑135M (pretrained on 15B tokens of general‑purpose and mathematical text)
* **Instruction tuning:** supervised fine‑tuning on a small, curated dataset of identity‑aware conversations
* **Developers:** OvercastLab and ThingsAI
* **License:** Apache‑2.0

---

## Model Architecture

The model follows a **Llama‑style decoder‑only transformer** (similar to SmolLM) with the following components (a configuration sketch that reproduces these hyperparameters appears just before the usage example below):

| Component            | Value                                |
|----------------------|--------------------------------------|
| Vocab size           | 49,152                               |
| Hidden size (`d_model`) | 576                               |
| Number of layers     | 30                                   |
| Attention heads      | 9                                    |
| KV heads (GQA)       | 3                                    |
| Head dim             | 64                                   |
| FFN dimension        | 1,536                                |
| Activation           | SwiGLU                               |
| Normalization        | RMSNorm                              |
| Positional encoding  | Rotary Embeddings (RoPE, θ = 10,000) |
| Max sequence length  | 2,048                                |
| Weight tying         | Embedding / LM head                  |

**Total trainable parameters:** ~135M

---

## Evaluation Results

The table below reports zero‑shot performance on several common benchmarks, evaluated with `lm-eval-harness` using `apply_chat_template=True` (a reproduction sketch also appears before the usage example). All scores are shown as percentages.

| Benchmark            | Metric      |   Score |
|----------------------|-------------|--------:|
| **HellaSwag**        | acc_norm    |  31.37% |
| **ARC-Easy**         | acc_norm    |  41.46% |
| **ARC-Challenge**    | acc_norm    |  25.09% |
| **PIQA**             | acc_norm    |  61.26% |
| **MMLU** (avg)       | acc         |  23.17% |
| MMLU Humanities      | acc         |  24.23% |
| MMLU Social Sciences | acc         |  22.59% |
| MMLU STEM            | acc         |  22.04% |
| MMLU Other           | acc         |  23.27% |
| **CommonsenseQA**    | acc         |  20.56% |
| **OpenBookQA**       | acc_norm    |  27.20% |
| **Winogrande**       | acc         |  50.20% |
| **TriviaQA**         | exact_match |   0.07% |

**Key takeaways:**

* **HellaSwag (31.37%)** is above random chance (25%) but far below models pre‑trained on hundreds of billions of tokens, reflecting the modest 15B‑token pre‑training budget.
* **PIQA (61.26%)** shows the model has basic physical reasoning, benefiting from the pre‑training mix.
* **TriviaQA (0.07%)** confirms the model has **almost no factual recall**; it was not exposed to a large enough knowledge corpus.
* **MMLU (23.17%)** is near random for a 4‑option task, indicating very limited academic knowledge.

---

## Intended Use

Quark‑135M‑Instruct is a **small conversational assistant** that excels at:

- Polite, identity‑aware small talk
- Refusing gracefully when it doesn’t know something
- Following simple instructions (e.g., greetings, name recall, basic Q&A)

It is **not suitable** for tasks requiring factual accuracy, deep reasoning, or reliable knowledge retrieval.

---

## Limitations

* **Small model size** – at 135M parameters, the model is several orders of magnitude smaller than current frontier models.
* **Limited world knowledge** – pre‑trained on only 15B tokens; it lacks the broad coverage of larger models.
* **Hallucinates frequently** – when asked questions beyond simple greetings or self‑description, it may invent plausible‑sounding but incorrect answers.
* **Repetitive loops** – may occasionally repeat phrases or get stuck in loops, especially with low‑temperature sampling.
* **Instruction coverage** – fine‑tuned on only 1,500 identity examples; it may not handle out‑of‑domain requests gracefully.
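---

As referenced in the Model Architecture section, the following is a minimal configuration sketch, not the official modeling code: it assumes the stock Llama implementation in `transformers` as a stand‑in and checks that the hyperparameters in the table add up to roughly 135M parameters.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Hyperparameters copied from the architecture table; LlamaConfig is an
# assumed stand-in for the actual modeling code, which is not published here.
config = LlamaConfig(
    vocab_size=49_152,
    hidden_size=576,
    num_hidden_layers=30,
    num_attention_heads=9,
    num_key_value_heads=3,      # GQA: 3 KV heads shared across 9 query heads
    intermediate_size=1_536,    # SwiGLU FFN width
    hidden_act="silu",
    max_position_embeddings=2_048,
    rope_theta=10_000.0,
    tie_word_embeddings=True,   # embedding / LM head weight tying
)

model = LlamaForCausalLM(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"~{n_params / 1e6:.0f}M parameters")  # ≈135M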
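To reproduce the evaluation scores, a sketch assuming `lm-evaluation-harness` v0.4.3+, where `simple_evaluate` exposes an `apply_chat_template` flag; the task names follow the harness's registry:

```python
from lm_eval import simple_evaluate
from lm_eval.models.huggingface import HFLM

lm = HFLM(pretrained="OvercastLab/Quark-135m-Instruct", dtype="auto")

results = simple_evaluate(
    model=lm,
    tasks=["hellaswag", "arc_easy", "arc_challenge", "piqa", "winogrande"],
    apply_chat_template=True,  # the scores above were computed with the chat template
)
print(results["results"])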
---

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OvercastLab/Quark-135m-Instruct"  # (replace with actual HF repo)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are Quark, a helpful, respectful and honest AI assistant created by OvercastLab and ThingsAI together with Mich. Always answer as helpfully and accurately as possible."},
    {"role": "user", "content": "Hi, what's your name?"}
]

# Render the conversation with the model's chat template and append the
# assistant header so generation starts a new assistant turn.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.2,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.3,  # helps suppress the repetition loops noted above
    # Stop when the model starts a new user/system turn or emits EOS.
    eos_token_id=tokenizer.convert_tokens_to_ids(["<|user|>", "<|system|>"]) + [tokenizer.eos_token_id],
)

# Decode only the newly generated tokens (skip the prompt).
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
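
For quick experiments, the same generation can be driven through the `text-generation` pipeline, which applies the chat template automatically. A minimal sketch, assuming a recent `transformers` release (v4.40+) whose pipeline accepts chat‑style message lists directly:

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="OvercastLab/Quark-135m-Instruct",  # same placeholder repo id as above
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Hi, what's your name?"}]
out = pipe(messages, max_new_tokens=150, do_sample=True, temperature=0.2, repetition_penalty=1.3)

# The pipeline returns the whole conversation; the last message is the reply.
print(out[0]["generated_text"][-1]["content"])
```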