---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- chat
- instruct
- small-model
- 135m
- quark
---
# Quark‑135M‑Instruct

Quark‑135M is a 135M‑parameter conversational AI assistant, trained from scratch and then fine‑tuned to be helpful, respectful, and honest, and to maintain a clear identity.
- Base model: Quark‑135M (pretrained on 15 B tokens of general‑purpose and mathematical text)
- Instruction tuning: supervised fine‑tuning on a small, curated dataset of identity‑aware conversations
- Developers: OvercastLab and ThingsAI
- License: Apache‑2.0
## Model Architecture
The model follows a Llama‑style decoder‑only transformer (similar to SmolLM) with the following components:
| Component | Value |
|---|---|
| Vocab size | 49 152 |
| Hidden size (d_model) | 576 |
| Number of layers | 30 |
| Attention heads | 9 |
| KV heads (GQA) | 3 |
| Head dim | 64 |
| FFN dimension | 1 536 |
| Activation | SwiGLU |
| Normalization | RMSNorm |
| Positional encoding | Rotary Embeddings (RoPE, θ=10 000) |
| Max sequence length | 2 048 |
| Weight tying | Embedding / LM head |
Total trainable parameters: ~135 M
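The ~135 M figure can be cross‑checked from the table above. The following back‑of‑the‑envelope count is a sketch that assumes standard Llama‑style weight shapes (no biases, two RMSNorms per layer, tied embedding/LM head); the shapes are derived from the table, not read from the checkpoint:

```python
# Rough parameter count for a Llama-style decoder with GQA and tied embeddings.
vocab, d, layers, ffn = 49_152, 576, 30, 1_536
kv_dim = 3 * 64  # KV heads * head dim (GQA)

embed = vocab * d                       # token embeddings (LM head is tied, so counted once)
attn = d * d + 2 * d * kv_dim + d * d   # Q, K, V, O projections per layer
mlp = 3 * d * ffn                       # SwiGLU: gate, up, down projections
norms = 2 * d                           # two RMSNorm weight vectors per layer

total = embed + layers * (attn + mlp + norms) + d  # + final RMSNorm
print(f"{total / 1e6:.1f}M")  # → 134.5M, consistent with the reported ~135M
```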
## Evaluation Results
The table below reports zero‑shot performance on several common benchmarks, evaluated using lm‑eval‑harness with apply_chat_template=True. All scores are shown as percentages.
| Benchmark | Metric | Score |
|---|---|---|
| HellaSwag | acc_norm | 31.37% |
| ARC-Easy | acc_norm | 41.46% |
| ARC-Challenge | acc_norm | 25.09% |
| PIQA | acc_norm | 61.26% |
| MMLU (avg) | acc | 23.17% |
| MMLU Humanities | acc | 24.23% |
| MMLU Social Sciences | acc | 22.59% |
| MMLU STEM | acc | 22.04% |
| MMLU Other | acc | 23.27% |
| CommonsenseQA | acc | 20.56% |
| OpenBookQA | acc_norm | 27.20% |
| Winogrande | acc | 50.20% |
| TriviaQA | exact_match | 0.07% |
Key takeaways:
- HellaSwag (31.37%) is above random chance (25%) but far below models pre‑trained on hundreds of billions of tokens. This reflects the modest 15 B token pre‑training budget.
- PIQA (61.26%) shows the model has basic physical reasoning, benefiting from the pre‑training mix.
- TriviaQA (0.07%) confirms the model has almost no factual recall – it was not exposed to a large enough knowledge corpus.
- MMLU (23.17%) is near random for a 4‑option task, indicating very limited academic knowledge.
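For reproducibility, an lm‑eval‑harness invocation along the following lines should approximate the setup above. The flags are the harness's standard ones and the task names are its built‑in identifiers; they are assumptions, not copied from the original evaluation run:

```shell
lm_eval --model hf \
    --model_args pretrained=OvercastLab/Quark-135m-Instruct,dtype=auto \
    --tasks hellaswag,arc_easy,arc_challenge,piqa,mmlu,commonsense_qa,openbookqa,winogrande,triviaqa \
    --apply_chat_template \
    --batch_size auto
```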
## Intended Use

Quark‑135M‑Instruct is a small conversational assistant best suited to:
- Polite, identity‑aware small talk
- Refusing gracefully when it doesn’t know something
- Following simple instructions (e.g., greetings, name recall, basic Q&A)
It is not suitable for tasks requiring factual accuracy, deep reasoning, or reliable knowledge retrieval.
## Limitations

- Small model size – at 135M parameters, Quark is orders of magnitude smaller than current frontier models.
- Limited world knowledge – pre‑trained on only 15 B tokens; it lacks the broad coverage of larger models.
- Hallucinates frequently – when asked questions beyond simple greetings or self‑description, it may invent plausible‑sounding but incorrect answers.
- Repetitive loops – may occasionally repeat phrases or get stuck in loops, especially with low temperature sampling.
- Instruction coverage – fine‑tuned on only 1 500 identity examples; it may not handle out‑of‑domain requests gracefully.
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OvercastLab/Quark-135m-Instruct"  # (replace with actual HF repo)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Quark, a helpful, respectful and honest AI assistant created by OvercastLab and ThingsAI together with Mich. Always answer as helpfully and accurately as possible."},
    {"role": "user", "content": "Hi, what's your name?"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.2,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.3,
    # Stop when the model starts a new turn, as well as on the regular EOS token.
    eos_token_id=tokenizer.convert_tokens_to_ids(["<|user|>", "<|system|>"]) + [tokenizer.eos_token_id],
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```