---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- chat
- instruct
- small-model
- 135m
- quark
---

# Quark-135M-Instruct

Quark-135M is a **135M parameter** conversational AI assistant, trained from scratch and then fine-tuned to be **helpful, respectful, honest** and to remember a clear identity.
|
|
* **Base model:** Quark-135M (pretrained on 15B tokens of general-purpose and mathematical text)
* **Instruction tuning:** supervised fine-tuning on a small, curated dataset of identity-aware conversations
* **Developers:** OvercastLab and ThingsAI
* **License:** Apache-2.0
|
|
---

## Model Architecture

The model follows a **Llama-style decoder-only transformer** (similar to SmolLM) with the following components:
|
|
| Component | Value |
|--------------------|----------------------|
| Vocab size | 49,152 |
| Hidden size (`d_model`) | 576 |
| Number of layers | 30 |
| Attention heads | 9 |
| KV heads (GQA) | 3 |
| Head dim | 64 |
| FFN dimension | 1,536 |
| Activation | SwiGLU |
| Normalization | RMSNorm |
| Positional encoding | Rotary embeddings (RoPE, θ = 10,000) |
| Max sequence length | 2,048 |
| Weight tying | Embedding / LM head |
|
|
**Total trainable parameters:** ~135M
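The ~135M figure can be reproduced from the table above. Below is a minimal sketch of that arithmetic, assuming standard Llama-style projection shapes with no biases and the tied embedding / LM head noted in the table:

```python
# Sanity-check the parameter count from the architecture table.
vocab, d_model, n_layers = 49_152, 576, 30
n_heads, n_kv_heads, head_dim, d_ffn = 9, 3, 64, 1_536

embed = vocab * d_model                          # embedding, tied with the LM head
attn = d_model * (n_heads * head_dim)            # q_proj
attn += 2 * d_model * (n_kv_heads * head_dim)    # k_proj + v_proj (GQA: 3 KV heads)
attn += (n_heads * head_dim) * d_model           # o_proj
mlp = 3 * d_model * d_ffn                        # gate, up, down (SwiGLU)
norms = 2 * d_model                              # two RMSNorms per layer
per_layer = attn + mlp + norms

total = embed + n_layers * per_layer + d_model   # + final RMSNorm
print(f"{total / 1e6:.1f}M parameters")          # → 134.5M parameters
```

The result, about 134.5M, matches the rounded ~135M headline figure.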
|
|
---

## Evaluation Results

The table below reports zero-shot performance on several common benchmarks, evaluated using `lm-eval-harness` with `apply_chat_template=True`. All scores are shown as percentages.
|
|
| Benchmark | Metric | Score |
|----------------------|-------------|--------:|
| **HellaSwag** | acc_norm | 31.37% |
| **ARC-Easy** | acc_norm | 41.46% |
| **ARC-Challenge** | acc_norm | 25.09% |
| **PIQA** | acc_norm | 61.26% |
| **MMLU** (avg) | acc | 23.17% |
| MMLU Humanities | acc | 24.23% |
| MMLU Social Sciences | acc | 22.59% |
| MMLU STEM | acc | 22.04% |
| MMLU Other | acc | 23.27% |
| **CommonsenseQA** | acc | 20.56% |
| **OpenBookQA** | acc_norm | 27.20% |
| **Winogrande** | acc | 50.20% |
| **TriviaQA** | exact_match | 0.07% |
|
|
**Key takeaways:**
|
|
* **HellaSwag (31.37%)** is above random chance (25%) but far below models pre-trained on hundreds of billions of tokens. This reflects the modest 15B-token pre-training budget.
* **PIQA (61.26%)** shows the model has basic physical reasoning, benefiting from the pre-training mix.
* **TriviaQA (0.07%)** confirms the model has **almost no factual recall** – it was not exposed to a large enough knowledge corpus.
* **MMLU (23.17%)** is near random for a 4-option task, indicating very limited academic knowledge.
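Several of these takeaways compare scores against random chance, and the chance baseline differs per benchmark: MMLU, ARC, HellaSwag, and OpenBookQA are 4-way multiple choice, CommonsenseQA is 5-way, and PIQA and Winogrande are binary. A small sketch computing the margin over chance for each (the open-ended TriviaQA has no meaningful chance baseline and is omitted):

```python
# Reported zero-shot score (%) paired with the benchmark's number of answer choices.
scores = {
    "HellaSwag":     (31.37, 4),
    "ARC-Easy":      (41.46, 4),
    "ARC-Challenge": (25.09, 4),
    "PIQA":          (61.26, 2),
    "MMLU":          (23.17, 4),
    "CommonsenseQA": (20.56, 5),
    "OpenBookQA":    (27.20, 4),
    "Winogrande":    (50.20, 2),
}

for name, (score, n_choices) in scores.items():
    chance = 100.0 / n_choices  # random-guessing accuracy for an n-way task
    print(f"{name:14s} {score:6.2f}%  (chance {chance:5.2f}%, delta {score - chance:+6.2f})")
```

The margins make the pattern explicit: PIQA and ARC-Easy sit clearly above chance, while MMLU and ARC-Challenge hover at or just around it.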
|
|
---

## Intended Use

Quark-135M-Instruct is a **small conversational assistant** best suited for:
|
|
- Polite, identity-aware small talk
- Refusing gracefully when it doesn't know something
- Following simple instructions (e.g., greetings, name recall, basic Q&A)

It is **not suitable** for tasks requiring factual accuracy, deep reasoning, or reliable knowledge retrieval.
|
|
---

## Limitations
|
|
* **Small model size** – at 135M parameters, it is several orders of magnitude smaller than current frontier models.
* **Limited world knowledge** – pre-trained on only 15B tokens; it lacks the broad coverage of larger models.
* **Hallucinates frequently** – when asked questions beyond simple greetings or self-description, it may invent plausible-sounding but incorrect answers.
* **Repetitive loops** – may occasionally repeat phrases or get stuck in loops, especially with low-temperature sampling.
* **Instruction coverage** – fine-tuned on only 1,500 identity examples; it may not handle out-of-domain requests gracefully.
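The repetition issue noted above can often be mitigated at decoding time. A hedged starting point – these are standard `transformers` `generate()` keyword arguments, but the specific values are untuned suggestions, not settings validated on this model:

```python
# Untested suggestion: decoding settings that commonly reduce repetition
# loops in small models (standard Hugging Face `generate()` kwargs).
gen_kwargs = dict(
    do_sample=True,
    temperature=0.7,          # more randomness than near-greedy decoding helps break loops
    top_p=0.95,
    repetition_penalty=1.3,   # down-weight tokens that were already generated
    no_repeat_ngram_size=3,   # hard-block any repeated trigram
)
```

These can be passed as `model.generate(**inputs, **gen_kwargs)`; raising `temperature` trades a little coherence for fewer loops.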
|
|
---

## How to Use
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OvercastLab/Quark-135m-Instruct"  # (replace with actual HF repo)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Quark, a helpful, respectful and honest AI assistant created by OvercastLab and ThingsAI together with Mich. Always answer as helpfully and accurately as possible."},
    {"role": "user", "content": "Hi, what's your name?"},
]

# Build the prompt with the model's chat template and tokenize it.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.2,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.3,
    # Stop when the model starts a new turn, in addition to the normal EOS token.
    eos_token_id=tokenizer.convert_tokens_to_ids(["<|user|>", "<|system|>"]) + [tokenizer.eos_token_id],
)
# Decode only the newly generated tokens, not the echoed prompt.
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
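The `eos_token_id` list above assumes `<|user|>` and `<|system|>` exist as single tokens in the vocabulary; if they do not, `convert_tokens_to_ids` returns the unknown-token id and generation may run past the turn boundary. A string-level fallback that truncates the decoded text at the first turn marker (a sketch; the marker strings are assumptions about the chat template):

```python
def truncate_at_markers(text: str, markers=("<|user|>", "<|system|>")) -> str:
    """Cut `text` at the earliest occurrence of any turn marker."""
    cut = len(text)
    for marker in markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)  # keep only the text before the first marker
    return text[:cut].rstrip()

print(truncate_at_markers("I'm Quark!<|user|>And you?"))  # → I'm Quark!
```

Applying this to `response` before printing keeps output to a single assistant turn even when token-level stopping fails.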