---
language:
- en
- code
license: apache-2.0
tags:
- smol
- pretraining
- instruct
- 50M
- causal-lm
- gqa
- swiglu
- rmsnorm
datasets:
- HuggingFaceTB/smollm-corpus
metrics:
- perplexity
model-index:
- name: Quark-50m-Instruct
  results: []
pipeline_tag: text-generation
---

# Quark-50m-Instruct
|
|
**Quark-50m-Instruct** is a small (≈56M-parameter) decoder-only language model fine-tuned for instruction following.
It uses the same architecture as the SmolLM family and was pretrained from scratch on 5 billion tokens from
[HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus).
|
|
- **Model type:** Causal Language Model (LLaMA-style decoder)
- **Architecture:** GQA · SwiGLU · RMSNorm · RoPE · weight tying
- **Pretraining tokens:** 5 B
- **Fine-tuning:** Instruction-tuned (details below)
- **Creators:** [OvercastLab](https://huggingface.co/OvercastLab) (research & development lab for ML/AI)
- **Release date:** 22 April 2026
|
|
## Model Summary
|
|
Quark-50m-Instruct is designed to be an efficient assistant that can run on consumer GPUs (e.g., an RTX 3070 with 8 GB VRAM)
and even on CPU for light workloads. It is **not** competitive with large models on knowledge-intensive tasks,
but it excels at:
|
|
- Simple conversational tasks
- Code generation and explanation (Python)
- Short text rewriting and summarisation
- On-device / edge inference
|
|
The architecture closely follows the efficient-small-LM blueprint popularised by SmolLM:
|
|
| Component     | Details                               |
|---------------|---------------------------------------|
| Vocab size    | 49,152                                |
| Hidden size   | 384                                   |
| Layers        | 24                                    |
| Attention     | Grouped-query (6 query heads, 2 KV heads) |
| FFN           | SwiGLU, intermediate size 1,024       |
| Position      | RoPE (θ = 10,000)                     |
| Normalisation | RMSNorm (pre-block)                   |
|
|
Total trainable parameters: **≈56 M** (with the embedding weights tied to the output head).
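
As a sanity check on that figure, the table maps onto a standard 🤗 `LlamaConfig` roughly as follows. This is a sketch reconstructed from the table above, not a copy of the released `config.json`:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Approximate configuration reconstructed from the table above;
# the model's released config.json is authoritative.
config = LlamaConfig(
    vocab_size=49_152,
    hidden_size=384,
    num_hidden_layers=24,
    num_attention_heads=6,        # query heads
    num_key_value_heads=2,        # GQA: 3 query heads share each KV head
    intermediate_size=1_024,      # SwiGLU FFN width
    max_position_embeddings=2_048,
    rope_theta=10_000.0,
    tie_word_embeddings=True,     # input embeddings shared with the LM head
)

model = LlamaForCausalLM(config)
# Tied parameters are counted once by .parameters()
total = sum(p.numel() for p in model.parameters())
print(f"{total / 1e6:.1f}M parameters")  # ≈ 56.6M
```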
|
|
## Uses
|
|
### Direct Use
The model can be used via the 🤗 Transformers library for standard text generation.
It expects chat-formatted input (see the example below).
|
|
### Downstream Use
Because of the open Apache-2.0 license, you may fine-tune Quark-50m-Instruct on your own data for
domain-specific tasks, for instance a customer-support bot, a code reviewer, or a story writer (a minimal sketch follows).
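
A minimal supervised fine-tuning loop with the 🤗 `Trainer` might look like the sketch below. The dataset file, its `"text"` column, and all hyperparameters are placeholders, not recommended values:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "OvercastLab/Quark-50m-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

if tokenizer.pad_token is None:        # reuse EOS for padding if unset
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder file: one JSON object per line with a "text" field
# containing an already chat-formatted training example.
dataset = load_dataset("json", data_files="my_domain_data.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="quark-finetuned",
        per_device_train_batch_size=8,  # placeholder hyperparameters
        num_train_epochs=3,
        learning_rate=2e-5,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```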
|
|
### Limitations
- Limited world knowledge (pretraining data ends in mid-2025).
- Short context window (2,048 tokens).
- Its small size means it makes factual mistakes more often than larger models.
|
|
## How to Get Started
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "OvercastLab/Quark-50m-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "system", "content": "You are Quark, a helpful assistant."},
    {"role": "user", "content": "Explain grouped-query attention in one sentence."},
]

# Render the chat into the model's prompt format and tokenize it.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # append the assistant header so the model replies
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
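
The call above decodes greedily. For more varied replies you can enable sampling; the values here are illustrative starting points, not tuned settings for this model:

```python
# Sampling instead of greedy decoding; temperature/top_p are illustrative.
outputs = model.generate(
    inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```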