
---
language:
- en
license: apache-2.0
tags:
- agora
- causal-lm
- transformer
- gqa
- rope
library_name: transformers
pipeline_tag: text-generation
---

# Agora

Agora is a compact decoder-only language model built on a modern transformer architecture. It uses Grouped Query Attention (GQA), Rotary Position Embeddings (RoPE), SwiGLU feed-forward layers, and RMSNorm throughout, combining design choices from LLaMA, Mistral, and Gemma into a clean, efficient baseline.

## Architecture

| Parameter               | Value               |
|-------------------------|---------------------|
| Hidden size             | 2048                |
| Intermediate size       | 8192                |
| Layers                  | 24                  |
| Attention heads         | 16                  |
| KV heads (GQA)          | 8                   |
| Head dimension          | 128                 |
| Max sequence length     | 4096                |
| Vocabulary size         | 32 000              |
| Activation              | SiLU (SwiGLU gate)  |
| Positional encoding     | RoPE (θ = 10 000)   |
| Normalisation           | RMSNorm (ε = 1e-5)  |
| Precision               | bfloat16            |

Total parameters: about 1.58 B with tied input/output embeddings, or 1.64 B untied (computed from the table above, assuming bias-free LLaMA-style projections).
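The total can be computed directly from the table. The sketch below assumes bias-free linear layers and one RMSNorm gain vector per sub-layer, as in LLaMA-style models; `AgoraShape` and `count_params` are hypothetical helpers for illustration, not part of `modeling_agora.py`.

```python
from dataclasses import dataclass


@dataclass
class AgoraShape:
    # Values copied from the Architecture table.
    hidden_size: int = 2048
    intermediate_size: int = 8192
    num_layers: int = 24
    num_heads: int = 16
    num_kv_heads: int = 8
    head_dim: int = 128
    vocab_size: int = 32_000


def count_params(cfg: AgoraShape, tie_embeddings: bool = True) -> int:
    """Count weights, assuming bias-free projections (an assumption, LLaMA-style)."""
    h = cfg.hidden_size
    q_dim = cfg.num_heads * cfg.head_dim
    kv_dim = cfg.num_kv_heads * cfg.head_dim
    # q_proj and o_proj are full width; k_proj and v_proj are narrowed by GQA.
    attn = h * q_dim + 2 * h * kv_dim + q_dim * h
    # SwiGLU has three matrices: gate_proj, up_proj, down_proj.
    mlp = 3 * h * cfg.intermediate_size
    norms = 2 * h  # pre-attention and pre-MLP RMSNorm gains
    per_layer = attn + mlp + norms
    embed = cfg.vocab_size * h
    lm_head = 0 if tie_embeddings else cfg.vocab_size * h
    final_norm = h
    return cfg.num_layers * per_layer + embed + lm_head + final_norm


cfg = AgoraShape()
print(f"tied:   {count_params(cfg, True):,}")   # tied:   1,575,585,792
print(f"untied: {count_params(cfg, False):,}")  # untied: 1,641,121,792
# bf16 stores 2 bytes per weight.
print(f"bf16 weights (tied): {count_params(cfg, True) * 2 / 1e9:.2f} GB")  # 3.15 GB
```

Most of the budget sits in the feed-forward layers: the three SwiGLU matrices account for about 50.3 M of the roughly 62.9 M weights in each layer.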

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Scantrack/Agora"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",       # needs the `accelerate` package installed
    trust_remote_code=True,  # Agora ships custom config/model classes
)

prompt = "The key to building efficient language models is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a continuation without tracking gradients.
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
        repetition_penalty=1.1,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

> Note: Pass `trust_remote_code=True` because the configuration and model classes are custom (`configuration_agora.py`, `modeling_agora.py`).

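## Design Decisions

GQA (8 KV heads, 16 query heads): each pair of query heads shares one key/value head, which halves the KV cache relative to standard multi-head attention (MHA) with 16 KV heads while keeping all 16 query heads. The smaller cache eases the memory-bandwidth bottleneck during decoding and, for the same memory budget, allows roughly twice the batch size or context length.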
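A rough sketch of the KV-cache arithmetic and the query-to-KV head grouping, assuming bf16 storage (2 bytes per value) and the consecutive grouping used by common LLaMA-style implementations. Both functions are hypothetical illustrations, not Agora's actual code.

```python
def kv_cache_bytes(seq_len: int, batch: int = 1, num_layers: int = 24,
                   num_kv_heads: int = 8, head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    """One K and one V vector per layer, per KV head, per token (bf16 = 2 bytes)."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * seq_len * batch


def kv_head_for_query(q_head: int, num_heads: int = 16, num_kv_heads: int = 8) -> int:
    """Index of the shared KV head used by a query head (consecutive grouping assumed)."""
    group_size = num_heads // num_kv_heads  # 2 query heads per KV head here
    return q_head // group_size


gqa = kv_cache_bytes(4096)                   # Agora: 8 KV heads
mha = kv_cache_bytes(4096, num_kv_heads=16)  # hypothetical MHA baseline
print(gqa / 2**20, mha / 2**20)  # 384.0 768.0  (MiB per 4096-token sequence)
print([kv_head_for_query(q) for q in range(16)])  # [0, 0, 1, 1, ..., 7, 7]
```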

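RoPE: queries and keys are rotated by position-dependent angles before the dot product, so attention scores depend on the relative offset between tokens and no learned position embeddings are needed. This makes the model easier to adapt to longer contexts than one with learned absolute positions.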
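A minimal pure-Python sketch of RoPE for a single head vector, assuming the "rotate half" pairing (dimension `i` paired with `i + d/2`) found in LLaMA-style code and θ = 10 000 from the table; `apply_rope` is a hypothetical helper. The final check illustrates the relative-position property: shifting both positions by the same amount leaves the score unchanged.

```python
import math


def apply_rope(x: list[float], pos: int, theta: float = 10_000.0) -> list[float]:
    """Rotate each (x[i], x[i + d/2]) pair by angle pos * theta**(-2i/d)."""
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        angle = pos * theta ** (-2 * i / d)  # lower i -> faster rotation
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.5, 2.0, -0.7, 1.1, 0.0, 0.4]
k = [1.0, 0.2, -0.3, 0.8, 0.5, -1.5, 0.9, 0.1]
# Query at position 3, key at 7 vs. query at 1003, key at 1007: same offset.
print(dot(apply_rope(q, 3), apply_rope(k, 7)))
print(dot(apply_rope(q, 1003), apply_rope(k, 1007)))  # equal up to float rounding
```

Because each pair is rotated in its own plane, the score between a query at position m and a key at position n depends only on n − m.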

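SwiGLU: a gated feed-forward layer, `down_proj(SiLU(gate_proj(x)) * up_proj(x))`, in which the SiLU branch gates the `up_proj` branch elementwise. It has been reported to outperform standard ReLU/GELU feed-forward layers on most benchmarks at an equivalent parameter count.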
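A toy pure-Python sketch of the SwiGLU block, assuming weights stored as `(out_features, in_features)` lists like `nn.Linear.weight`. In Agora `w_gate` and `w_up` would be 8192 × 2048 and `w_down` 2048 × 8192; the function names here are hypothetical.

```python
import math


def silu(z: float) -> float:
    """SiLU(z) = z * sigmoid(z), written to avoid overflow for large |z|."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    # w has shape (out_features, in_features), like nn.Linear.weight
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def swiglu_ffn(x, w_gate, w_up, w_down):
    gate = [silu(g) for g in matvec(w_gate, x)]  # gate_proj, then SiLU
    up = matvec(w_up, x)                         # up_proj (no activation)
    hidden = [g * u for g, u in zip(gate, up)]   # elementwise gating
    return matvec(w_down, hidden)                # down_proj back to hidden size


# A 2-dim input, 3-dim intermediate, 2-dim output toy example.
x = [1.0, -0.5]
w_gate = [[0.2, 0.1], [-0.3, 0.8], [0.5, 0.5]]
w_up = [[1.0, 0.0], [0.0, 1.0], [0.7, -0.2]]
w_down = [[0.1, 0.2, 0.3], [-0.4, 0.5, 0.6]]
print(swiglu_ffn(x, w_gate, w_up, w_down))
```

Where the gate pre-activation is strongly negative, SiLU drives that intermediate channel close to zero regardless of `up_proj`.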

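RMSNorm: normalises each activation vector by its root mean square and applies a learned gain, skipping LayerNorm's mean subtraction and bias. It is slightly cheaper than LayerNorm, trains stably in practice, and is standard in modern LLMs.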
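A minimal sketch contrasting RMSNorm with LayerNorm on a single vector, assuming ε = 1e-5 from the table and the common `x / sqrt(mean(x²) + ε) * weight` form; both helpers are illustrative, not Agora's implementation.

```python
import math


def rms_norm(x: list[float], weight: list[float], eps: float = 1e-5) -> list[float]:
    """Scale by 1 / RMS(x); no mean subtraction and no bias."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def layer_norm(x: list[float], weight: list[float], bias: list[float],
               eps: float = 1e-5) -> list[float]:
    """For comparison: LayerNorm also centres the vector and adds a bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * w + b for v, w, b in zip(x, weight, bias)]


ones, zeros = [1.0, 1.0], [0.0, 0.0]
print(rms_norm([3.0, 4.0], ones))           # mean square of the output is ~1
print(rms_norm([13.0, 14.0], ones))         # a constant shift changes the result
print(layer_norm([3.0, 4.0], ones, zeros))  # LayerNorm ignores the shift
print(layer_norm([13.0, 14.0], ones, zeros))
```

The saving comes from dropping the mean and variance-around-the-mean computation; RMSNorm needs only one reduction over the vector.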

bfloat16: preferred over fp16 for training stability because its 8-bit exponent gives it the same dynamic range as fp32, at the cost of mantissa precision. Inference runs natively on NVIDIA Ampere or newer GPUs and on modern CPUs with bfloat16 support.
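The range-versus-precision trade-off can be seen with the standard-library `struct` module. This sketch assumes round-to-nearest-even when narrowing fp32 to bf16 (NaN handling omitted); the helper names are hypothetical.

```python
import math
import struct


def float_to_bf16_bits(x: float) -> int:
    """Round an fp32 value to bfloat16 bits (round-to-nearest-even; NaN not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """bfloat16 is the upper half of an fp32, so widen by shifting back."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


def fits_in_fp16(x: float) -> bool:
    """IEEE half precision tops out at 65504; struct raises OverflowError beyond it."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


big = 3.0e38
print(fits_in_fp16(big))                                       # False
print(math.isfinite(bf16_bits_to_float(float_to_bf16_bits(big))))  # True
print(bf16_bits_to_float(float_to_bf16_bits(1.0 / 3.0)))       # 0.333984375
```

bfloat16 keeps fp32's range but only 8 significant bits, which is why 1/3 comes back as 0.333984375.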

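## Tokenizer

Agora uses the LLaMA tokenizer (SentencePiece, BPE, 32 000 vocab). To swap in a different SentencePiece model, replace `tokenizer.model` and update `tokenizer_config.json`. Because embedding rows are indexed by token ID, a tokenizer with a different vocabulary also requires resizing and retraining the embeddings.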

## Training

(Fill in once training is complete.)

- Dataset:
- Training compute:
- Optimizer:
- Learning rate schedule:
- Final loss:

## Limitations

This is a research/prototype release. The model card will be updated after pretraining completes with evaluation results on standard benchmarks (HellaSwag, MMLU, ARC, TruthfulQA, etc.).

## License

Apache 2.0. See `LICENSE`.