AMFORGE / cofos_v2


	---
	license: bsl-1.0
	---

	# Cofos Code Model ({MODEL_VERSION}) — SparseMind 500M

	Cofos v2 is a 500M-parameter code model built on AMFORGE's SparseMind v15
	architecture. It keeps the design of Cofos v1 (296M parameters, 34%
	`real_syntax_valid`) but is scaled up and trained on multilingual
	instructions plus chain-of-thought data.

	Developed by {ORGANIZATION}.

	## Architecture (SparseMind v15)


	## Parameters
	- `dim={cfg.dim}` (v1: 768), `n_layers={cfg.n_layers}`, `n_heads={cfg.n_heads}`
	  (`head_dim={cfg.dim // cfg.n_heads}`, same as v1)
	- `max_seq_len={cfg.max_seq_len}` (v1: 512), `vocab_size={cfg.vocab_size}`
	- `channel_top_k={cfg.channel_top_k}`, `token_top_k={cfg.token_top_k}`
	  (same sparsity ratios as v1)
	- Total parameters: {model.n_params:,}
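
The derived values in the list above follow from the config fields. Here is a minimal sketch of that arithmetic, assuming a hypothetical `Config` dataclass that stands in for the real SparseMind config class. Only `dim=768` and `max_seq_len=512` are the v1 values quoted above; every other number is illustrative and not a Cofos setting.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical stand-in for the real SparseMind config class.
    dim: int
    n_layers: int
    n_heads: int
    max_seq_len: int
    vocab_size: int

    @property
    def head_dim(self) -> int:
        # Per-head width; dim must divide evenly across the heads.
        if self.dim % self.n_heads != 0:
            raise ValueError("dim must be divisible by n_heads")
        return self.dim // self.n_heads


# dim and max_seq_len match the v1 values in the card; the rest are made up.
v1_like = Config(dim=768, n_layers=12, n_heads=12, max_seq_len=512, vocab_size=32000)

# Widening the model while adding heads in proportion keeps head_dim fixed.
wider = Config(dim=1024, n_layers=12, n_heads=16, max_seq_len=512, vocab_size=32000)

print(v1_like.head_dim, wider.head_dim)  # 64 64
print(f"{296_000_000:,}")                # 296,000,000
```

The last line shows the `:,` format spec that the "Total parameters" entry uses to group digits with commas.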

	## Training data (3-way mix)
	- 30% real HF Python (`iamtarun/python_code_instructions_18k_alpaca`)


	## Result
	- Best `real_syntax_valid`: {best_syntax:.1f}% on held-out real Python instructions
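
The card does not define `real_syntax_valid`. One plausible reading, offered here only as an assumption, is the percentage of generated samples that parse as Python. A minimal sketch using the standard library's `ast.parse` follows; `is_valid_python` and `syntax_valid_rate` are hypothetical helpers, not part of the Cofos codebase.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if the source parses as Python, False otherwise."""
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        # ValueError covers sources with null bytes on older Python versions.
        return False


def syntax_valid_rate(samples: list[str]) -> float:
    """Percentage (0-100) of samples that parse; 0.0 for an empty list."""
    if not samples:
        return 0.0
    valid = sum(is_valid_python(s) for s in samples)
    return 100.0 * valid / len(samples)


# Illustrative generations, not real model output.
generations = [
    "def add(a, b):\n    return a + b\n",  # parses
    "for i in range(3) print(i)",            # missing colon
    "x = [1, 2, 3",                          # unclosed bracket
    "print('ok')",                           # parses
]
print(f"{syntax_valid_rate(generations):.1f}%")  # 50.0%
```

Parsing only checks syntax. A sample that parses can still fail at runtime or solve the wrong task, so a metric of this kind is a lower bar than functional correctness.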

	## Tokenizer
	- v2 tokenizer at [{HF_TOK_REPO_ID}](https://huggingface.co/{HF_TOK_REPO_ID})


	## How to use
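	```python
	import torch
	import sentencepiece as spm  # for the v2 tokenizer (see "Tokenizer")

	# Load the checkpoint on CPU; it holds the config dict and the weights.
	ckpt = torch.load("cofos_best.pt", map_location="cpu")
	cfg_dict = ckpt["config"]

	# Instantiate the model architecture. SparseMind and Config must be
	# imported from the SparseMind source code before uncommenting.
	# model = SparseMind(Config(**cfg_dict))
	# model.load_state_dict(ckpt["model"])
	# model.eval()
	```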