---
license: bsl-1.0
---

# Cofos Code Model ({MODEL_VERSION}) — SparseMind 500M

**Cofos v2** is a 500M-parameter code model built on AMFORGE's **SparseMind v15**
architecture. It follows the same design as Cofos v1 (296M parameters, 34% `real_syntax_valid`),
scaled up and trained on multilingual instruction and chain-of-thought data.

Developed by **{ORGANIZATION}**.

## Architecture (SparseMind v15)


## Parameters
- `dim={cfg.dim}` (v1: 768), `n_layers={cfg.n_layers}`, `n_heads={cfg.n_heads}`
  (`head_dim={cfg.dim // cfg.n_heads}` — same as v1)
- `max_seq_len={cfg.max_seq_len}` (v1: 512), `vocab_size={cfg.vocab_size}`
- `channel_top_k={cfg.channel_top_k}`, `token_top_k={cfg.token_top_k}`
  (same sparsity ratios as v1)
- **Total parameters:** {model.n_params:,}
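
As a rough illustration of how the fields above relate, here is a minimal sketch of a config object. `ExampleConfig` is a hypothetical stand-in, not the model's actual `Config` class, and every default value is a made-up placeholder rather than a Cofos v2 setting. Only the field names and the `dim // n_heads` relation come from the list above.

```python
from dataclasses import dataclass


@dataclass
class ExampleConfig:
    # Field names mirror the parameter list; all defaults are placeholders.
    dim: int = 1024
    n_layers: int = 16
    n_heads: int = 16
    max_seq_len: int = 1024
    vocab_size: int = 32000
    channel_top_k: int = 256
    token_top_k: int = 128

    @property
    def head_dim(self) -> int:
        # Per-head width, matching the `cfg.dim // cfg.n_heads` expression.
        if self.dim % self.n_heads != 0:
            raise ValueError("dim must be divisible by n_heads")
        return self.dim // self.n_heads


cfg = ExampleConfig()
print(cfg.head_dim)
```

Keeping `head_dim` as a derived property avoids it drifting out of sync when `dim` or `n_heads` changes.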

## Training data (3-way mix)
- **30% real HF Python** (`iamtarun/python_code_instructions_18k_alpaca`)


## Result
- **Best `real_syntax_valid`:** {best_syntax:.1f}% on held-out real Python instructions
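
The exact definition of `real_syntax_valid` is not given here. One plausible reading, assumed for this sketch, is the percentage of generated samples that parse as Python. The helper name `syntax_valid_rate` and the two sample strings are hypothetical.

```python
import ast


def syntax_valid_rate(samples):
    """Return the percentage of strings that parse as Python source."""
    if not samples:
        return 0.0
    valid = 0
    for src in samples:
        try:
            ast.parse(src)  # raises on malformed source
            valid += 1
        except (SyntaxError, ValueError):
            pass
    return 100.0 * valid / len(samples)


samples = [
    "def add(a, b):\n    return a + b\n",  # parses
    "def broken(:\n    pass\n",            # syntax error
]
print(f"{syntax_valid_rate(samples):.1f}%")
```

This checks syntax only: a sample can parse cleanly and still be wrong or fail at runtime.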

## Tokenizer
- v2 tokenizer at [{HF_TOK_REPO_ID}](https://huggingface.co/{HF_TOK_REPO_ID})
  

## How to use
```python
import torch
import sentencepiece as spm

# Load checkpoint
ckpt = torch.load("cofos_best.pt", map_location="cpu")
cfg_dict = ckpt["config"]

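# Instantiate the model architecture. SparseMind and Config are not defined
# in this snippet; import them from the model's source code first.
# model = SparseMind(Config(**cfg_dict))
# model.load_state_dict(ckpt["model"])
# model.eval()
```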