shibatch committed on
Commit ad0e5ec · verified · 1 Parent(s): 364c6be

Upload folder using huggingface_hub

README.md CHANGED
@@ -11,11 +11,13 @@ tags:
11
  - test-suite
12
  ---
13
 
14
- # TinyStories Mixtral 2M Top-2 MoE (tinymoe2m) GGUF & HF Validation Suite
15
 
16
  This repository provides an ultra-lightweight Mixtral model variant (a Mixture-of-Experts architecture utilizing the Llama 2 compute topology) scaled down to a **1.95M total parameter footprint** and a **1.14M active parameter execution frame**. It is trained on the TinyStories dataset and optimized as a precise validation asset.
17
 
18
- It is designed specifically for debugging custom inference engines, and native tensor compilers against MoE-specific runtime features. These include Gating network weight allocation, token distribution/gathering (Scatter/Gather loops), and the weighted addition combining multiple independent expert outputs.
 
 
19
 
20
  ---
21
 
@@ -31,6 +33,8 @@ To help track feature coverage across the 1M/2M verification suite, the core str
31
  | **Total Experts** | 1 (Non-MoE) | 1 (Non-MoE) | 1 (Non-MoE) | **4 Experts** |
32
  | **Selected Experts** | - | - | - | **Top-2 Experts** |
33
  | **Expert FFN Dim (`intermediate_size`)** | 564 | 352 | 352 | **352** (Shared across all experts) |
 
 
34
  | **Total Parameters** | ~1.2M | ~1.0M | ~1.0M | **~1.95M (1.95M Total)** |
35
  | **Active Parameters** | ~1.2M | ~1.0M | ~1.0M | **~1.14M (1.14M Active)** |
36
  | **Primary Debug Target** | Core matrix mult & layout | `byte_fallback` decode | Gemma 2 advanced graph | **Dynamic Routing & Scatter/Gather** |
@@ -61,14 +65,28 @@ Binary files optimized for execution via `llama.cpp` or compatible lower-level i
61
  ### 2. Hugging Face Native Format (`./hf/`)
62
  Unquantized components formatted for direct instantiation inside the PyTorch `transformers` library ecosystem:
63
  * **`hf/model.safetensors`**: Raw unquantized matrix parameters containing all 4 expert sub-networks alongside the master router tensor.
64
- * **`hf/config.json`**: Architectural specifications built around `MixtralConfig` criteria (layer depth, head maps, absolute expert counts, and top-k selection targets).
65
  * **`hf/generation_config.json`**: Standard generation defaults.
66
  * **`hf/tokenizer.model`**: The custom 512-vocabulary size SentencePiece BPE master binary.
67
- * **`hf/tokenizer_config.json`**: Metadata linking `LlamaTokenizer` classes to guarantee correct handling of prefix spacing and manage automatic `<s>` (BOS) injection properly on the Hugging Face backend.
68
  * **`hf/special_tokens_map.json`**: Structural map linking token strings (`<s>`=1, `</s>`=2) back to internal index bounds.
69
 
70
  ---
71
 
72
  ## 🚀 Usage Examples
73
 
74
  ### A. Running GGUF via llama.cpp
@@ -128,7 +146,8 @@ print("Generated:", generated_text)
128
  * **Number of Hidden Layers (`num_hidden_layers`):** 3
129
  * **Number of Attention Heads (`num_heads` / `num_kv_heads`):** 2 / 2 *(MHA layout)*
130
  * **Individual Expert Internal Dimension (`intermediate_size`):** 352 *(SwiGLU structure)*
131
- * **Max Position Embeddings (`max_position_embeddings`):** 256
 
132
 
133
  ## 📜 License
134
 
 
11
  - test-suite
12
  ---
13
 
14
+ # TinyStories Mixtral 2M Top-2 MoE (tinymoe2m) GGUF & HF Validation Suite (4k Context)
15
 
16
  This repository provides an ultra-lightweight Mixtral model variant (a Mixture-of-Experts architecture utilizing the Llama 2 compute topology) scaled down to a **1.95M total parameter footprint** and a **1.14M active parameter execution frame**. It is trained on the TinyStories dataset and optimized as a precise validation asset.
17
 
18
+ Following long-context scaling evaluations, this asset is calibrated to a **4,096-token context window (4k)** with an adjusted **RoPE base frequency (`rope_theta`) of 15,000.0**, chosen to avoid numerical saturation in FP32 while keeping attention sharply localized.
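To make the effect of the `rope_theta` change concrete, the sketch below computes the standard RoPE rotation rates, `theta^(-2i/d)`, for the old and new base values. It assumes a head dimension of 64 (`hidden_size` 128 divided by `num_attention_heads` 2, with no explicit `head_dim` override) and the `"default"` rope type. The helper names are illustrative and not part of this repository, and the sketch does not attempt to reproduce the author's evaluation.

```python
import math

def rope_inv_freq(head_dim, theta):
    """Per-pair rotation rates theta^(-2i/d) for i = 0 .. d/2 - 1."""
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def rope_wavelengths(head_dim, theta):
    """Number of positions each pair needs to complete one full rotation."""
    return [2.0 * math.pi / f for f in rope_inv_freq(head_dim, theta)]

HEAD_DIM = 128 // 2   # assumption: hidden_size / num_attention_heads
CONTEXT = 4096        # max_position_embeddings after this commit

# Compare the previous base (1,000,000.0) with the new one (15,000.0).
for theta in (1_000_000.0, 15_000.0):
    slowest = rope_wavelengths(HEAD_DIM, theta)[-1]
    print(f"rope_theta={theta:>11.1f}  slowest wavelength ~ {slowest:,.0f} positions "
          f"({slowest / CONTEXT:.1f}x the {CONTEXT}-token window)")
```

A smaller base shortens the slowest wavelengths, so more of the rotation spectrum changes measurably within the 4,096-token window.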
19
+
20
+ It is designed specifically for debugging custom inference engines (such as `vulformer`) and native tensor compilers against MoE-specific runtime features. These include gating-network weight allocation, token distribution/gathering (scatter/gather loops), and the weighted sum that combines the outputs of multiple independent experts.
21
 
22
  ---
23
 
 
33
  | **Total Experts** | 1 (Non-MoE) | 1 (Non-MoE) | 1 (Non-MoE) | **4 Experts** |
34
  | **Selected Experts** | - | - | - | **Top-2 Experts** |
35
  | **Expert FFN Dim (`intermediate_size`)** | 564 | 352 | 352 | **352** (Shared across all experts) |
36
+ | **Max Position Embeddings** | - | - | - | **4,096** |
37
+ | **RoPE Base (`rope_theta`)** | - | - | - | **15,000.0** |
38
  | **Total Parameters** | ~1.2M | ~1.0M | ~1.0M | **~1.95M (1.95M Total)** |
39
  | **Active Parameters** | ~1.2M | ~1.0M | ~1.0M | **~1.14M (1.14M Active)** |
40
  | **Primary Debug Target** | Core matrix mult & layout | `byte_fallback` decode | Gemma 2 advanced graph | **Dynamic Routing & Scatter/Gather** |
 
65
  ### 2. Hugging Face Native Format (`./hf/`)
66
  Unquantized components formatted for direct instantiation inside the PyTorch `transformers` library ecosystem:
67
  * **`hf/model.safetensors`**: Raw unquantized matrix parameters containing all 4 expert sub-networks alongside the master router tensor.
68
+ * **`hf/config.json`**: Architectural specifications built around `MixtralConfig` criteria (layer depth, head maps, absolute expert counts, and top-k selection targets). Fully updated to enforce `max_position_embeddings: 4096` and `rope_theta: 15000.0`.
69
  * **`hf/generation_config.json`**: Standard generation defaults.
70
  * **`hf/tokenizer.model`**: The custom 512-vocabulary size SentencePiece BPE master binary.
71
+ * **`hf/tokenizer_config.json`**: Metadata linking `LlamaTokenizer` classes to guarantee correct handling of prefix spacing and manage automatic `<s>` (BOS) injection properly on the Hugging Face backend. Configured with `model_max_length: 4096`.
72
  * **`hf/special_tokens_map.json`**: Structural map linking token strings (`<s>`=1, `</s>`=2) back to internal index bounds.
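As a quick consistency check on the files listed above, the following sketch verifies that the context length and RoPE base agree across `config.json` and `tokenizer_config.json`. The function names are hypothetical. Reading `rope_theta` from the nested `rope_parameters` object follows this commit's `config.json`, and the top-level fallback is an assumption for other config layouts.

```python
import json

EXPECTED_CONTEXT = 4096
EXPECTED_THETA = 15000.0

def check_long_context_settings(config, tokenizer_config):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    if config.get("max_position_embeddings") != EXPECTED_CONTEXT:
        problems.append("config.json: max_position_embeddings is not 4096")
    # rope_theta sits under "rope_parameters" in this commit's config.json;
    # the top-level lookup is an assumed fallback for other layouts.
    rope = config.get("rope_parameters", {})
    theta = rope.get("rope_theta", config.get("rope_theta"))
    if theta != EXPECTED_THETA:
        problems.append("config.json: rope_theta is not 15000.0")
    if tokenizer_config.get("model_max_length") != EXPECTED_CONTEXT:
        problems.append("tokenizer_config.json: model_max_length is not 4096")
    return problems

def load_json(path):
    """Read a JSON file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

With the repository checked out, one possible call is `check_long_context_settings(load_json("hf/config.json"), load_json("hf/tokenizer_config.json"))`.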
73
 
74
  ---
75
 
76
+ ## 🎯 Purpose & Design Philosophy (Verification Targets)
77
+
78
+ This checkpoint is specifically engineered as a deterministic validation test asset for computing platforms and **is not designed for long-context semantic extraction tasks (such as Needle-in-a-Haystack password retrieval).**
79
+
80
+ Due to the extreme capacity boundaries (~1.95M total parameters) and ultra-compact vocabulary layout (512 tokens), the internal network matrices allocate their expressiveness exclusively toward mastering English syntax and high-frequency phrases. It lacks the multi-layer, high-order dynamic copy induction circuits required to trace out-of-context injection strings or narrow characters across large windows.
81
+
82
+ ### Expected Token Output Behavior
83
+ When processed with template phrases containing temporary password identifiers like:
84
+ `"The magic password of the giant was key X. I remember that the magic password of the giant was"`
85
+
86
+ The network will cleanly bypass copying the literal character `X` and instead continue generating standard learned unigram-biased blocks such as `"about to go home. Every day..."`. This is mathematically expected behavior. Validation is achieved strictly via **Bit-Exact Logit Verification** across runtime backends to confirm matching compute kernels, KV cache memory indices, causal attention layers, and precise RoPE phase calculation.
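One way to express the bit-exact comparison described above is to compare the IEEE-754 binary32 bit patterns of two logit vectors rather than their numeric values. This is a minimal sketch with hypothetical helper names. It assumes both backends have already exported their logits as plain Python floats, and it does not show how to obtain them from any particular runtime.

```python
import struct

def f32_bits(x):
    """Return the IEEE-754 binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def first_logit_mismatch(ref, other):
    """Return the index of the first logit whose float32 bits differ, or None."""
    if len(ref) != len(other):
        raise ValueError("logit vectors differ in length")
    for i, (a, b) in enumerate(zip(ref, other)):
        if f32_bits(a) != f32_bits(b):
            return i
    return None

# Illustrative values only: the second vector differs by one float32 ULP at index 1.
reference = [0.5, 1.0, -2.25]
candidate = [0.5, 1.0 + 2.0 ** -23, -2.25]
print(first_logit_mismatch(reference, reference))  # None
print(first_logit_mismatch(reference, candidate))  # 1
```

Comparing bit patterns is stricter than a tolerance check: `0.0` and `-0.0` count as different, which is usually the desired behavior when hunting kernel discrepancies.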
87
+
88
+ ---
89
+
90
  ## 🚀 Usage Examples
91
 
92
  ### A. Running GGUF via llama.cpp
 
146
  * **Number of Hidden Layers (`num_hidden_layers`):** 3
147
  * **Number of Attention Heads (`num_heads` / `num_kv_heads`):** 2 / 2 *(MHA layout)*
148
  * **Individual Expert Internal Dimension (`intermediate_size`):** 352 *(SwiGLU structure)*
149
+ * **Max Position Embeddings (`max_position_embeddings`):** 4,096
150
+ * **RoPE Base Frequency (`rope_theta`):** 15,000.0
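The quoted parameter totals can be cross-checked from the dimensions above. The sketch below assumes bias-free attention projections, three SwiGLU matrices per expert, two RMSNorm weights per layer plus a final norm, the 512-token vocabulary, and untied input and output embeddings. The untied-embedding choice is an assumption, picked because it reproduces the ~1.95M and ~1.14M figures. The function name is illustrative.

```python
def mixtral_param_counts(vocab, hidden, layers, ffn, experts, top_k,
                         tied_embeddings=False):
    """Return (total, active) parameter counts for a Mixtral-style stack."""
    attn = 4 * hidden * hidden        # q, k, v, o projections (MHA, no bias)
    router = hidden * experts         # gating matrix, one row per expert
    expert = 3 * hidden * ffn         # SwiGLU: gate, up, and down projections
    norms = 2 * hidden                # two RMSNorm weights per layer
    shared = attn + router + norms    # per-layer weights used by every token
    embed = vocab * hidden * (1 if tied_embeddings else 2)
    final_norm = hidden
    total = layers * (shared + experts * expert) + embed + final_norm
    active = layers * (shared + top_k * expert) + embed + final_norm
    return total, active

total, active = mixtral_param_counts(
    vocab=512, hidden=128, layers=3, ffn=352, experts=4, top_k=2)
print(total)   # 1952128
print(active)  # 1141120
```

Only the expert term differs between the two counts: every token passes through 2 of the 4 expert FFNs per layer.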
151
 
152
  ## 📜 License
153
 
hf/config.json CHANGED
@@ -11,7 +11,7 @@
11
  "hidden_size": 128,
12
  "initializer_range": 0.02,
13
  "intermediate_size": 352,
14
- "max_position_embeddings": 256,
15
  "model_type": "mixtral",
16
  "num_attention_heads": 2,
17
  "num_experts_per_tok": 2,
@@ -22,7 +22,7 @@
22
  "pad_token_id": 2,
23
  "rms_norm_eps": 1e-05,
24
  "rope_parameters": {
25
- "rope_theta": 1000000.0,
26
  "rope_type": "default"
27
  },
28
  "router_aux_loss_coef": 0.001,
 
11
  "hidden_size": 128,
12
  "initializer_range": 0.02,
13
  "intermediate_size": 352,
14
+ "max_position_embeddings": 4096,
15
  "model_type": "mixtral",
16
  "num_attention_heads": 2,
17
  "num_experts_per_tok": 2,
 
22
  "pad_token_id": 2,
23
  "rms_norm_eps": 1e-05,
24
  "rope_parameters": {
25
+ "rope_theta": 15000.0,
26
  "rope_type": "default"
27
  },
28
  "router_aux_loss_coef": 0.001,
hf/model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:639f23e32fe1d96d3c3608b707446efec64487051b806b761d09dc3950896134
3
  size 7815432
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c67371ef508c3f071a4dc598fe263aa3058c02af631bd5c2dfedb1dae2387ee3
3
  size 7815432
hf/special_tokens_map.json CHANGED
@@ -3,4 +3,4 @@
3
  "eos_token": "</s>",
4
  "pad_token": "</s>",
5
  "unk_token": "<unk>"
6
- }
 
3
  "eos_token": "</s>",
4
  "pad_token": "</s>",
5
  "unk_token": "<unk>"
6
+ }
hf/tokenizer_config.json CHANGED
@@ -3,8 +3,8 @@
3
  "add_eos_token": false,
4
  "bos_token": "<s>",
5
  "eos_token": "</s>",
6
- "model_max_length": 256,
7
  "pad_token": "</s>",
8
  "tokenizer_class": "LlamaTokenizer",
9
  "unk_token": "<unk>"
10
- }
 
3
  "add_eos_token": false,
4
  "bos_token": "<s>",
5
  "eos_token": "</s>",
6
+ "model_max_length": 4096,
7
  "pad_token": "</s>",
8
  "tokenizer_class": "LlamaTokenizer",
9
  "unk_token": "<unk>"
10
+ }
tinymoe2m.BF16.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:8b18ff85f62de21fc0c2eb68bb31378d3663d5f662eda1f1989349a06b322144
3
- size 3922752
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0dea867afb1e81cf5d69de466d174bf7b87f3ff5f9bbe1d3bcda5fc2c9860df7
3
+ size 3922816
tinymoe2m.F16.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:b329df06e638d7c996df9f287d1f7d9903327623efc531c6a78e6486073188f1
3
- size 3922752
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:17fe96d98c1c7d047ca86efdd27a5d14543d4006d1a4b270b40e914493e66e58
3
+ size 3922816
tinymoe2m.F32.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:4f2ccb48ce78dde41d14dca96712c3df606ba75a44b05e70e3b032ff425ef49a
3
- size 7822144
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7d6efb7dad656be7104e6d56f5702172b8b3e760b540769c12d9ecb4b6d1579b
3
+ size 7822208
tinymoe2m.Q2_K.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:5c11b1b881ead0c2f7fdaf0a191ce4cc3d0159f2c809624408a2ac9b7a94c2ed
3
- size 1152832
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cfbaf6d788bc1a13f0a3c42709f3d89f73d4070e4bd5705e1c142a68d2679fa7
3
+ size 1152896
tinymoe2m.Q3_K_M.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:0b431a6e950a4db6218955d1872f3aaf9abf23bd8c0466288328f8b9f23dfd6a
3
- size 1234752
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2ab5f7b73ecbab68ebd80bbf3c79e6c582f11cf33ca9248d8edfb41687474c15
3
+ size 1234816
tinymoe2m.Q4_0.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:89803d18cdbafe4432fbdc136b1adacedea5708e8f6381b2bcd9c026b08d8c4f
3
- size 1152832
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b2311b70ce126f56c623376d7f4e53cbcefa3cb41afd4b10aebaa632584d71cd
3
+ size 1152896
tinymoe2m.Q4_1.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:f7881d326d3649e087ec56b4de9505c57c2be085e6ac0270027d915770f8bde1
3
- size 1270592
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:01869fef8fe7975ce5ca38c5d235c39f62eb3865804726d45af7b4d6f564a5a7
3
+ size 1270656
tinymoe2m.Q4_K_M.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:eb6c40f54d12edf5b6ec546d2023dff3d232a09d81e5c1540acc5467d8f9f608
3
- size 1462080
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c40aed9699f64b3ef28b8e67fec312ba17d92c7703c9f7ce89fe89da6f096070
3
+ size 1462144
tinymoe2m.Q5_K_M.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:45ac88cee0b4cf9ad26859f480457d4e30cf06eae337ea75d7f3ef0eecec7301
3
- size 1567552
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:709ce139eb9c0ab40157cfe037729a135824093ae6657caa7d1c87d8975971f3
3
+ size 1567616
tinymoe2m.Q6_K.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:d195dda47ae604ffd13e9c88112e90b5273951e65b256e482a277dcf89a25bcb
3
- size 2094912
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8b0699b54160586e2d2e552a8cf42c1a8d71da488b769c71ba5d4fce95443be4
3
+ size 2094976
tinymoe2m.Q8_0.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:9eaeba40359bb03252a478076217e93fd55ef0ca7460ae179b135fcd4474b307
3
- size 2094912
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1c965ca87f25bac60681f5523d86d256d6c52e5b591b8764676133df4e90e055
3
+ size 2094976