shibatch committed on Jun 4

Commit ad0e5ec (verified) · 1 Parent(s): 364c6be

Upload folder using huggingface_hub

Files changed (16)

README.md +24 -5
hf/config.json +2 -2
hf/model.safetensors +1 -1
hf/special_tokens_map.json +1 -1
hf/tokenizer_config.json +2 -2
tinymoe2m.BF16.gguf +2 -2
tinymoe2m.F16.gguf +2 -2
tinymoe2m.F32.gguf +2 -2
tinymoe2m.Q2_K.gguf +2 -2
tinymoe2m.Q3_K_M.gguf +2 -2
tinymoe2m.Q4_0.gguf +2 -2
tinymoe2m.Q4_1.gguf +2 -2
tinymoe2m.Q4_K_M.gguf +2 -2
tinymoe2m.Q5_K_M.gguf +2 -2
tinymoe2m.Q6_K.gguf +2 -2
tinymoe2m.Q8_0.gguf +2 -2

README.md CHANGED

@@ -11,11 +11,13 @@ tags:
 - test-suite
 ---
-# TinyStories Mixtral 2M Top-2 MoE (tinymoe2m) GGUF & HF Validation Suite
 This repository provides an ultra-lightweight Mixtral model variant (a Mixture-of-Experts architecture utilizing the Llama 2 compute topology) scaled down to a **1.95M total parameter footprint** and a **1.14M active parameter execution frame**. It is trained on the TinyStories dataset and optimized as a precise validation asset.
-It is designed specifically for debugging custom inference engines, and native tensor compilers against MoE-specific runtime features. These include Gating network weight allocation, token distribution/gathering (Scatter/Gather loops), and the weighted addition combining multiple independent expert outputs.
 ---
@@ -31,6 +33,8 @@ To help track feature coverage across the 1M/2M verification suite, the core str
 | **Total Experts** | 1 (Non-MoE) | 1 (Non-MoE) | 1 (Non-MoE) | **4 Experts** |
 | **Selected Experts** | - | - | - | **Top-2 Experts** |
 | **Expert FFN Dim (`intermediate_size`)** | 564 | 352 | 352 | **352** (Shared across all experts) |
 | **Total Parameters** | ~1.2M | ~1.0M | ~1.0M | **~1.95M (1.95M Total)** |
 | **Active Parameters** | ~1.2M | ~1.0M | ~1.0M | **~1.14M (1.14M Active)** |
 | **Primary Debug Target** | Core matrix mult & layout | `byte_fallback` decode | Gemma 2 advanced graph | **Dynamic Routing & Scatter/Gather** |
@@ -61,14 +65,28 @@ Binary files optimized for execution via `llama.cpp` or compatible lower-level i
 ### 2. Hugging Face Native Format (`./hf/`)
 Unquantized components formatted for direct instantiation inside the PyTorch `transformers` library ecosystem:
 * **`hf/model.safetensors`**: Raw unquantized matrix parameters containing all 4 expert sub-networks alongside the master router tensor.
-* **`hf/config.json`**: Architectural specifications built around `MixtralConfig` criteria (layer depth, head maps, absolute expert counts, and top-k selection targets).
 * **`hf/generation_config.json`**: Standard generation defaults.
 * **`hf/tokenizer.model`**: The custom 512-vocabulary size SentencePiece BPE master binary.
-* **`hf/tokenizer_config.json`**: Metadata linking `LlamaTokenizer` classes to guarantee correct handling of prefix spacing and manage automatic `<s>` (BOS) injection properly on the Hugging Face backend.
 * **`hf/special_tokens_map.json`**: Structural map linking token strings (`<s>`=1, `</s>`=2) back to internal index bounds.
 ---
 ## 🚀 Usage Examples
 ### A. Running GGUF via llama.cpp
@@ -128,7 +146,8 @@ print("Generated:", generated_text)
 * **Number of Hidden Layers (`num_hidden_layers`):** 3
 * **Number of Attention Heads (`num_heads` / `num_kv_heads`):** 2 / 2 *(MHA layout)*
 * **Individual Expert Internal Dimension (`intermediate_size`):** 352 *(SwiGLU structure)*
-* **Max Position Embeddings (`max_position_embeddings`):** 256
 ## 📜 License

 - test-suite
 ---
+# TinyStories Mixtral 2M Top-2 MoE (tinymoe2m) GGUF & HF Validation Suite (4k Context)
 This repository provides an ultra-lightweight Mixtral model variant (a Mixture-of-Experts architecture utilizing the Llama 2 compute topology) scaled down to a **1.95M total parameter footprint** and a **1.14M active parameter execution frame**. It is trained on the TinyStories dataset and optimized as a precise validation asset.
+Following extensive long-context scaling evaluations, this asset has been calibrated to a **4,096 token context window (4k)** with an adjusted **RoPE base frequency (`rope_theta`) of 15,000.0** to prevent numerical saturation under FP32 precision boundaries while maintaining sharp localized attention coordinates.
+It is designed specifically for debugging custom inference engines (such as `vulformer`), and native tensor compilers against MoE-specific runtime features. These include Gating network weight allocation, token distribution/gathering (Scatter/Gather loops), and the weighted addition combining multiple independent expert outputs.
 ---
 | **Total Experts** | 1 (Non-MoE) | 1 (Non-MoE) | 1 (Non-MoE) | **4 Experts** |
 | **Selected Experts** | - | - | - | **Top-2 Experts** |
 | **Expert FFN Dim (`intermediate_size`)** | 564 | 352 | 352 | **352** (Shared across all experts) |
+| **Max Position Embeddings** | - | - | - | **4,096** |
+| **RoPE Base (`rope_theta`)** | - | - | - | **15,000.0** |
 | **Total Parameters** | ~1.2M | ~1.0M | ~1.0M | **~1.95M (1.95M Total)** |
 | **Active Parameters** | ~1.2M | ~1.0M | ~1.0M | **~1.14M (1.14M Active)** |
 | **Primary Debug Target** | Core matrix mult & layout | `byte_fallback` decode | Gemma 2 advanced graph | **Dynamic Routing & Scatter/Gather** |
 ### 2. Hugging Face Native Format (`./hf/`)
 Unquantized components formatted for direct instantiation inside the PyTorch `transformers` library ecosystem:
 * **`hf/model.safetensors`**: Raw unquantized matrix parameters containing all 4 expert sub-networks alongside the master router tensor.
+* **`hf/config.json`**: Architectural specifications built around `MixtralConfig` criteria (layer depth, head maps, absolute expert counts, and top-k selection targets). Fully updated to enforce `max_position_embeddings: 4096` and `rope_theta: 15000.0`.
 * **`hf/generation_config.json`**: Standard generation defaults.
 * **`hf/tokenizer.model`**: The custom 512-vocabulary size SentencePiece BPE master binary.
+* **`hf/tokenizer_config.json`**: Metadata linking `LlamaTokenizer` classes to guarantee correct handling of prefix spacing and manage automatic `<s>` (BOS) injection properly on the Hugging Face backend. Configured with `model_max_length: 4096`.
 * **`hf/special_tokens_map.json`**: Structural map linking token strings (`<s>`=1, `</s>`=2) back to internal index bounds.
 ---
+## 🎯 Purpose & Design Philosophy (Verification Targets)
+This checkpoint is specifically engineered as a deterministic validation test asset for computing platforms and **is not designed for long-context semantic extraction tasks (such as Needle-in-a-Haystack password retrieval).**
+Due to the extreme capacity boundaries (~1.95M total parameters) and ultra-compact vocabulary layout (512 tokens), the internal network matrices allocate their expressiveness exclusively toward mastering English syntax and high-frequency phrases. It lacks the multi-layer, high-order dynamic copy induction circuits required to trace out-of-context injection strings or narrow characters across large windows.
+### Expected Token Output Behavior
+When processed with template phrases containing temporary password identifiers like:
+`"The magic password of the giant was key X. I remember that the magic password of the giant was"`
+The network will cleanly bypass copying the literal character `X` and instead continue generating standard learned unigram-biased blocks such as `"about to go home. Every day..."`. This is mathematically expected behavior. Validation is achieved strictly via **Bit-Exact Logit Verification** across runtime backends to confirm matching compute kernels, KV cache memory indices, causal attention layers, and precise RoPE phase calculation.
+---
 ## 🚀 Usage Examples
 ### A. Running GGUF via llama.cpp
 * **Number of Hidden Layers (`num_hidden_layers`):** 3
 * **Number of Attention Heads (`num_heads` / `num_kv_heads`):** 2 / 2 *(MHA layout)*
 * **Individual Expert Internal Dimension (`intermediate_size`):** 352 *(SwiGLU structure)*
+* **Max Position Embeddings (`max_position_embeddings`):** 4,096
+* **RoPE Base Frequency (`rope_theta`):** 15,000.0
 ## 📜 License
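
Note on the README change above: the new section says validation relies on bit-exact logit verification across backends rather than on generated text. A minimal sketch of such a comparison follows, assuming each backend's logits for one position are already available as plain Python floats. The helper names `float_bits` and `first_mismatch` and the sample values are hypothetical and are not part of the repository.

```python
import struct


def float_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def first_mismatch(ref, other):
    """Return the index of the first logit whose FP32 bits differ, or None."""
    if len(ref) != len(other):
        raise ValueError("logit vectors differ in length")
    for i, (a, b) in enumerate(zip(ref, other)):
        # Comparing bit patterns (not values) also separates +0.0 from -0.0.
        if float_bits(a) != float_bits(b):
            return i
    return None


# Hypothetical logits from two backends (illustrative values only).
reference = [0.5, -1.25, 3.0, 0.1]
candidate = [0.5, -1.25, 3.0, 0.1]
print(first_mismatch(reference, candidate))  # None
```

Comparing bit patterns instead of using a tolerance is stricter than `math.isclose`; a tolerance-based check would be the usual fallback when backends legitimately reorder floating-point reductions.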

hf/config.json CHANGED

@@ -11,7 +11,7 @@
   "hidden_size": 128,
   "initializer_range": 0.02,
   "intermediate_size": 352,
-  "max_position_embeddings": 256,
   "model_type": "mixtral",
   "num_attention_heads": 2,
   "num_experts_per_tok": 2,
@@ -22,7 +22,7 @@
   "pad_token_id": 2,
   "rms_norm_eps": 1e-05,
   "rope_parameters": {
-    "rope_theta": 1000000.0,
     "rope_type": "default"
   },
   "router_aux_loss_coef": 0.001,

   "hidden_size": 128,
   "initializer_range": 0.02,
   "intermediate_size": 352,
+  "max_position_embeddings": 4096,
   "model_type": "mixtral",
   "num_attention_heads": 2,
   "num_experts_per_tok": 2,
   "pad_token_id": 2,
   "rms_norm_eps": 1e-05,
   "rope_parameters": {
+    "rope_theta": 15000.0,
     "rope_type": "default"
   },
   "router_aux_loss_coef": 0.001,
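
Note on the config change above: `rope_theta` drops from 1000000.0 to 15000.0 while `max_position_embeddings` rises from 256 to 4096. The sketch below compares how far the slowest-rotating RoPE pair turns over the full window under each base. It assumes the standard "default" RoPE formula and a head dimension of `hidden_size / num_attention_heads` (64), since no explicit head dimension appears in the hunks shown. The function names are hypothetical.

```python
def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """Per-pair rotation rates: theta ** (-2i / head_dim) for i in [0, head_dim / 2)."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def max_angle(inv_freq: list[float], max_positions: int) -> float:
    """Largest angle in radians reached by the slowest-rotating pair."""
    return (max_positions - 1) * inv_freq[-1]


# Values taken from hf/config.json; head_dim is an assumption (128 / 2).
head_dim = 128 // 2
old_freq = rope_inv_freq(head_dim, 1_000_000.0)
new_freq = rope_inv_freq(head_dim, 15_000.0)

print("old base, 4096 positions:", max_angle(old_freq, 4096))
print("new base, 4096 positions:", max_angle(new_freq, 4096))
```

With the smaller base, the slowest pair sweeps a noticeably larger angle across 4,096 positions, so distant positions differ more in phase than they would under the previous base.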

hf/model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:639f23e32fe1d96d3c3608b707446efec64487051b806b761d09dc3950896134
 size 7815432

 version https://git-lfs.github.com/spec/v1
+oid sha256:c67371ef508c3f071a4dc598fe263aa3058c02af631bd5c2dfedb1dae2387ee3
 size 7815432
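
Note on the unchanged size above: the README's parameter figures can be cross-checked against this file size with simple arithmetic. The sketch uses the dimensions stated in the README and config (hidden size 128, expert FFN size 352, 3 layers, 512-token vocabulary, 4 experts, top-2 routing). It assumes no bias terms, an untied output head, and two RMSNorm weights per layer plus a final norm; none of these are confirmed by the diff.

```python
hidden, inter, layers = 128, 352, 3
vocab, experts, top_k = 512, 4, 2

attn = 4 * hidden * hidden    # q, k, v, o projections (MHA, no biases assumed)
expert = 3 * hidden * inter   # gate, up, down projections (SwiGLU)
router = hidden * experts     # one routing logit per expert
norms = 2 * hidden            # two RMSNorm weights per layer (assumed)
embed = vocab * hidden        # input embedding table
head = vocab * hidden         # output head, assumed untied from the embeddings
final_norm = hidden


def count(n_experts: int) -> int:
    """Parameter count when n_experts expert FFNs are counted per layer."""
    per_layer = attn + n_experts * expert + router + norms
    return layers * per_layer + embed + head + final_norm


total = count(experts)   # all 4 experts
active = count(top_k)    # only the 2 routed experts
print(total, active, total * 4)  # 1952128 1141120 7808512
```

Under these assumptions the totals match the README's ~1.95M total and ~1.14M active figures. At 4 bytes per parameter the weights occupy 7,808,512 bytes, which leaves 6,920 bytes of the 7,815,432-byte file for the safetensors header, assuming FP32 storage.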

hf/special_tokens_map.json CHANGED

@@ -3,4 +3,4 @@
   "eos_token": "</s>",
   "pad_token": "</s>",
   "unk_token": "<unk>"
-}

   "eos_token": "</s>",
   "pad_token": "</s>",
   "unk_token": "<unk>"
+}

hf/tokenizer_config.json CHANGED

@@ -3,8 +3,8 @@
   "add_eos_token": false,
   "bos_token": "<s>",
   "eos_token": "</s>",
-  "model_max_length": 256,
   "pad_token": "</s>",
   "tokenizer_class": "LlamaTokenizer",
   "unk_token": "<unk>"
-}

   "add_eos_token": false,
   "bos_token": "<s>",
   "eos_token": "</s>",
+  "model_max_length": 4096,
   "pad_token": "</s>",
   "tokenizer_class": "LlamaTokenizer",
   "unk_token": "<unk>"
+}

tinymoe2m.BF16.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8b18ff85f62de21fc0c2eb68bb31378d3663d5f662eda1f1989349a06b322144
-size 3922752

 version https://git-lfs.github.com/spec/v1
+oid sha256:0dea867afb1e81cf5d69de466d174bf7b87f3ff5f9bbe1d3bcda5fc2c9860df7
+size 3922816

tinymoe2m.F16.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b329df06e638d7c996df9f287d1f7d9903327623efc531c6a78e6486073188f1
-size 3922752

 version https://git-lfs.github.com/spec/v1
+oid sha256:17fe96d98c1c7d047ca86efdd27a5d14543d4006d1a4b270b40e914493e66e58
+size 3922816

tinymoe2m.F32.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4f2ccb48ce78dde41d14dca96712c3df606ba75a44b05e70e3b032ff425ef49a
-size 7822144

 version https://git-lfs.github.com/spec/v1
+oid sha256:7d6efb7dad656be7104e6d56f5702172b8b3e760b540769c12d9ecb4b6d1579b
+size 7822208

tinymoe2m.Q2_K.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5c11b1b881ead0c2f7fdaf0a191ce4cc3d0159f2c809624408a2ac9b7a94c2ed
-size 1152832

 version https://git-lfs.github.com/spec/v1
+oid sha256:cfbaf6d788bc1a13f0a3c42709f3d89f73d4070e4bd5705e1c142a68d2679fa7
+size 1152896

tinymoe2m.Q3_K_M.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0b431a6e950a4db6218955d1872f3aaf9abf23bd8c0466288328f8b9f23dfd6a
-size 1234752

 version https://git-lfs.github.com/spec/v1
+oid sha256:2ab5f7b73ecbab68ebd80bbf3c79e6c582f11cf33ca9248d8edfb41687474c15
+size 1234816

tinymoe2m.Q4_0.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:89803d18cdbafe4432fbdc136b1adacedea5708e8f6381b2bcd9c026b08d8c4f
-size 1152832

 version https://git-lfs.github.com/spec/v1
+oid sha256:b2311b70ce126f56c623376d7f4e53cbcefa3cb41afd4b10aebaa632584d71cd
+size 1152896

tinymoe2m.Q4_1.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f7881d326d3649e087ec56b4de9505c57c2be085e6ac0270027d915770f8bde1
-size 1270592

 version https://git-lfs.github.com/spec/v1
+oid sha256:01869fef8fe7975ce5ca38c5d235c39f62eb3865804726d45af7b4d6f564a5a7
+size 1270656

tinymoe2m.Q4_K_M.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:eb6c40f54d12edf5b6ec546d2023dff3d232a09d81e5c1540acc5467d8f9f608
-size 1462080

 version https://git-lfs.github.com/spec/v1
+oid sha256:c40aed9699f64b3ef28b8e67fec312ba17d92c7703c9f7ce89fe89da6f096070
+size 1462144

tinymoe2m.Q5_K_M.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:45ac88cee0b4cf9ad26859f480457d4e30cf06eae337ea75d7f3ef0eecec7301
-size 1567552

 version https://git-lfs.github.com/spec/v1
+oid sha256:709ce139eb9c0ab40157cfe037729a135824093ae6657caa7d1c87d8975971f3
+size 1567616

tinymoe2m.Q6_K.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d195dda47ae604ffd13e9c88112e90b5273951e65b256e482a277dcf89a25bcb
-size 2094912

 version https://git-lfs.github.com/spec/v1
+oid sha256:8b0699b54160586e2d2e552a8cf42c1a8d71da488b769c71ba5d4fce95443be4
+size 2094976

tinymoe2m.Q8_0.gguf CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9eaeba40359bb03252a478076217e93fd55ef0ca7460ae179b135fcd4474b307
-size 2094912

 version https://git-lfs.github.com/spec/v1
+oid sha256:1c965ca87f25bac60681f5523d86d256d6c52e5b591b8764676133df4e90e055
+size 2094976
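
Note on the GGUF pointer changes above: each hunk is a Git LFS pointer with `version`, `oid`, and `size` fields, so the size change per file can be read directly from the diff. The sketch below parses a pointer and tabulates the old and new sizes listed in this commit. The helper name `parse_lfs_pointer` is hypothetical.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its key/value fields; size becomes an int."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:1c965ca87f25bac60681f5523d86d256d6c52e5b591b8764676133df4e90e055\n"
    "size 2094976\n"
)

# (old size, new size) in bytes, copied from the hunks in this commit.
sizes = {
    "BF16": (3922752, 3922816), "F16": (3922752, 3922816),
    "F32": (7822144, 7822208), "Q2_K": (1152832, 1152896),
    "Q3_K_M": (1234752, 1234816), "Q4_0": (1152832, 1152896),
    "Q4_1": (1270592, 1270656), "Q4_K_M": (1462080, 1462144),
    "Q5_K_M": (1567552, 1567616), "Q6_K": (2094912, 2094976),
    "Q8_0": (2094912, 2094976),
}
deltas = {new - old for old, new in sizes.values()}
print(pointer["size"], deltas)  # 2094976 {64}
```

Every GGUF file grows by exactly 64 bytes regardless of quantization, while `hf/model.safetensors` keeps its size. That pattern would be consistent with a metadata-only change in the GGUF files, though the diff does not show the GGUF metadata itself.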