# AtoMixtral-58K-5x5-DigitMesh
A minimal 58K parameter Mixture-of-Experts (MoE) model for 5×5 digit mesh recognition, built on the MixtralForCausalLM architecture.
## Model Description
AtoMixtral-58K-5x5-DigitMesh is an ultra-lightweight MoE causal language model that recognizes digits from 5×5 binary mesh patterns. With only ~58K parameters and 2 experts (1 active per token), this "atom-sized" model demonstrates that sparse expert activation can perform effective pattern recognition even at minimal scale.
### Key Specifications
- Architecture: MixtralForCausalLM (Mixture-of-Experts)
- Parameters: ~58K
- Experts: 2 local experts, 1 active per token
- Input: 5×5 binary mesh (25 tokens)
- Output: a single digit token (`D0`–`D9`)
- Vocabulary Size: 14 tokens
- Context Length: 32 tokens
- Hidden Size: 32, Layers: 2, Attention Heads: 4
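
The specifications above map directly onto the corresponding `MixtralConfig` fields. Below is a partial `config.json` sketch containing only the values listed here; other fields (e.g. the expert intermediate size) are omitted rather than guessed:

```json
{
  "architectures": ["MixtralForCausalLM"],
  "vocab_size": 14,
  "max_position_embeddings": 32,
  "hidden_size": 32,
  "num_hidden_layers": 2,
  "num_attention_heads": 4,
  "num_local_experts": 2,
  "num_experts_per_tok": 1
}
```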
## Quick Start
### Serving with vLLM
```shell
python -m vllm.entrypoints.openai.api_server \
  --model junzzhu/atoMixtral-58K-5x5-DigitMesh \
  --max-model-len 32
```
### Test Example
```shell
curl http://localhost:8000/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "junzzhu/atoMixtral-58K-5x5-DigitMesh",
    "prompt": "1 1 1 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 <SEP>",
    "max_tokens": 1,
    "temperature": 0
  }'
```
Expected output: `D7`
## Input Format
25 space-separated binary values (`0` or `1`) representing a 5×5 grid, row by row, followed by `<SEP>`:

```text
[5 values] [5 values] [5 values] [5 values] [5 values] <SEP>
```
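
Prompts in this format can also be assembled programmatically. The helper below is an illustrative sketch, not part of the model repository: it flattens a 5×5 grid row by row and appends the `<SEP>` token.

```python
def mesh_to_prompt(grid):
    """Flatten a 5x5 grid of 0/1 values into the model's prompt format."""
    assert len(grid) == 5 and all(len(row) == 5 for row in grid), "expected a 5x5 grid"
    flat = [str(v) for row in grid for v in row]  # row-major order
    return " ".join(flat) + " <SEP>"

# The "7"-shaped pattern from the Test Example above:
seven = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
]
print(mesh_to_prompt(seven))
# 1 1 1 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 <SEP>
```

The resulting string can be passed directly as the `prompt` field of the completion request shown above.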
## Use Cases
- MoE architecture research at minimal scale
- Educational demonstrations of sparse expert models
- Resource-constrained digit recognition
- Pattern recognition proof-of-concepts
## License
Apache-2.0