AtoMixtral-58K-5x5-DigitMesh

A minimal 58K parameter Mixture-of-Experts (MoE) model for 5×5 digit mesh recognition, built on the MixtralForCausalLM architecture.

Model Description

AtoMixtral-58K-5x5-DigitMesh is an ultra-lightweight MoE causal language model for efficient digit recognition from 5×5 binary mesh patterns. With only ~58K parameters and 2 experts, this "atom-sized" MoE model demonstrates that sparse expert activation can support effective pattern recognition even at minimal scale.

Key Specifications

  • Architecture: MixtralForCausalLM (Mixture-of-Experts)
  • Parameters: ~58.5K
  • Precision: F32 (Safetensors)
  • Experts: 2 local experts, 1 active per token
  • Input: 5×5 binary mesh (25 tokens)
  • Output: Digit tokens (D0-D9)
  • Vocabulary Size: 14 tokens
  • Context Length: 32 tokens
  • Hidden Size: 32, Layers: 2, Attention Heads: 4
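The listed specifications account for the ~58.5K total under two additional assumptions not stated above: an FFN intermediate size of 128, untied input/output embeddings, and full (non-grouped) multi-head attention. A back-of-the-envelope sketch:

```python
# Rough parameter count from the spec list above.
# hidden=32, layers=2, vocab=14, 2 experts come from the spec;
# intermediate=128, untied embeddings, and no GQA are assumptions
# chosen to be consistent with the reported ~58.5K total.
hidden, layers, vocab, experts, intermediate = 32, 2, 14, 2, 128

embed = vocab * hidden                               # input embedding table
lm_head = vocab * hidden                             # output projection (assumed untied)
attn = layers * 4 * hidden * hidden                  # Q, K, V, O projections per layer
router = layers * hidden * experts                   # MoE gating network
ffn = layers * experts * 3 * hidden * intermediate   # w1, w2, w3 per expert
norms = (2 * layers + 1) * hidden                    # RMSNorm weights

total = embed + lm_head + attn + router + ffn + norms
print(total)  # 58528, i.e. ~58.5K
```

Note that with only 1 of 2 experts active per token, roughly half of the FFN parameters participate in any single forward pass.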

Quick Start

Serving with vLLM

python -m vllm.entrypoints.openai.api_server \
  --model junzzhu/atoMixtral-58K-5x5-DigitMesh \
  --max-model-len 32

Test Example

curl http://localhost:8000/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "junzzhu/atoMixtral-58K-5x5-DigitMesh",
    "prompt": "1 1 1 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 <SEP>",
    "max_tokens": 1,
    "temperature": 0
  }'

Expected output: D7
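The same request can be issued from Python using only the standard library; this is a minimal sketch against the vLLM OpenAI-compatible endpoint started above (function names here are illustrative, not part of the model's API):

```python
import json
import urllib.request

def build_completion_request(mesh_prompt: str) -> dict:
    """Build the OpenAI-compatible /v1/completions payload for one mesh prompt."""
    return {
        "model": "junzzhu/atoMixtral-58K-5x5-DigitMesh",
        "prompt": mesh_prompt,
        "max_tokens": 1,   # the answer is a single digit token (D0-D9)
        "temperature": 0,  # greedy decoding for a deterministic prediction
    }

def classify(mesh_prompt: str,
             url: str = "http://localhost:8000/v1/completions") -> str:
    """POST the prompt to a running vLLM server and return the predicted digit token."""
    data = json.dumps(build_completion_request(mesh_prompt)).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"].strip()
```

With the server from Quick Start running, `classify("1 1 1 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 <SEP>")` should return `D7`, matching the curl example.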

Input Format

25 space-separated binary values (0 or 1) representing a 5×5 grid, followed by <SEP>:

[5 values] [5 values] [5 values] [5 values] [5 values] <SEP>
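A small helper can flatten a 5×5 grid into this format (the helper name is illustrative); the example below reproduces the "7" pattern from the test request above:

```python
def mesh_to_prompt(grid):
    """Flatten a 5x5 binary grid (5 rows of 5 ints) into the model's input string."""
    assert len(grid) == 5 and all(len(row) == 5 for row in grid), "expected a 5x5 grid"
    assert all(v in (0, 1) for row in grid for v in row), "expected binary values"
    return " ".join(str(v) for row in grid for v in row) + " <SEP>"

# The "7" pattern used in the test example:
seven = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
]
print(mesh_to_prompt(seven))
# -> 1 1 1 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 <SEP>
```

The resulting 26-token prompt (25 grid values plus `<SEP>`) fits comfortably within the model's 32-token context.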

Use Cases

  • MoE architecture research at minimal scale
  • Educational demonstrations of sparse expert models
  • Resource-constrained digit recognition
  • Pattern recognition proof-of-concepts

License

Apache-2.0
