Llama-3.2-3B-MoE-4Expert

A 4-expert Mixture of Experts (MoE) model built on unsloth/Llama-3.2-3B-Instruct, specializing in chat, code, creative writing, and mathematics.

Model Description

This model combines four specialized Llama-3.2-3B variants into a single MoE architecture that intelligently routes queries to the most appropriate expert:

  • General Chat & Explanations - Clear, structured responses for everyday queries
  • Code & Programming - Python development, algorithms, and optimization
  • Creative Writing - Stories, poetry, roleplay, and narrative content
  • Mathematics - Equations, proofs, calculus, and step-by-step solutions

Architecture

  • Base Model: unsloth/Llama-3.2-3B-Instruct
  • Gate Mode: cheap_embed (embedding-based routing)
  • Precision: bfloat16
  • Merge Method: mergekit-moe
  • Total Experts: 4
  • Total Parameters: ~10B
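
A merge with these settings can be expressed as a mergekit-moe configuration along the following lines. This is a sketch only: the positive_prompts shown are illustrative guesses for each expert's domain, not the exact prompts used to build this model.

```yaml
base_model: unsloth/Llama-3.2-3B-Instruct
gate_mode: cheap_embed
dtype: bfloat16
experts:
  - source_model: unsloth/Llama-3.2-3B-Instruct
    positive_prompts:
      - "explain this concept clearly"
      - "summarize the following text"
  - source_model: prithivMLmods/Codepy-Deepthink-3B
    positive_prompts:
      - "write a python function"
      - "debug this code"
  - source_model: DavidAU/Llama-3.2-3B-Instruct-heretic-ablitered-uncensored
    positive_prompts:
      - "write a short story"
      - "compose a poem"
  - source_model: prithivMLmods/Llama-3.2-3B-Math-Oct
    positive_prompts:
      - "solve this equation step by step"
      - "prove the following statement"
```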

Expert Models

| Expert | Model | Specialization |
|--------|-------|----------------|
| 1 | unsloth/Llama-3.2-3B-Instruct | General chat, summaries, explanations |
| 2 | prithivMLmods/Codepy-Deepthink-3B | Python, algorithms, debugging |
| 3 | DavidAU/Llama-3.2-3B-Instruct-heretic-ablitered-uncensored | Creative writing, fiction, roleplay |
| 4 | prithivMLmods/Llama-3.2-3B-Math-Oct | Mathematics, proofs, calculations |

Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Fu01978/Llama-3.2-3B-MoE-4Expert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # loads in bfloat16, matching the checkpoint
    device_map="auto"     # places layers across available devices
)

prompt = "Write a Python function to calculate fibonacci numbers"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Performance

In testing across all four domains, queries route to the expected expert:

  • General knowledge queries route to chat expert
  • Programming tasks route to code expert
  • Creative prompts route to creative expert
  • Mathematical problems route to math expert

Edge cases and ambiguous queries default to the general chat expert.
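
The routing behavior described above can be illustrated with a toy sketch: score a prompt against per-expert keyword "prototypes" and pick the best match, falling back to the chat expert when nothing matches. This is purely an illustration of the routing idea — the keyword lists are hypothetical, and the real cheap_embed gate scores hidden states per token inside each MoE layer rather than matching keywords.

```python
from collections import Counter
import math

# Hypothetical keyword prototypes standing in for each expert's gate vector.
EXPERT_HINTS = {
    "chat": ["explain", "what", "why", "summary", "tell"],
    "code": ["python", "function", "debug", "algorithm", "code"],
    "creative": ["story", "poem", "character", "fiction", "roleplay"],
    "math": ["solve", "equation", "integral", "proof", "calculate"],
}

def _vec(words):
    # Bag-of-words vector with lowercased terms.
    return Counter(w.lower() for w in words)

def _cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def route(prompt: str) -> str:
    """Pick the expert whose prototype best matches the prompt; default to chat."""
    pv = _vec(prompt.split())
    scores = {name: _cosine(pv, _vec(hints)) for name, hints in EXPERT_HINTS.items()}
    best, score = max(scores.items(), key=lambda kv: kv[1])
    return best if score > 0 else "chat"  # ambiguous prompts fall back to chat

print(route("Write a Python function to calculate fibonacci numbers"))  # → code
print(route("hello there"))  # → chat (no keyword matches)
```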

Limitations

  • At 3B parameters per expert, the model is less capable than larger models
  • Expert routing is sensitive to prompt phrasing
  • Ambiguous queries may occasionally route to a suboptimal expert
  • Inherits the limitations of its constituent models

Acknowledgments

Built with mergekit. Thanks to:

  • Meta AI for Llama 3.2
  • unsloth for the base instruct model
  • prithivMLmods for code and math variants
  • DavidAU for the creative writing variant
Model size: 10B params (safetensors, BF16)
