# LumiVore-1.2B
LumiVore-1.2B is a Mixture-of-Experts (MoE) language model fine-tuned for agentic workflows and conversational AI. Trained entirely on consumer hardware (AMD RX 7600 XT 16GB), it demonstrates that capable language models can be developed without datacenter-scale resources.
## Model Details
| Attribute | Value |
|---|---|
| Architecture | Mixture-of-Experts (DeepSeek-MoE style) |
| Base Model | Qwen2.5-0.5B-Instruct |
| Total Parameters | 1.36B |
| Active Parameters | ~610M per token (top-2 routing) |
| Experts | 8 (1 shared + 7 routed) |
| MoE Layers | 8 of 24 transformer layers |
| Context Length | 2048 tokens |
| Precision | bfloat16 |
## Architecture
LumiVore-1.2B uses a Mixture-of-Experts architecture with:
- 8 experts total: 1 shared expert always active + 7 routed experts
- Top-2 routing: For each token, the router selects 2 experts (1 shared + 1 routed)
- Sparse activation: Only ~610M parameters are active per token despite 1.36B total
- Load balancing: Auxiliary losses ensure even expert utilization
This design provides the capacity of a larger model with the inference cost of a smaller one.
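The shared-plus-routed scheme above can be sketched in a few lines of PyTorch. This is an illustrative toy (class name, dimensions, and the simple max-probability router are all assumptions, not the model's actual implementation): one expert runs for every token, and the router adds the single best routed expert on top.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedPlusRoutedMoE(nn.Module):
    """Toy MoE layer: 1 always-active shared expert plus 1 of 7 routed
    experts per token (the '1 shared + 1 routed' scheme described above).
    Dimensions are illustrative, not the model's."""

    def __init__(self, d_model=64, d_ff=128, n_routed=7):
        super().__init__()
        ff = lambda: nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.shared = ff()
        self.routed = nn.ModuleList(ff() for _ in range(n_routed))
        self.router = nn.Linear(d_model, n_routed)

    def forward(self, x):                    # x: (tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)
        top_p, top_i = probs.max(dim=-1)     # best routed expert per token
        out = self.shared(x)                 # shared expert always active
        for e, expert in enumerate(self.routed):
            mask = top_i == e                # tokens assigned to expert e
            if mask.any():
                out[mask] = out[mask] + top_p[mask, None] * expert(x[mask])
        return out

layer = SharedPlusRoutedMoE()
y = layer(torch.randn(10, 64))
print(y.shape)  # torch.Size([10, 64])
```

Because only one routed expert's weights are touched per token, compute scales with the active parameter count (~610M) rather than the total (1.36B).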
## Training

### Stage 1: Capability Building
- Dataset: TerminalTrajectories + OpenThoughts (~11,600 examples)
- Method: Full fine-tuning, with LoRA adapters on routing layers
- Duration: ~5.4 hours
- Goal: General agent capabilities, tool use, reasoning
### Stage 2: Domain Adaptation
- Dataset: OpenClaw agent-specific data (~11,900 examples)
- Method: LoRA fine-tuning (rank=64, attention + routing)
- Duration: ~5 hours
- Goal: OpenClaw ecosystem specialization
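To make the rank=64 LoRA setup concrete, here is the underlying math as a minimal sketch: the frozen weight `W` is left untouched and a scaled low-rank product `B @ A` is added to its output. The hidden size 896 matches Qwen2.5-0.5B's attention projections; the `alpha` value is illustrative, not taken from the card.

```python
import torch

d = 896       # hidden size of Qwen2.5-0.5B attention projections
r = 64        # LoRA rank used in Stage 2 (per the list above)
alpha = 128   # scaling factor -- illustrative assumption

W = torch.randn(d, d)          # frozen base weight
A = torch.randn(r, d) * 0.01   # trainable down-projection
B = torch.zeros(d, r)          # trainable up-projection (zero-initialized)

x = torch.randn(d)
y = W @ x + (alpha / r) * (B @ (A @ x))   # base path + low-rank update

# Trainable parameters per adapted matrix vs. full fine-tuning:
full = d * d        # 802,816
lora = r * (d + d)  # 114,688 (~14% of full)
print(full, lora)
```

Since `B` starts at zero, the adapter initially leaves the base model's behavior unchanged; training only has to learn the low-rank delta.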
## Hardware
- GPU: AMD RX 7600 XT (16GB VRAM)
- Framework: PyTorch with ROCm
- Optimizer: 8-bit AdamW
- Total Training Time: ~10 hours
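The 8-bit optimizer is what makes this fit in 16GB: AdamW keeps two moment states per parameter, and quantizing them from 4 bytes to 1 byte cuts optimizer memory roughly 4x. A back-of-envelope estimate for 1.36B parameters (figures are approximations, ignoring quantization block metadata):

```python
params = 1.36e9  # total parameters (from the table above)
GiB = 1024**3

# AdamW stores two optimizer states (first and second moments) per parameter.
fp32_states = params * 2 * 4 / GiB  # standard 32-bit states
int8_states = params * 2 * 1 / GiB  # 8-bit quantized states

print(f"fp32 optimizer states: {fp32_states:.1f} GiB")   # ~10.1 GiB
print(f"8-bit optimizer states: {int8_states:.1f} GiB")  # ~2.5 GiB
```

At bfloat16, the weights themselves take another ~2.5 GiB, so the 8-bit states are the difference between fitting and not fitting alongside gradients and activations.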
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "LumiVore/lumivore-1.2b"

# Load the model in bfloat16 and let accelerate place it on available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

prompt = (
    "You are a helpful AI assistant.\n"
    "User: Hello!\n"
    "Assistant:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Limitations
- Small base: Built on Qwen2.5-0.5B; foundational limitations apply
- Training scale: 23K examples vs. millions for production models
- Identity: May occasionally claim to be other models (GPT-4, Qwen, etc.)
- Verbosity: Can be verbose; use system prompts to guide conciseness
- No RLHF: No reinforcement learning from human feedback
## Evaluation
This model prioritizes:
- ✅ Agentic tool use — calling functions, following patterns
- ✅ Structured outputs — JSON, markdown, code
- ✅ Conversational flow — turn-taking, context tracking
- ⚠️ Creative writing — not a primary training objective
- ❌ Factual knowledge — limited by base model size
## Resources
| Resource | Link |
|---|---|
| GitHub | https://github.com/dansan-claw/lumivore |
| Website | https://lumivore.ai |
| Discord | https://discord.gg/M7U8JCUukD |
| Datasets | See LumiVore organization |
## Citation

```bibtex
@misc{lumivore-1.2b,
  title={LumiVore-1.2B: A Mixture-of-Experts Model for Agentic AI},
  author={van Eek, Daniel},
  year={2026},
  url={https://huggingface.co/LumiVore/lumivore-1.2b}
}
```
## License
Apache 2.0: use it, modify it, ship it in your products.

LumiVore AI explores the future of intelligent systems, building AI that is efficient, adaptable, and accessible.