---
language:
- en
- fr
- it
- de
- es
license: apache-2.0
tags:
- mixtral
- moe
- mixture-of-experts
- merge
- chimera
- klyrone
- instruct
- text-generation
base_model:
- mistralai/Mixtral-8x7B-v0.1
- mistralai/Mixtral-8x7B-Instruct-v0.1
base_model_relation: merge
model_type: mixtral
pipeline_tag: text-generation
inference: false
---
# Chimera 47B
**Klyrone F.Z.E.** · March 2026 · Apache 2.0
Chimera 47B is a 46.7B-parameter Mixture-of-Experts language model built using Klyrone's MoE assembly framework. It is constructed from Mixtral-8x7B-v0.1 and Mixtral-8x7B-Instruct-v0.1, combining the base model's knowledge with the instruct model's capabilities, without any additional training. With 8 experts and top-2 routing, only 12.9B parameters are active per token, enabling fast inference at 154 tokens/second on H200 hardware.
A technical paper detailing the methodology is forthcoming.
---
## Key Numbers
| Metric | Value |
|---|---|
| Total Parameters | 46.7 B |
| Active / Token | 12.9 B |
| Architecture | MoE · 8 experts · top-2 routing |
| Context Length | 32,768 tokens |
| Generation Speed | 154 t/s · H200 |
| Prompt Processing | 878 t/s · H200 |
| Quantization | Q5_K_M · 5.69 BPW |
| File Size | 30.95 GB GGUF |
| License | Apache 2.0 |
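The file-size and active-parameter figures in the table can be cross-checked with a few lines of arithmetic (all inputs taken from the table above; the GiB/GB distinction is an assumption about how the size was reported):

```python
# Sanity-check the quoted numbers against each other.
total_params = 46.7e9                 # total parameters
bpw = 5.69                            # Q5_K_M bits per weight
size_gib = total_params * bpw / 8 / 2**30
print(f"{size_gib:.2f} GiB")          # ~30.9, consistent with the 30.95 GB file

active_fraction = 12.9e9 / total_params
print(f"{active_fraction:.0%}")       # top-2 of 8 experts keeps ~28% of weights hot
```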
---
## Capabilities
- ✅ Instruction following: multi-turn conversational coherence
- ✅ Code generation: correct, edge-case-aware output
- ✅ Creative writing: long-form prose and poetry
- ✅ Factual reasoning: physics, mathematics, general knowledge
- ✅ Consumer-grade deployment: fits accessible GPU budgets at Q5_K_M
> Formal benchmark results (MMLU, HellaSwag, ARC-Challenge, GSM8K) in progress.
---
## About the Approach
Klyrone's MoE assembly framework constructs high-performance models by composing expert sub-networks from compatible source models, without full retraining. The expert FFN weights (w1, w2, w3) from Mixtral-8x7B-Instruct-v0.1 are transplanted into the Mixtral-8x7B-v0.1 base, preserving routing coherence while inheriting the instruction-tuned capabilities of the donor model.
For enterprise licensing or research collaboration, contact **research@klyrone.com**.
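As an illustration only (not Klyrone's actual implementation), the transplant step can be sketched as a selective state-dict copy: expert FFN tensors come from the instruct donor, while router (gate) and attention weights stay with the base. The key pattern below follows the Hugging Face Mixtral naming convention, which is an assumption about how the merge was performed:

```python
import re

# Matches Mixtral expert FFN tensors, e.g.
# "model.layers.0.block_sparse_moe.experts.3.w1.weight"
EXPERT_KEY = re.compile(r"\.block_sparse_moe\.experts\.\d+\.w[123]\.weight$")

def transplant_experts(base_sd: dict, donor_sd: dict) -> dict:
    """Copy expert FFN weights from the donor into the base state dict.

    Router ("gate") and attention weights are left untouched, preserving
    the base model's routing while inheriting instruction-tuned experts.
    """
    merged = dict(base_sd)
    for key, tensor in donor_sd.items():
        if EXPERT_KEY.search(key):
            merged[key] = tensor
    return merged
```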
---
## Usage
### llama.cpp
```bash
./llama-server \
-m Chimera-47B-Q5_K_M.gguf \
-ngl 99 \
--ctx-size 32768 \
--port 8080
```
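Once the server is up, it exposes an OpenAI-compatible HTTP API. A minimal standard-library client (the host and port match the `--port 8080` flag above; the prompt text is just an example):

```python
import json
import urllib.request

SERVER = "http://localhost:8080"  # matches --port 8080 above

def build_chat_request(prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request for llama-server."""
    payload = {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        SERVER + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def ask(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```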
Or for direct CLI inference:
```bash
./llama-cli \
-m Chimera-47B-Q5_K_M.gguf \
-p "You are a helpful assistant." \
--ctx-size 32768 \
-ngl 99 \
-n 512
```
### llama-cpp-python
```python
from llama_cpp import Llama
llm = Llama.from_pretrained(
    repo_id="klyrone/Chimera",
    filename="Chimera-47B-Q5_K_M.gguf",
    n_gpu_layers=99,
    n_ctx=4096,      # smaller than the 32,768-token maximum to save VRAM
    verbose=False,
)
output = llm(
    "You are a helpful assistant.\n\nExplain the difference between supervised and unsupervised learning.",
    max_tokens=512,
    stop=["</s>"],
)
print(output["choices"][0]["text"])
```
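For raw completion calls as above, Mixtral-Instruct models generally expect the `[INST]` chat template rather than a bare prompt. The helper below builds that template by hand; whether Chimera follows the Mixtral-Instruct template exactly is an assumption here (the chat template embedded in the GGUF metadata, as used by `create_chat_completion`, is authoritative):

```python
def mixtral_prompt(turns: list[tuple[str, str]], next_user: str) -> str:
    """Format a conversation in the Mixtral-Instruct [INST] template.

    turns: prior (user, assistant) pairs; next_user: the new user message.
    """
    prompt = "<s>"
    for user, assistant in turns:
        prompt += f"[INST] {user} [/INST] {assistant}</s>"
    prompt += f"[INST] {next_user} [/INST]"
    return prompt
```

The result can be passed directly as the prompt string in the `llm(...)` call above.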
### Ollama
```bash
ollama run hf.co/klyrone/Chimera
```
> **Note:** This model is distributed as a GGUF file. Native Transformers loading (`AutoModelForCausalLM`) is not supported directly; use llama.cpp, llama-cpp-python, or Ollama for inference.
---
## Hardware Requirements
| Quantization | VRAM Required | Recommended Hardware |
|---|---|---|
| Q5_K_M (this file) | ~34 GB | A40, A100, 2× 3090/4090 |
| Q4_K_M | ~27 GB | 3090/4090, A6000 |
| Q3_K_M | ~22 GB | 24 GB consumer GPU |
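The VRAM figures are roughly the model file plus KV cache plus runtime overhead. A back-of-the-envelope KV-cache estimate at full context, assuming the standard Mixtral-8x7B attention configuration (32 layers, 8 KV heads of dimension 128) and an fp16 cache; these config values are assumptions about this merge, not read from its GGUF metadata:

```python
# Rough fp16 KV-cache size at the full 32,768-token context.
layers, kv_heads, head_dim = 32, 8, 128   # assumed Mixtral-8x7B attention config
ctx, bytes_per_elem = 32768, 2            # fp16 cache
kv_bytes = 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem  # K and V
print(f"{kv_bytes / 2**30:.1f} GiB")      # 4.0 GiB
```

Added to the ~31 GB Q5_K_M file, this lands near the ~34 GB figure in the table.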
---
## Limitations
- Router fine-tuning not yet applied; a short gate re-alignment is expected to yield marginal quality gains
- No independent safety evaluation conducted; not recommended for unsupervised public-facing deployment
- Benchmark results pending publication
- STEM-heavy benchmarks (abstract algebra, HS math) may underperform relative to general capability, as mathematical knowledge is distributed across attention layers rather than expert FFNs
---
## Citation
```bibtex
@misc{chimera47b2026,
title = {Chimera 47B},
author = {{Klyrone F.Z.E.}},
year = {2026},
howpublished = {\url{https://huggingface.co/klyrone/Chimera}}
}
```
---
*Chimera 47B · Klyrone F.Z.E. · Apache 2.0 · A technical paper on the MoE assembly technique is forthcoming.*