---
language:
- en
- fr
- it
- de
- es
license: apache-2.0
tags:
- mixtral
- moe
- mixture-of-experts
- merge
- chimera
- klyrone
- instruct
- text-generation
base_model:
- mistralai/Mixtral-8x7B-v0.1
- mistralai/Mixtral-8x7B-Instruct-v0.1
base_model_relation: merge
model_type: mixtral
pipeline_tag: text-generation
inference: false
---

# Chimera 47B

**Klyrone F.Z.E.** · March 2026 · Apache 2.0

Chimera 47B is a 46.7B-parameter Mixture-of-Experts (MoE) language model built using Klyrone's MoE assembly framework. It is constructed from Mixtral-8x7B-v0.1 and Mixtral-8x7B-Instruct-v0.1, combining the base model's knowledge with the instruct model's capabilities, without any additional training. With 8 experts and top-2 routing, only 12.9B parameters are active per token, enabling fast inference (154 tokens/second on H200 hardware).

A technical paper detailing the methodology is forthcoming.

---

## Key Numbers

| Metric | Value |
|---|---|
| Total parameters | 46.7 B |
| Active per token | 12.9 B |
| Architecture | MoE · 8 experts · top-2 routing |
| Context length | 32,768 tokens |
| Generation speed | 154 t/s (H200) |
| Prompt processing | 878 t/s (H200) |
| Quantization | Q5_K_M · 5.69 BPW |
| File size | 30.95 GB (GGUF) |
| License | Apache 2.0 |
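The total and active counts above follow directly from top-2 routing over 8 experts. A minimal sketch of the arithmetic (the 46.7 B and 12.9 B figures come from the table; the shared/per-expert split below is derived from them, not an official breakdown):

```python
# Solve for the shared (attention, embeddings, router) and per-expert
# parameter counts implied by top-2-of-8 routing:
#   total  = shared + 8 * expert
#   active = shared + 2 * expert
TOTAL_B = 46.7   # total parameters, billions (from the table above)
ACTIVE_B = 12.9  # parameters active per token, billions

expert_b = (TOTAL_B - ACTIVE_B) / 6   # subtracting the two equations cancels the shared term
shared_b = ACTIVE_B - 2 * expert_b

print(f"per-expert FFN stack: ~{expert_b:.2f} B")
print(f"shared parameters:    ~{shared_b:.2f} B")
```

This is why MoE decoding is cheap relative to model size: per token, compute scales with the 12.9 B active parameters, while memory must still hold all 46.7 B.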

---

## Capabilities

- ✅ Instruction following: multi-turn conversational coherence
- ✅ Code generation: correct, edge-case-aware output
- ✅ Creative writing: long-form prose and poetry
- ✅ Factual reasoning: physics, mathematics, general knowledge
- ✅ Consumer-grade deployment: fits accessible GPU budgets at Q5_K_M

> Formal benchmark results (MMLU, HellaSwag, ARC-Challenge, GSM8K) are in progress.

---

## About the Approach

Klyrone's MoE assembly framework constructs high-performance models by composing expert sub-networks from compatible source models, without full retraining. The expert FFN weights (w1, w2, w3) from Mixtral-8x7B-Instruct are transplanted into the Mixtral-8x7B-v0.1 base, preserving routing coherence while inheriting the instruction-tuned capabilities of the donor model.
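The transplant step can be illustrated on plain state-dict keys. This is a hedged sketch, not Klyrone's actual tooling: lists stand in for real tensors, and the key pattern follows the Hugging Face Mixtral naming convention (`model.layers.{i}.block_sparse_moe.experts.{e}.w{1,2,3}.weight`):

```python
import re

# Toy "state dicts": key -> tensor (lists stand in for real tensors).
base = {
    "model.layers.0.self_attn.q_proj.weight": [0.1],
    "model.layers.0.block_sparse_moe.gate.weight": [0.2],
    "model.layers.0.block_sparse_moe.experts.0.w1.weight": [0.3],
    "model.layers.0.block_sparse_moe.experts.0.w2.weight": [0.4],
}
instruct = {
    "model.layers.0.block_sparse_moe.experts.0.w1.weight": [9.0],
    "model.layers.0.block_sparse_moe.experts.0.w2.weight": [8.0],
    "model.layers.0.self_attn.q_proj.weight": [7.0],  # NOT transplanted
}

# Only the expert FFN projections (w1/w2/w3) move. Attention weights and
# the router ("gate") stay from the base model, which is what keeps the
# learned routing decisions coherent.
EXPERT_KEY = re.compile(r"\.block_sparse_moe\.experts\.\d+\.w[123]\.weight$")

merged = dict(base)
for key, tensor in instruct.items():
    if EXPERT_KEY.search(key):
        merged[key] = tensor

print(merged["model.layers.0.block_sparse_moe.experts.0.w1.weight"])  # taken from instruct
print(merged["model.layers.0.self_attn.q_proj.weight"])               # kept from base
```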

For enterprise licensing or research collaboration, contact **research@klyrone.com**.

---

## Usage

### llama.cpp

```bash
./llama-server \
  -m Chimera-47B-Q5_K_M.gguf \
  -ngl 99 \
  --ctx-size 32768 \
  --port 8080
```

Or for direct CLI inference:

```bash
./llama-cli \
  -m Chimera-47B-Q5_K_M.gguf \
  -p "You are a helpful assistant." \
  --ctx-size 32768 \
  -ngl 99 \
  -n 512
```

### llama-cpp-python

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="klyrone/Chimera",
    filename="Chimera-47B-Q5_K_M.gguf",
    n_gpu_layers=99,
    n_ctx=4096,
    verbose=False,
)

output = llm(
    "You are a helpful assistant.\n\nExplain the difference between supervised and unsupervised learning.",
    max_tokens=512,
    stop=["</s>"],
)
print(output["choices"][0]["text"])
```
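Because the instruct donor is Mixtral-8x7B-Instruct, its `[INST] … [/INST]` chat template is a reasonable default for multi-turn prompting. A small formatter, written as an assumption about the merged model's preferred template rather than a documented property of Chimera:

```python
def format_mixtral_prompt(turns):
    """Render (user, assistant) turns in the Mixtral-Instruct [INST] template.

    The final turn may have assistant=None, leaving the prompt open
    for the model to complete.
    """
    out = "<s>"
    for user, assistant in turns:
        out += f"[INST] {user} [/INST]"
        if assistant is not None:
            out += f" {assistant}</s>"
    return out

prompt = format_mixtral_prompt([
    ("What is top-2 routing?", "Each token is sent to the 2 highest-scoring experts."),
    ("And why does it save compute?", None),
])
print(prompt)
```

The resulting string can be passed as the first argument to the `llm(...)` call above in place of the bare system-prompt string.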

### Ollama

```bash
ollama run hf.co/klyrone/Chimera
```

> **Note:** This model is distributed as a GGUF file. Native Transformers loading (`AutoModelForCausalLM`) is not supported directly; use llama.cpp, llama-cpp-python, or Ollama for inference.
---

## Hardware Requirements

| Quantization | VRAM Required | Recommended Hardware |
|---|---|---|
| Q5_K_M (this file) | ~34 GB | A40, A100, 2× 3090/4090 |
| Q4_K_M | ~27 GB | 3090/4090, A6000 |
| Q3_K_M | ~22 GB | 24 GB consumer GPU |
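The VRAM figures above are roughly the GGUF file size plus room for the KV cache and runtime buffers. A back-of-the-envelope estimator (the 10% overhead factor is an assumption for illustration, not a measured value; long contexts need more):

```python
def estimate_vram_gb(file_size_gb, overhead=0.10):
    """Rule of thumb: resident weights ~= GGUF file size, plus an
    allowance for KV cache and runtime buffers. The 10% default is
    an assumed figure, not a measurement."""
    return file_size_gb * (1 + overhead)

# The Q5_K_M file is 30.95 GB (from the Key Numbers table).
print(f"Q5_K_M: ~{estimate_vram_gb(30.95):.1f} GB")  # in line with the ~34 GB row above
```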

---

## Limitations

- Router fine-tuning has not yet been applied; a short gate re-alignment is expected to yield marginal quality gains.
- No independent safety evaluation has been conducted; not recommended for unsupervised public-facing deployment.
- Benchmark results are pending publication.
- STEM-heavy benchmarks (abstract algebra, high-school math) may underperform relative to general capability, as mathematical knowledge is distributed across attention layers rather than expert FFNs.

---

## Citation

```bibtex
@misc{chimera47b2026,
  title        = {Chimera 47B},
  author       = {{Klyrone F.Z.E.}},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/klyrone/Chimera}}
}
```

---

*Chimera 47B · Klyrone F.Z.E. · Apache 2.0 · A technical paper on the MoE assembly technique is forthcoming.*