---
language:
- en
- fr
- it
- de
- es
license: apache-2.0
tags:
- mixtral
- moe
- mixture-of-experts
- merge
- chimera
- klyrone
- instruct
- text-generation
base_model:
- mistralai/Mixtral-8x7B-v0.1
- mistralai/Mixtral-8x7B-Instruct-v0.1
base_model_relation: merge
model_type: mixtral
pipeline_tag: text-generation
inference: false
---

# Chimera 47B

**Klyrone F.Z.E.** · March 2026 · Apache 2.0

Chimera 47B is a 46.7B-parameter Mixture-of-Experts language model built with Klyrone's MoE assembly framework. It is constructed from Mixtral-8x7B-v0.1 and Mixtral-8x7B-Instruct-v0.1 — combining the base model's knowledge with the instruct model's capabilities — without any additional training. With 8 experts and top-2 routing, only 12.9B parameters are active per token, enabling fast inference at 154 tokens/second on H200 hardware.

A technical paper detailing the methodology is forthcoming.

---

## Key Numbers

| | |
|---|---|
| Total Parameters | 46.7 B |
| Active / Token | 12.9 B |
| Architecture | MoE · 8 experts · top-2 routing |
| Context Length | 32,768 tokens |
| Generation Speed | 154 t/s · H200 |
| Prompt Processing | 878 t/s · H200 |
| Quantization | Q5_K_M · 5.69 BPW |
| File Size | 30.95 GB GGUF |
| License | Apache 2.0 |

---

## Capabilities

- ✅ Instruction following — multi-turn conversational coherence
- ✅ Code generation — correct, edge-case-aware output
- ✅ Creative writing — long-form prose and poetry
- ✅ Factual reasoning — physics, mathematics, general knowledge
- ✅ Consumer-grade deployment — fits accessible GPU budgets at Q5_K_M

> Formal benchmark results (MMLU, HellaSwag, ARC-Challenge, GSM8K) are in progress.

---

## About the Approach

Klyrone's MoE assembly framework constructs high-performance models by composing expert sub-networks from compatible source models — without full retraining.
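Until the technical paper is published, the assembly step can be sketched at the state-dict level. The sketch below is a hypothetical illustration, not the actual framework: the function name and the plain-dict interface are assumptions, while the parameter key names follow the Mixtral naming convention used by Transformers checkpoints (`model.layers.{L}.block_sparse_moe.experts.{E}.w{1,2,3}.weight`).

```python
# Hypothetical sketch of an expert transplant: copy the expert FFN weights
# (w1, w2, w3) from an instruct checkpoint into a base checkpoint, leaving
# attention, embeddings, and router ("gate") weights untouched.

def transplant_experts(base_sd, instruct_sd, num_layers=32, num_experts=8):
    """Return a merged state dict: base weights plus instruct expert FFNs."""
    merged = dict(base_sd)  # start from the base model's weights
    for layer in range(num_layers):
        for expert in range(num_experts):
            for w in ("w1", "w2", "w3"):
                key = (f"model.layers.{layer}.block_sparse_moe."
                       f"experts.{expert}.{w}.weight")
                # overwrite only the expert FFN tensors with the donor's
                merged[key] = instruct_sd[key]
    return merged
```

Because the router (`gate`) weights stay with the base model, token-to-expert routing is unchanged — which is what the "preserving routing coherence" claim below refers to.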
The expert FFN weights (w1, w2, w3) from Mixtral-8x7B-Instruct are transplanted into the Mixtral-8x7B-v0.1 base, preserving routing coherence while inheriting the instruction-tuned capabilities of the donor model.

For enterprise licensing or research collaboration, contact **research@klyrone.com**.

---

## Usage

### llama.cpp

```bash
./llama-server \
  -m Chimera-47B-Q5_K_M.gguf \
  -ngl 99 \
  --ctx-size 32768 \
  --port 8080
```

Or for direct CLI inference:

```bash
./llama-cli \
  -m Chimera-47B-Q5_K_M.gguf \
  -p "You are a helpful assistant." \
  --ctx-size 32768 \
  -ngl 99 \
  -n 512
```

### llama-cpp-python

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="klyrone/Chimera",
    filename="Chimera-47B-Q5_K_M.gguf",
    n_gpu_layers=99,
    n_ctx=4096,
    verbose=False,
)

output = llm(
    "You are a helpful assistant.\n\nExplain the difference between supervised and unsupervised learning.",
    max_tokens=512,
    stop=["</s>"],
)
print(output["choices"][0]["text"])
```

### Ollama

```bash
ollama run hf.co/klyrone/Chimera
```

> **Note:** This model is distributed as a GGUF file. Native Transformers loading (`AutoModelForCausalLM`) is not supported directly — use llama.cpp, llama-cpp-python, or Ollama for inference.
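The examples above pass plain prompts. Since the donor Instruct model was trained on Mixtral's `[INST] ... [/INST]` chat template, wrapping prompts in that template may better engage the transplanted instruct experts — though whether Chimera 47B requires it is untested here. A small helper (the function name is illustrative; the BOS token `<s>` is omitted because llama.cpp-based runtimes usually add it during tokenization):

```python
# Mixtral-Instruct uses the "[INST] ... [/INST]" chat template. Mixtral has
# no dedicated system role, so a system prompt is commonly folded into the
# first user turn.

def format_mixtral_prompt(user_message, system_prompt=None):
    """Wrap a single-turn prompt in the Mixtral instruct template."""
    if system_prompt:
        user_message = f"{system_prompt}\n\n{user_message}"
    return f"[INST] {user_message} [/INST]"
```

For example, `llm(format_mixtral_prompt("Explain MoE routing.", system_prompt="You are a helpful assistant."), max_tokens=512)` in the llama-cpp-python snippet above.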
---

## Hardware Requirements

| Quantization | VRAM Required | Recommended Hardware |
|---|---|---|
| Q5_K_M (this file) | ~34 GB | A40, A100, 2× 3090/4090 |
| Q4_K_M | ~27 GB | 3090/4090, A6000 |
| Q3_K_M | ~22 GB | 24 GB consumer GPU |

---

## Limitations

- Router fine-tuning has not yet been applied — a short gate re-alignment is expected to yield marginal quality gains
- No independent safety evaluation has been conducted — not recommended for unsupervised public-facing deployment
- Benchmark results are pending publication
- STEM-heavy benchmarks (abstract algebra, high-school math) may underperform relative to general capability, as mathematical knowledge is distributed across attention layers rather than expert FFNs

---

## Citation

```bibtex
@misc{chimera47b2026,
  title        = {Chimera 47B},
  author       = {{Klyrone F.Z.E.}},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/klyrone/Chimera}}
}
```

---

*Chimera 47B · Klyrone F.Z.E. · Apache 2.0 · A technical paper on the MoE assembly technique is forthcoming.*