---
language:
- en
- fr
- it
- de
- es
license: apache-2.0
tags:
- mixtral
- moe
- mixture-of-experts
- merge
- chimera
- klyrone
- instruct
- text-generation
base_model:
- mistralai/Mixtral-8x7B-v0.1
- mistralai/Mixtral-8x7B-Instruct-v0.1
base_model_relation: merge
model_type: mixtral
pipeline_tag: text-generation
inference: false
---

# Chimera 47B

**Klyrone F.Z.E.** · March 2026 · Apache 2.0

Chimera 47B is a 46.7B-parameter Mixture-of-Experts (MoE) language model built using Klyrone's MoE assembly framework. It is constructed from Mixtral-8x7B-v0.1 and Mixtral-8x7B-Instruct-v0.1, combining the base model's knowledge with the instruct model's capabilities, without any additional training. With 8 experts and top-2 routing, only 12.9B parameters are active per token, enabling fast inference (154 tokens/second on H200 hardware).

A technical paper detailing the methodology is forthcoming.

---

## Key Numbers

| Metric | Value |
|---|---|
| Total parameters | 46.7 B |
| Active per token | 12.9 B |
| Architecture | MoE · 8 experts · top-2 routing |
| Context length | 32,768 tokens |
| Generation speed | 154 t/s (H200) |
| Prompt processing | 878 t/s (H200) |
| Quantization | Q5_K_M · 5.69 BPW |
| File size | 30.95 GB (GGUF) |
| License | Apache 2.0 |
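The total and active counts above follow directly from top-2 routing over 8 experts. A minimal sketch of the arithmetic (the 46.7 B and 12.9 B figures come from the table; the shared/per-expert split below is derived from them, not an official breakdown):

```python
# Solve for the shared (attention, embeddings, router) and per-expert
# parameter counts implied by top-2-of-8 routing:
#   total  = shared + 8 * expert
#   active = shared + 2 * expert
TOTAL_B = 46.7   # total parameters, billions (from the table above)
ACTIVE_B = 12.9  # parameters active per token, billions

expert_b = (TOTAL_B - ACTIVE_B) / 6   # subtracting the two equations cancels the shared term
shared_b = ACTIVE_B - 2 * expert_b

print(f"per-expert FFN stack: ~{expert_b:.2f} B")
print(f"shared parameters:    ~{shared_b:.2f} B")
```

This is why MoE decoding is cheap relative to model size: per token, compute scales with the 12.9 B active parameters, while memory must still hold all 46.7 B.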

---

## Capabilities

- ✅ Instruction following: multi-turn conversational coherence
- ✅ Code generation: correct, edge-case-aware output
- ✅ Creative writing: long-form prose and poetry
- ✅ Factual reasoning: physics, mathematics, general knowledge
- ✅ Consumer-grade deployment: fits accessible GPU budgets at Q5_K_M

> Formal benchmark results (MMLU, HellaSwag, ARC-Challenge, GSM8K) are in progress.

---

## About the Approach

Klyrone's MoE assembly framework constructs high-performance models by composing expert sub-networks from compatible source models, without full retraining. The expert FFN weights (w1, w2, w3) from Mixtral-8x7B-Instruct are transplanted into the Mixtral-8x7B-v0.1 base, preserving routing coherence while inheriting the instruction-tuned capabilities of the donor model.
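The transplant step can be illustrated on plain state-dict keys. This is a hedged sketch, not Klyrone's actual tooling: lists stand in for real tensors, and the key pattern follows the Hugging Face Mixtral naming convention (`model.layers.{i}.block_sparse_moe.experts.{e}.w{1,2,3}.weight`):

```python
import re

# Toy "state dicts": key -> tensor (lists stand in for real tensors).
base = {
    "model.layers.0.self_attn.q_proj.weight": [0.1],
    "model.layers.0.block_sparse_moe.gate.weight": [0.2],
    "model.layers.0.block_sparse_moe.experts.0.w1.weight": [0.3],
    "model.layers.0.block_sparse_moe.experts.0.w2.weight": [0.4],
}
instruct = {
    "model.layers.0.block_sparse_moe.experts.0.w1.weight": [9.0],
    "model.layers.0.block_sparse_moe.experts.0.w2.weight": [8.0],
    "model.layers.0.self_attn.q_proj.weight": [7.0],  # NOT transplanted
}

# Only the expert FFN projections (w1/w2/w3) move. Attention weights and
# the router ("gate") stay from the base model, which is what keeps the
# learned routing decisions coherent.
EXPERT_KEY = re.compile(r"\.block_sparse_moe\.experts\.\d+\.w[123]\.weight$")

merged = dict(base)
for key, tensor in instruct.items():
    if EXPERT_KEY.search(key):
        merged[key] = tensor

print(merged["model.layers.0.block_sparse_moe.experts.0.w1.weight"])  # taken from instruct
print(merged["model.layers.0.self_attn.q_proj.weight"])               # kept from base
```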

For enterprise licensing or research collaboration, contact **research@klyrone.com**.

---

## Usage

### llama.cpp

```bash
./llama-server \
  -m Chimera-47B-Q5_K_M.gguf \
  -ngl 99 \
  --ctx-size 32768 \
  --port 8080
```

Or for direct CLI inference:

```bash
./llama-cli \
  -m Chimera-47B-Q5_K_M.gguf \
  -p "You are a helpful assistant." \
  --ctx-size 32768 \
  -ngl 99 \
  -n 512
```

### llama-cpp-python

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="klyrone/Chimera",
    filename="Chimera-47B-Q5_K_M.gguf",
    n_gpu_layers=99,
    n_ctx=4096,
    verbose=False,
)

output = llm(
    "You are a helpful assistant.\n\nExplain the difference between supervised and unsupervised learning.",
    max_tokens=512,
    stop=["</s>"],
)
print(output["choices"][0]["text"])
```
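Because the instruct donor is Mixtral-8x7B-Instruct, its `[INST] … [/INST]` chat template is a reasonable default for multi-turn prompting. A small formatter, written as an assumption about the merged model's preferred template rather than a documented property of Chimera:

```python
def format_mixtral_prompt(turns):
    """Render (user, assistant) turns in the Mixtral-Instruct [INST] template.

    The final turn may have assistant=None, leaving the prompt open
    for the model to complete.
    """
    out = "<s>"
    for user, assistant in turns:
        out += f"[INST] {user} [/INST]"
        if assistant is not None:
            out += f" {assistant}</s>"
    return out

prompt = format_mixtral_prompt([
    ("What is top-2 routing?", "Each token is sent to the 2 highest-scoring experts."),
    ("And why does it save compute?", None),
])
print(prompt)
```

The resulting string can be passed as the first argument to the `llm(...)` call above in place of the bare system-prompt string.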

### Ollama

```bash
ollama run hf.co/klyrone/Chimera
```

> **Note:** This model is distributed as a GGUF file. Native Transformers loading (`AutoModelForCausalLM`) is not supported directly; use llama.cpp, llama-cpp-python, or Ollama for inference.
---

## Hardware Requirements

| Quantization | VRAM Required | Recommended Hardware |
|---|---|---|
| Q5_K_M (this file) | ~34 GB | A40, A100, 2× 3090/4090 |
| Q4_K_M | ~27 GB | 3090/4090, A6000 |
| Q3_K_M | ~22 GB | 24 GB consumer GPU |
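The VRAM figures above are roughly the GGUF file size plus room for the KV cache and runtime buffers. A back-of-the-envelope estimator (the 10% overhead factor is an assumption for illustration, not a measured value; long contexts need more):

```python
def estimate_vram_gb(file_size_gb, overhead=0.10):
    """Rule of thumb: resident weights ~= GGUF file size, plus an
    allowance for KV cache and runtime buffers. The 10% default is
    an assumed figure, not a measurement."""
    return file_size_gb * (1 + overhead)

# The Q5_K_M file is 30.95 GB (from the Key Numbers table).
print(f"Q5_K_M: ~{estimate_vram_gb(30.95):.1f} GB")  # in line with the ~34 GB row above
```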

---

## Limitations

- Router fine-tuning has not yet been applied; a short gate re-alignment is expected to yield marginal quality gains.
- No independent safety evaluation has been conducted; not recommended for unsupervised public-facing deployment.
- Benchmark results are pending publication.
- STEM-heavy benchmarks (abstract algebra, high-school math) may underperform relative to general capability, as mathematical knowledge is distributed across attention layers rather than expert FFNs.

---

## Citation

```bibtex
@misc{chimera47b2026,
  title        = {Chimera 47B},
  author       = {{Klyrone F.Z.E.}},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/klyrone/Chimera}}
}
```

---

*Chimera 47B · Klyrone F.Z.E. · Apache 2.0 · A technical paper on the MoE assembly technique is forthcoming.*