# fast-code-moe
Mixture of Experts (MoE) with:

- Experts: `mistralai/Mistral-7B-Instruct-v0.2` and `Qwen/Qwen2.5-7B-Instruct`
- Router: `distilbert-base-uncased` encoder with an MLP classification head
- Top-k: 1, so each prompt is dispatched to exactly one expert
- Quantization: 4-bit via `bitsandbytes`
The router was trained on a subset of FLAN/IMDb so that it learns to send each instruction to the most suitable expert; sketches of the routing and training code follow.
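
A minimal sketch of how the routing described above could look. It assumes the experts are loaded with `transformers` and quantized with a `BitsAndBytesConfig`; the `Router` class, the NF4 settings, and the example prompt are illustrative and not pinned down by this README.

```python
import torch
import torch.nn as nn
from transformers import (
    AutoModel,
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

EXPERT_IDS = [
    "mistralai/Mistral-7B-Instruct-v0.2",
    "Qwen/Qwen2.5-7B-Instruct",
]

class Router(nn.Module):
    """DistilBERT encoder followed by an MLP head over the experts."""

    def __init__(self, num_experts: int, hidden: int = 256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("distilbert-base-uncased")
        dim = self.encoder.config.hidden_size  # 768 for distilbert-base
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_experts),
        )

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token embedding
        return self.mlp(cls)  # logits over experts

# Assumed 4-bit NF4 setup; the README only says "4-bit (bitsandbytes)".
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

router_tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
router = Router(num_experts=len(EXPERT_IDS)).eval()

def route(prompt: str) -> int:
    """Top-1 routing: return the index of the chosen expert."""
    batch = router_tok(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = router(**batch)
    return int(logits.argmax(dim=-1))

prompt = "Summarize the plot of the movie in two sentences."
idx = route(prompt)

# Load only the selected expert in 4-bit and generate.
expert_tok = AutoTokenizer.from_pretrained(EXPERT_IDS[idx])
expert = AutoModelForCausalLM.from_pretrained(
    EXPERT_IDS[idx], quantization_config=bnb_config, device_map="auto"
)
inputs = expert_tok(prompt, return_tensors="pt").to(expert.device)
print(expert_tok.decode(expert.generate(**inputs, max_new_tokens=128)[0]))
```

With top-k = 1 only the selected expert needs to be resident for a given request, which is what makes 4-bit loading of 7B experts practical on a single GPU.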
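And a sketch of the router training loop, continuing from the block above (it reuses `router` and `router_tok`). It assumes each FLAN/IMDb example carries an integer label naming the expert that should handle it; the example rows, dataset fields, and hyperparameters are placeholders, since the README does not specify how the routing labels were derived.

```python
import torch
from torch.utils.data import DataLoader

# Hypothetical routing data: (instruction text, index of the best expert).
examples = [
    ("Classify the sentiment of this movie review: ...", 1),
    ("Write a short story about a lighthouse keeper.", 0),
]

def collate(batch):
    texts, labels = zip(*batch)
    enc = router_tok(list(texts), padding=True, truncation=True, return_tensors="pt")
    return enc, torch.tensor(labels)

loader = DataLoader(examples, batch_size=16, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(router.parameters(), lr=2e-5)
loss_fn = torch.nn.CrossEntropyLoss()

router.train()
for epoch in range(3):
    for enc, labels in loader:
        logits = router(**enc)           # logits over the two experts
        loss = loss_fn(logits, labels)   # plain classification objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Because routing is just text classification here, any standard fine-tuning recipe for DistilBERT applies; the experts themselves stay frozen.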