Upload MoE router and config
- README.md +9 -0
- config.json +10 -0
- router.pt +3 -0
README.md
ADDED
@@ -0,0 +1,9 @@
+# fast-code-moe
+
+Mixture of Experts (MoE) with:
+- Experts: ['mistralai/Mistral-7B-Instruct-v0.2', 'Qwen/Qwen2.5-7B-Instruct']
+- Router: distilbert-base-uncased + MLP
+- Top-k: 1
+- Quantization: 4-bit (bitsandbytes)
+
+Trained on a subset of FLAN/IMDb to route instructions to the most suitable expert.
config.json
ADDED
@@ -0,0 +1,10 @@
+{
+  "experts": [
+    "mistralai/Mistral-7B-Instruct-v0.2",
+    "Qwen/Qwen2.5-7B-Instruct"
+  ],
+  "top_k": 1,
+  "router_encoder": "distilbert-base-uncased",
+  "max_new_tokens": 256,
+  "description": "Claude-style MoE with lazy-loaded 4-bit experts"
+}
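As a sketch of how this config might drive routing: the real router head lives in `router.pt` and sits on top of `distilbert-base-uncased` embeddings, so the MLP dimensions, layer layout, and the dummy embedding below are assumptions for illustration only, not the checkpoint's actual architecture.

```python
import torch
import torch.nn as nn

# Mirrors the "experts" and "top_k" fields of config.json above.
config = {
    "experts": [
        "mistralai/Mistral-7B-Instruct-v0.2",
        "Qwen/Qwen2.5-7B-Instruct",
    ],
    "top_k": 1,
}

# Hypothetical router head (assumed shape): DistilBERT's hidden size is 768,
# and the head maps an instruction embedding to one logit per expert.
router = nn.Sequential(
    nn.Linear(768, 256),
    nn.ReLU(),
    nn.Linear(256, len(config["experts"])),
)

def route(embedding: torch.Tensor) -> list[str]:
    """Return the top_k expert ids for one encoded instruction."""
    logits = router(embedding)
    top = torch.topk(logits, k=config["top_k"]).indices.tolist()
    return [config["experts"][i] for i in top]

# A random vector stands in for a DistilBERT sentence embedding here.
chosen = route(torch.randn(768))
print(chosen)
```

With `top_k: 1` this selects a single expert per instruction, which is what lets the repo lazy-load only one 4-bit expert at a time instead of keeping both 7B models resident.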
router.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d48864ec5198121793e02acc06d657a4fc48aba711a4d776662f80d70a643ed8
+size 791891