3.54 GB

1 contributor

KD-distilled 2-layer Qwen2-MoE from hyper-accel/ci-random-qwen2-moe-a3b against Qwen1.5-MoE-A2.7B teacher (alpaca-cleaned, 1500 steps, T=2.0)

ef12344 verified about 1 month ago

All files were last updated about 1 month ago in the same commit (message above).

File                     Size
.gitattributes           1.57 kB
added_tokens.json        80 Bytes
chat_template.jinja      327 Bytes
config.json              1.1 kB
generation_config.json   120 Bytes
merges.txt               1.67 MB
model.safetensors        3.53 GB (xet)
special_tokens_map.json  370 Bytes
tokenizer.json           11.4 MB (xet)
tokenizer_config.json    421 Bytes
vocab.json               2.78 MB