
To build IRIS-18B, we first pruned ERNIE 21B by 20% using REAP, then performed continued pretraining (CPT) on 3B tokens of thinking traces. We attempted SFT, but the results were poor; we may retry SFT/DPO at a later point, but are releasing the model as-is for now.

The following improvements over ERNIE-21B-REAP were observed after CPT:

| Benchmark | Pre-CPT | Post-CPT | Δ |
|---|---|---|---|
| ARC-Easy | 79.6 | 83.9 | +4.3 |
| ARC-Challenge | 50.6 | 60.4 | +9.8 |
| HellaSwag | 70.5 | 78.9 | +8.4 |
| Winogrande | 67.2 | 72.1 | +4.9 |
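As a quick sanity check of the Δ column, the per-benchmark gains and the mean improvement can be recomputed directly from the scores in the table above (the dictionary names below are just illustrative):

```python
# Pre- and post-CPT benchmark scores, taken from the table above.
pre = {"ARC-Easy": 79.6, "ARC-Challenge": 50.6, "HellaSwag": 70.5, "Winogrande": 67.2}
post = {"ARC-Easy": 83.9, "ARC-Challenge": 60.4, "HellaSwag": 78.9, "Winogrande": 72.1}

# Per-benchmark improvement (Δ), rounded to one decimal as in the table.
delta = {name: round(post[name] - pre[name], 1) for name in pre}

# Mean gain across the four tasks.
mean_gain = round(sum(delta.values()) / len(delta), 2)

print(delta)      # matches the Δ column of the table
print(mean_gain)  # average improvement of 6.85 points
```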

Format: GGUF
Model size: 18B params
Architecture: ernie4_5-moe

Available quantizations: 2-bit, 8-bit, 16-bit

