Update Airy research model card (0.8b)
README.md
@@ -13,3 +13,23 @@
- These files are more experimental than the release bundle.
- Production-facing use should prefer the release bundle.
- If prompting in Vietnamese, write with full diacritics for the most consistent results.

## Evaluation Snapshot

Results for the research GGUFs combine the existing evaluation runs with the latest rerun on the same curated 58-question bilingual benchmark.

| Quant | Think | No-Think | Avg | Status |
|---|---:|---:|---:|---|
| Q3_K_M | 74.1% | 72.4% | 73.2% | Best current research quant |
| IQ3_M | 60.3% | 60.3% | 60.3% | Heavy quality loss |
| IQ2_M | 20.7% | 19.0% | 19.8% | Below usable threshold |
| IQ2_XS | 5.2% | 3.4% | 4.3% | Triggered early-stop for lower bits |

## Research Guidance

- Public research recommendation: **Q3_K_M** only.
- **IQ3_M** is still acceptable to upload for experiments, but its quality is clearly degraded.
- The rerun auto-stopped at **IQ2_XS** without testing lower bit widths, because the average pass rate had fallen under 50%; the low-bit quants should be treated as archival artifacts rather than viable deployments.
- For any user-facing scenario, use the release bundle instead of this research branch.
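
The card gives the 50% threshold but not the exact stopping rule. Below is a minimal sketch of a descending-bit sweep with early stopping. It assumes that quants are listed from highest to lowest bit width and that a quant's score is the mean of its Think and No-Think pass rates. It also assumes the sweep halts once `patience` consecutive quants fall below the threshold. `sweep_quants`, `evaluate`, and `patience` are hypothetical names, not part of any published harness.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback: given a quant name, returns (think_rate, no_think_rate)
# as fractions in [0, 1]. The real benchmark harness is not described here.
Evaluator = Callable[[str], Tuple[float, float]]


def sweep_quants(
    quants: List[str],
    evaluate: Evaluator,
    threshold: float = 0.5,
    patience: int = 1,
) -> Dict[str, float]:
    """Evaluate quants from highest to lowest bit width, stopping early.

    Returns the average pass rate of every quant that was actually evaluated,
    including the one that triggered the stop. The sweep halts after
    `patience` consecutive quants score below `threshold` (assumed rule).
    """
    results: Dict[str, float] = {}
    misses = 0  # consecutive quants below the threshold
    for quant in quants:
        think, no_think = evaluate(quant)
        # Assumed to match the card's Avg column: plain mean of the two modes.
        avg = (think + no_think) / 2
        results[quant] = avg
        if avg < threshold:
            misses += 1
            if misses >= patience:
                break  # skip all remaining lower-bit quants
        else:
            misses = 0  # a passing quant resets the streak
    return results
```

With the default `patience=1` the sweep stops at the first quant under the threshold. A larger value tolerates that many consecutive misses before it skips the remaining lower-bit quants.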

For cross-family ranking and release-vs-research comparison, see `results/COMPARISON.md` in the workspace.