nhannt201 committed on
Commit 37bf90d · verified · 1 Parent(s): f38a82e

Update Airy research model card (0.8b)

Files changed (1)
  1. README.md +20 -0
README.md CHANGED
@@ -13,3 +13,23 @@
  - These files are more experimental than the release bundle.
  - Production-facing use should prefer the release bundle.
  - If prompting in Vietnamese, write with full accents for best consistency.
+
+ ## Evaluation Snapshot
+
+ Results for the research GGUFs carry over from the earlier run and are merged with the latest rerun on the same curated 58-question bilingual benchmark.
+
+ | Quant | Think | No-Think | Avg | Status |
+ |---|---:|---:|---:|---|
+ | Q3_K_M | 74.1% | 72.4% | 73.2% | Best current research quant |
+ | IQ3_M | 60.3% | 60.3% | 60.3% | Heavy quality loss |
+ | IQ2_M | 20.7% | 19.0% | 19.8% | Below usable threshold |
+ | IQ2_XS | 5.2% | 3.4% | 4.3% | Triggered early-stop for lower bits |
+
+ ## Research Guidance
+
+ - Public research recommendation: **Q3_K_M** only
+ - **IQ3_M** can still be uploaded for experiments, but its quality is clearly degraded
+ - The rerun auto-stopped below **IQ2_XS** because the average pass rate fell under 50%, so lower-bit quants should be treated as archival artifacts rather than viable deployments
+ - For any user-facing scenario, prefer the release bundle over this research branch
+
+ For cross-family ranking and release-vs-research comparison, see `results/COMPARISON.md` in the workspace.
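
The early-stop rule described in the Research Guidance bullets can be sketched roughly as follows. This is a minimal illustration, not the project's actual evaluation harness: the `sweep`, `average`, and `lookup` names are hypothetical, the Avg column is assumed to be the unweighted mean of the Think and No-Think pass rates, and the rerun queue (`IQ2_XS`, `IQ2_XXS`, `IQ1_M`) is an assumption, since the card does not say which lower-bit quants were scheduled.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Pass rates (percent) copied from the Evaluation Snapshot table: (Think, No-Think).
TABLE: Dict[str, Tuple[float, float]] = {
    "Q3_K_M": (74.1, 72.4),
    "IQ3_M": (60.3, 60.3),
    "IQ2_M": (20.7, 19.0),
    "IQ2_XS": (5.2, 3.4),
}

# Threshold taken from the Research Guidance bullet ("fell under 50%").
EARLY_STOP_THRESHOLD = 50.0


def average(think: float, no_think: float) -> float:
    """Unweighted mean of the two modes (assumed definition of the Avg column)."""
    return (think + no_think) / 2


def sweep(
    queue: Iterable[str],
    evaluate: Callable[[str], Tuple[float, float]],
    threshold: float = EARLY_STOP_THRESHOLD,
) -> Tuple[Dict[str, float], List[str]]:
    """Evaluate quants in order and stop after the first one below threshold.

    Returns the averages that were computed and the quants that were skipped.
    """
    pending = list(queue)
    results: Dict[str, float] = {}
    for i, name in enumerate(pending):
        results[name] = average(*evaluate(name))
        if results[name] < threshold:
            # Everything after this quant is left unevaluated.
            return results, pending[i + 1:]
    return results, []


def lookup(name: str) -> Tuple[float, float]:
    """Stand-in evaluator: returns published numbers, KeyError for anything else."""
    return TABLE[name]


# Hypothetical rerun queue, ordered from more bits to fewer bits.
results, skipped = sweep(["IQ2_XS", "IQ2_XXS", "IQ1_M"], lookup)
print("evaluated:", {k: round(v, 1) for k, v in results.items()})
print("skipped:", skipped)
```

Because `lookup` only knows the four published rows, the sketch would raise a `KeyError` if the sweep ever tried to score a quant below `IQ2_XS`, which makes the early stop easy to verify.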