ctrltokyo committed · Commit 0192e25 (verified) · 1 parent: 21f5822

Update benchmark results: multi-codebase (jq, Rails, FastAPI) weighted scoring

Files changed (1)
  1. README.md +28 -24
README.md CHANGED
@@ -32,8 +32,8 @@ GGUF conversion of [lightonai/LateOn-Code-edge](https://huggingface.co/lightonai
32
  | Variant | Size | Quality |
33
  |---------|------|---------|
34
  | [f32](https://huggingface.co/embedme/lightonai-lateon-code-edge-f32) | 66 MB | Original precision (lossless) |
35
- | **f16** (this repo) | 34 MB | Original precision (lossless) |
36
- | [Q8_0](https://huggingface.co/embedme/lightonai-lateon-code-edge-Q8_0) | 19 MB | 90% top-1 agreement on hard queries, 3.5× smaller |
37
 
38
  ## Files
39
 
@@ -66,37 +66,41 @@ LIMIT 10;
66
 
67
 
68
 
69
- ## Benchmark: Quantization Quality on jq Codebase
70
 
71
- *533 C functions from [jq](https://github.com/jqlang/jq), 50 hard semantic queries (natural language, no keyword overlap with code). [Full methodology](https://github.com/alexandernicholson/litembeddings/blob/main/docs/performance.md).*
72
 
73
- ### Ranking Agreement (vs f32 gold standard)
74
 
75
- | Variant | Kendall τ | Spearman ρ | Top-1 Agree | R@1 | R@5 | R@10 |
76
- |---------|-----------|------------|-------------|-----|-----|------|
77
- | f32 | 1.0000 | 1.0000 | 100% | 100%| 100%| 100% |
78
- | f16 | 0.9983 | 0.9998 | 100% | 100%| 100%| 100% |
79
- | Q8_0 | 0.9269 | 0.9819 | 90% | 90% | 94% | 97% |
80
 
81
- ### Ground Truth Retrieval Accuracy
82
 
83
- | Variant | MRR | Hits@1 | Hits@5 | Hits@10 |
84
- |---------|--------|--------|--------|---------|
85
- | f32 | 0.3067 | 20% | 42% | 56% |
86
- | f16 | 0.3066 | 20% | 42% | 56% |
87
- | Q8_0 | 0.3058 | 20% | 38% | 56% |
88
 
89
- > Low absolute scores reflect deliberately hard queries — relative comparison between variants is what matters.
90
 
91
- ### Size & Latency (CPU, no AVX)
 
 
 
 
92
 
93
- | Variant | GGUF | Index (533 items) | Avg Query (533 candidates) |
94
- |---------|--------|-------------------|---------------------------|
95
- | f32 | 66 MB | 104.3 s | 4,861 ms |
96
- | f16 | 34 MB | 266.2 s | 19,264 ms |
97
- | Q8_0 | 19 MB | 93.1 s | 3,633 ms |
 
98
 
99
- **Key finding**: f16 is lossless — 100% top-1 agreement with f32 even on hard semantic queries, at half the model size. Recommended for best quality/size trade-off.
100
 
101
  ## Conversion
102
 
 
32
  | Variant | Size | Quality |
33
  |---------|------|---------|
34
  | [f32](https://huggingface.co/embedme/lightonai-lateon-code-edge-f32) | 66 MB | Original precision (lossless) |
35
+ | **f16** (this repo) | 34 MB | Lossless: 100% top-1 agreement with f32, 240/300 weighted score |
36
+ | [Q8_0](https://huggingface.co/embedme/lightonai-lateon-code-edge-Q8_0) | 19 MB | 79% weighted score, 96–100% top-1 agreement with f32, 3.5× smaller than f32 |
37
 
38
  ## Files
39
 
 
66
 
67
 
68
 
69
+ ## Quantization Quality Benchmark
70
 
71
+ Tested across three codebases (jq/C, Rails/Ruby, FastAPI/Python) with 150 questions in total (15 easy + 20 medium + 15 hard per codebase). Weighted scoring: easy ×1, medium ×2, hard ×3, giving 15·1 + 20·2 + 15·3 = 100 points per codebase and 300 points overall.
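
The weighting described above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names (`max_points`, `weighted_score`) and the per-tier dictionaries are hypothetical and are not taken from the benchmark harness, and it assumes each question is scored as simply correct or incorrect.

```python
# Hypothetical helpers for illustration; not part of the actual benchmark code.
WEIGHTS = {"easy": 1, "medium": 2, "hard": 3}
QUESTIONS_PER_CODEBASE = {"easy": 15, "medium": 20, "hard": 15}


def max_points(questions=QUESTIONS_PER_CODEBASE, weights=WEIGHTS):
    """Maximum weighted score attainable on one codebase."""
    return sum(count * weights[tier] for tier, count in questions.items())


def weighted_score(correct, weights=WEIGHTS):
    """Weighted score from the number of correct answers per difficulty tier.

    Assumes binary (correct / incorrect) scoring per question.
    """
    return sum(n * weights[tier] for tier, n in correct.items())


if __name__ == "__main__":
    print(max_points())      # points per codebase
    print(3 * max_points())  # points across the three codebases
    # Made-up counts, only to show the call shape.
    print(weighted_score({"easy": 15, "medium": 18, "hard": 10}))
```

Because a hard question is worth three points, a single missed hard question moves a per-codebase score by 3 out of 100.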
72
 
73
+ ### Aggregate Weighted Scores
74
 
75
+ | Variant | Weighted Score | Percentage |
76
+ |---------|---------------|------------|
77
+ | f32 | 240 / 300 | **80.0%** |
78
+ | f16 | 240 / 300 | **80.0%** |
79
+ | Q8_0 | 237 / 300 | **79.0%** |
80
 
81
+ ### Per-Corpus Scores
82
 
83
+ | Corpus | f32 | f16 | Q8_0 |
84
+ |---------|---------|---------|---------|
85
+ | jq (C) | 66/100 | 66/100 | 63/100 |
86
+ | Rails (Ruby) | 79/100 | 79/100 | 79/100 |
87
+ | FastAPI (Python) | 95/100 | 95/100 | 95/100 |
88
 
89
+ ### Quantization Quality (Top-1 Agreement vs f32)
90
 
91
+ | Corpus | f16 | Q8_0 |
92
+ |---------|--------|--------|
93
+ | jq | 100.0% | 96.0% |
94
+ | Rails | 100.0% | 100.0% |
95
+ | FastAPI | 100.0% | 98.0% |
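
For readers unfamiliar with the metric, top-1 agreement can be computed roughly as follows. This is a minimal sketch under stated assumptions: the function name `top1_agreement` and the input layout (a mapping from query ID to a best-first list of item IDs) are hypothetical, and the actual benchmark code may differ.

```python
def top1_agreement(reference_rankings, candidate_rankings):
    """Fraction of queries whose top-ranked item is the same in both variants.

    Each argument maps a query ID to a list of item IDs ordered best-first.
    The reference would be the f32 rankings; the candidate f16 or Q8_0.
    """
    if reference_rankings.keys() != candidate_rankings.keys():
        raise ValueError("both variants must be evaluated on the same queries")
    if not reference_rankings:
        raise ValueError("no queries to compare")
    matches = sum(
        1
        for query, ref in reference_rankings.items()
        if ref and candidate_rankings[query] and ref[0] == candidate_rankings[query][0]
    )
    return matches / len(reference_rankings)


if __name__ == "__main__":
    # Made-up rankings for four queries; the variants disagree on q4 only.
    f32 = {"q1": ["a", "b"], "q2": ["c", "d"], "q3": ["e", "f"], "q4": ["g", "h"]}
    q8 = {"q1": ["a", "b"], "q2": ["c", "d"], "q3": ["e", "f"], "q4": ["h", "g"]}
    print(top1_agreement(f32, q8))
```

If agreement is measured over the same 50 questions per codebase, 96.0% corresponds to 48 of 50 queries and 98.0% to 49 of 50.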
96
 
97
+ ### Key Findings
98
+
99
+ - **f16 is lossless**: identical weighted score (240/300) and 100% top-1 agreement with f32 across all three codebases
100
+ - **Q8_0 loses only 1 percentage point**: 237/300 (79.0%) vs 240/300 (80.0%), with the drop confined to hard queries in the jq corpus
101
+ - **Q8_0 is fastest**: 2.5 s average query time vs 3.4 s for f32 and 13.4 s for f16 (CPU without FP16 hardware)
102
+ - Easy and medium questions show no quality difference between variants
103
 
 
104
 
105
  ## Conversion
106