embedme/lightonai-lateon-code-edge-f16

@@ -32,8 +32,8 @@ GGUF conversion of [lightonai/LateOn-Code-edge](https://huggingface.co/lightonai
 | Variant | Size | Quality |
 |---------|------|---------|
 | [f32](https://huggingface.co/embedme/lightonai-lateon-code-edge-f32) | 66 MB | Original precision (lossless) |
-| **f16** (this repo) | 34 MB | Original precision (lossless) |
-| [Q8_0](https://huggingface.co/embedme/lightonai-lateon-code-edge-Q8_0) | 19 MB | 90% top-1 agreement on hard queries, 3.5× smaller |
 ## Files
@@ -66,37 +66,41 @@ LIMIT 10;
-## Benchmark: Quantization Quality on jq Codebase
-*533 C functions from [jq](https://github.com/jqlang/jq), 50 hard semantic queries (natural language, no keyword overlap with code). [Full methodology](https://github.com/alexandernicholson/litembeddings/blob/main/docs/performance.md).*
-### Ranking Agreement (vs f32 gold standard)
-| Variant | Kendall τ | Spearman ρ | Top-1 Agree | R@1 | R@5 | R@10 |
-|---------|-----------|------------|-------------|-----|-----|------|
-| f32     | 1.0000    | 1.0000     | 100%        | 100%| 100%| 100% |
-| f16     | 0.9983    | 0.9998     | 100%        | 100%| 100%| 100% |
-| Q8_0    | 0.9269    | 0.9819     | 90%         | 90% | 94% | 97%  |
-### Ground Truth Retrieval Accuracy
-| Variant | MRR    | Hits@1 | Hits@5 | Hits@10 |
-|---------|--------|--------|--------|---------|
-| f32     | 0.3067 | 20%    | 42%    | 56%     |
-| f16     | 0.3066 | 20%    | 42%    | 56%     |
-| Q8_0    | 0.3058 | 20%    | 38%    | 56%     |
-> Low absolute scores reflect deliberately hard queries — relative comparison between variants is what matters.
-### Size & Latency (CPU, no AVX)
-| Variant | GGUF   | Index (533 items) | Avg Query (533 candidates) |
-|---------|--------|-------------------|---------------------------|
-| f32     | 66 MB  | 104.3 s           | 4,861 ms                  |
-| f16     | 34 MB  | 266.2 s           | 19,264 ms                 |
-| Q8_0    | 19 MB  | 93.1 s            | 3,633 ms                  |
-**Key finding**: f16 is lossless — 100% top-1 agreement with f32 even on hard semantic queries, at half the model size. Recommended for best quality/size trade-off.
 ## Conversion

 | Variant | Size | Quality |
 |---------|------|---------|
 | [f32](https://huggingface.co/embedme/lightonai-lateon-code-edge-f32) | 66 MB | Original precision (lossless) |
+| **f16** (this repo) | 34 MB | Lossless in this benchmark: 100% top-1 agreement with f32, 240/300 weighted score |
+| [Q8_0](https://huggingface.co/embedme/lightonai-lateon-code-edge-Q8_0) | 19 MB | 79% weighted score (237/300), 96-100% top-1 agreement with f32, 3.5× smaller than f32 |
 ## Files
+## Quantization Quality Benchmark
+Tested across 3 codebases (jq/C, Rails/Ruby, FastAPI/Python) with 150 questions in total (15 easy + 20 medium + 15 hard per codebase). Weighted scoring: easy×1, medium×2, hard×3, so 15×1 + 20×2 + 15×3 = 100 points per codebase and 300 points overall.
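The weighting above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the names `WEIGHTS`, `QUESTIONS_PER_CODEBASE`, and `weighted_score` are illustrative and are not taken from the benchmark code, and the `example` run is hypothetical.

```python
# Sketch of the weighting described above; names are illustrative,
# not taken from the actual benchmark harness.
WEIGHTS = {"easy": 1, "medium": 2, "hard": 3}
QUESTIONS_PER_CODEBASE = {"easy": 15, "medium": 20, "hard": 15}


def weighted_score(correct_by_tier):
    """Sum correct answers per difficulty tier, scaled by the tier weight."""
    return sum(WEIGHTS[tier] * n for tier, n in correct_by_tier.items())


# Answering every question correctly gives the maximum score.
max_per_codebase = weighted_score(QUESTIONS_PER_CODEBASE)
max_total = 3 * max_per_codebase  # three codebases

# Hypothetical run: all easy and medium questions right, one hard question missed.
example = {"easy": 15, "medium": 20, "hard": 14}

print(max_per_codebase, max_total)  # 100 300
print(weighted_score(example))      # 97
```

Because a hard question carries 3 points, a single missed hard question moves a codebase score by 3 points.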
+### Aggregate Weighted Scores
+| Variant | Weighted Score | Percentage |
+|---------|---------------|------------|
+| f32     | 240 / 300     | **80.0%**  |
+| f16     | 240 / 300     | **80.0%**  |
+| Q8_0    | 237 / 300     | **79.0%**  |
+### Per-Corpus Scores
+| Corpus  | f32     | f16     | Q8_0    |
+|---------|---------|---------|---------|
+| jq (C)  | 66/100  | 66/100  | 63/100  |
+| Rails (Ruby) | 79/100 | 79/100 | 79/100 |
+| FastAPI (Python) | 95/100 | 95/100 | 95/100 |
+### Quantization Quality (Top-1 Agreement vs f32)
+| Corpus  | f16    | Q8_0   |
+|---------|--------|--------|
+| jq      | 100.0% | 96.0%  |
+| Rails   | 100.0% | 100.0% |
+| FastAPI | 100.0% | 98.0%  |
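The README does not spell out how top-1 agreement is computed. A minimal sketch, assuming it is the share of queries for which a variant returns the same top-ranked result as f32, could look like the following; `top1_agreement` and the result IDs are hypothetical.

```python
def top1_agreement(reference_top1, candidate_top1):
    """Percentage of queries where both variants return the same top-ranked item.

    Assumption: each list holds one top-ranked result ID per query,
    in the same query order.
    """
    if len(reference_top1) != len(candidate_top1):
        raise ValueError("both lists must cover the same queries")
    if not reference_top1:
        raise ValueError("at least one query is required")
    matches = sum(r == c for r, c in zip(reference_top1, candidate_top1))
    return 100.0 * matches / len(reference_top1)


# Hypothetical data: 50 queries, the candidate disagrees on the last two.
f32_top1 = [f"fn_{i}" for i in range(50)]
q8_top1 = f32_top1[:48] + ["other_a", "other_b"]

print(top1_agreement(f32_top1, q8_top1))  # 96.0
```

Under that assumption, and with 50 questions per codebase, 96.0% would correspond to 48 of 50 queries agreeing and 98.0% to 49 of 50.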
+### Key Findings
+- **f16 is lossless in this benchmark**: identical weighted score to f32 (240/300) and 100% top-1 agreement across all three codebases
+- **Q8_0 loses only 1 percentage point**: 237/300 (79.0%) vs 240/300 (80.0%), with the drop coming only from hard queries in the jq corpus
+- **Q8_0 is fastest**: 2.5 s average query time vs 3.4 s for f32 and 13.4 s for f16 (CPU without FP16 hardware)
+- Easy and medium questions show no quality difference between variants
 ## Conversion