Commit 42aa769 by ruv (verified) · 1 parent: f886857

Add L4 GPU benchmark results (67.1 tok/s)

Files changed (1)
  1. README.md +15 -0
README.md CHANGED
@@ -471,3 +471,18 @@ let scores = compressor.inner_product_batch_optimized(&query, &compressed)?;
  - **Speculative Decoding** — 2-3x generation speedup
 
  [RuVector GitHub](https://github.com/ruvnet/ruvector) | [ruvllm crate](https://crates.io/crates/ruvllm) | [@ruvector/ruvllm npm](https://www.npmjs.com/package/@ruvector/ruvllm)
+
+
+ ---
+
+ ## Benchmarks (L4 GPU, 24GB VRAM)
+
+ | Metric | Result |
+ |--------|--------|
+ | **Inference Speed** | 67.1 tok/s |
+ | **Model Load Time** | 2.35s |
+ | **Parameters** | 0.5B |
+ | **TurboQuant KV (3-bit)** | 10.7x compression, <1% PPL loss |
+ | **TurboQuant KV (4-bit)** | 8x compression, <0.5% PPL loss |
+
+ *Benchmarked on Google Cloud L4 GPU via `ruvltra-calibration` Cloud Run Job (2026-03-28)*
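
The compression figures in the added table can be sanity-checked from bit widths alone. The sketch below is not part of the commit or of the `ruvllm` API: `compression_ratio` and `generation_seconds` are hypothetical helpers, and the 32-bit float baseline is an assumption, since the README does not state what the KV cache is compared against. Under that assumption, 32/3 rounds to 10.7 and 32/4 is exactly 8, which matches the table; a 16-bit baseline would instead give roughly 5.3x and 4x. Any per-block metadata such as scales is ignored here.

```rust
/// Ideal compression ratio computed from bit widths only.
/// Assumption: ignores scales, zero-points, and any other per-block metadata.
fn compression_ratio(baseline_bits: u32, quantized_bits: u32) -> f64 {
    f64::from(baseline_bits) / f64::from(quantized_bits)
}

/// Seconds needed to generate `tokens` at a steady decode rate.
fn generation_seconds(tokens: u32, tokens_per_second: f64) -> f64 {
    f64::from(tokens) / tokens_per_second
}

fn main() {
    // Assumed baseline: 32-bit floats (not stated in the README).
    for bits in [3u32, 4] {
        println!("{}-bit KV cache: {:.1}x", bits, compression_ratio(32, bits));
    }

    // 67.1 tok/s is the decode speed reported for the L4 GPU.
    println!("512 tokens: {:.1}s", generation_seconds(512, 67.1));
}
```

The second helper turns the reported throughput into a wall-clock estimate for a given response length. It assumes a constant decode rate and excludes the 2.35s model load time.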