## Rust-focused scorecard
Scoring below emphasizes Rust-specific capabilities: ownership/borrowing reasoning, lifetime handling, idiomatic Rust structure, macros, enums/ASTs, and standard-library concurrency patterns.
It is intentionally not a general coding benchmark.
### Rust specialization score (higher is better)
| Model | Score | Visual |
|---|---|---|
| OmniCoder-9B-Strand-Rust-v1-GGUF | 8.4 / 10 | ████████░░ |
| qwen/qwen3.5-9b | 8.1 / 10 | ████████░░ |
| omnicoder-9b | 7.2 / 10 | ███████░░░ |
### Visual category view
| Capability | qwen/qwen3.5-9b | omnicoder-9b | OmniCoder-9B-Strand-Rust-v1-GGUF |
|---|---|---|---|
| Ownership / borrowing | 9.5 ██████████ | 8.0 ████████░░ | 8.5 █████████░ |
| Lifetimes | 9.5 ██████████ | 8.0 ████████░░ | 8.5 █████████░ |
| Idiomatic Rust | 8.5 █████████░ | 7.0 ███████░░░ | 8.5 █████████░ |
| macro_rules! | 4.0 ████░░░░░░ | 2.0 ██░░░░░░░░ | 8.0 ████████░░ |
| Enums / AST | 8.0 ████████░░ | 8.5 █████████░ | 8.0 ████████░░ |
| Std concurrency | 9.0 █████████░ | 5.0 █████░░░░░ | 9.0 █████████░ |
| Trait bounds | 8.0 ████████░░ | 8.0 ████████░░ | 8.0 ████████░░ |
| Async Rust | 7.0 ███████░░░ | 5.5 ██████░░░░ | 5.5 ██████░░░░ |
| Error handling | 5.0 █████░░░░░ | 6.0 ██████░░░░ | 2.5 ██░░░░░░░░ |
| Compile exactness | 6.5 ██████░░░░ | 4.0 ████░░░░░░ | 4.5 █████░░░░░ |
## Interpretation
OmniCoder-9B-Strand-Rust-v1-GGUF ranks highest on this Rust-specialization view because it is stronger on:
- idiomatic Rust shaping
- macro_rules! handling
- standard Rust concurrency patterns
- Rust-oriented task framing
qwen/qwen3.5-9b remains stronger on:
- borrow-checker precision
- lifetime exactness
- edge-case compile reliability
Compared with the general-purpose omnicoder-9b, the Rust-tuned variant is clearly more aligned with:
- Rust idioms
- macros
- ownership-oriented reasoning
- standard-library concurrency usage
## Training
- Base model: OmniCoder-9B
- Fine-tuning method: QLoRA
- Training stage analyzed: up to 1.5k steps
- Observed step range: ~200 → 1600
- Learning rate: warmup followed by near-constant LR at ~`4.9e-5` to `5.0e-5`
- Training throughput: ~`0.07–0.08 steps/s`
- Observed token count by ~1600 steps: ~`8.18M` tokens
## Optimization dynamics
- Training loss dropped sharply during the initial phase and then entered a stable plateau.
- Smoothed training loss stabilized around ~0.53–0.58 after the early optimization phase.
- Gradient norm decreased quickly after startup and remained stable, typically ~0.38–0.48 in later checkpoints.
- No obvious signs of divergence or gradient explosion were observed in the logged run.
- Learning rate remained almost flat after warmup, indicating a short warmup + near-constant LR schedule.
## Validation trend
Observed evaluation loss decreased consistently across checkpoints:
- ~200 steps: ~0.536
- ~400 steps: ~0.504
- ~600 steps: ~0.490
- ~800 steps: ~0.482
- ~1000 steps: ~0.474
- ~1200 steps: ~0.471
- ~1400 steps: ~0.469–0.470
- ~1600 steps: ~0.462–0.465
## Training interpretation
- The largest quality gains happened in the early stage of training.
- From roughly 600 steps onward, training entered a diminishing-returns regime.
- Validation loss continued to improve through the latest observed checkpoints.
- No clear overfitting signal was visible up to the observed ~1.5k step range.
- The run appears to have remained stable, with continued but gradually smaller validation improvements in later checkpoints.
## Summary of observed run
- Optimization stability: good
- Validation trend: consistently improving
- Late-stage behavior: diminishing returns
- Overfitting signal up to ~1.5k steps: not clearly observed
- Recommended interpretation: stable fine-tuning run with most gains captured early and incremental improvement thereafter