y0sif committed on
Commit 418e197 · verified · 1 Parent(s): 89f028a

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +96 -13
README.md CHANGED
@@ -1,21 +1,104 @@
1
  ---
2
- base_model: unsloth/gemma-4-e4b-it-unsloth-bnb-4bit
 
3
  tags:
4
- - text-generation-inference
5
- - transformers
6
- - unsloth
7
- - gemma4
8
- license: apache-2.0
 
 
 
9
  language:
10
- - en
 
 
 
11
  ---
12
 
13
- # Uploaded finetuned model
14
 
15
- - **Developed by:** y0sif
16
- - **License:** apache-2.0
17
- - **Finetuned from model :** unsloth/gemma-4-e4b-it-unsloth-bnb-4bit
18
 
19
- This gemma4 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
20
 
21
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
1
  ---
2
+ license: gemma
3
+ base_model: unsloth/gemma-4-E4B-it
4
  tags:
5
+ - rust
6
+ - code-generation
7
+ - leptos
8
+ - axum
9
+ - rig
10
+ - gemma
11
+ - lora
12
+ - fine-tuned
13
  language:
14
+ - en
15
+ datasets:
16
+ - y0sif/Arcwright-v4-Combined
17
+ pipeline_tag: text-generation
18
  ---
19
 
20
+ # Arcwright
21
 
22
+ **Arcwright** is a Gemma 4 E4B-it model fine-tuned for modern Rust web and AI frameworks: [Leptos](https://leptos.dev), [Axum](https://github.com/tokio-rs/axum), and [Rig](https://github.com/0xPlaygrounds/rig).
 
 
23
 
24
+ On `RustWebBench-15`, Arcwright scores **6.87 / 10 overall**, beating not only its base model (Gemma 4 E4B-it, 5.00) but also:
25
 
26
+ - Gemma 4 26B-A4B (6.59), the **6.5× larger** model from the same family
27
+ - Claude Haiku (6.73)
28
+ - Gemini (5.96)
29
+ - Qwen3-Coder 30B-A3B (5.75)
30
+
31
+ ## Leaderboard
32
+
33
+ | Rank | Model | Leptos | Axum | Rig | Overall |
34
+ |---|---|---|---|---|---|
35
+ | **1** | **Arcwright** | **8.40** | **8.28** | **3.92** | **6.87** |
36
+ | 2 | Claude Haiku | 7.16 | 8.04 | 5.00 | 6.73 |
37
+ | 3 | Gemma 4 26B-A4B | 7.72 | 8.20 | 3.84 | 6.59 |
38
+ | 4 | Gemini | 6.96 | 7.40 | 3.52 | 5.96 |
39
+ | 5 | Qwen3-Coder 30B-A3B | 7.36 | 6.20 | 3.68 | 5.75 |
40
+ | 6 | Gemma 4 E4B-it (base) | 5.24 | 6.84 | 2.92 | 5.00 |
41
+ | 7 | Qwen3 8B | 5.52 | 5.28 | 3.08 | 4.63 |
42
+ | 8 | Qwen2.5-Coder 7B | 4.28 | 4.68 | 1.64 | 3.53 |
43
+
44
+ All models were evaluated on the same 15 prompts (5 per crate). Each response was scored from 1 to 10 on five dimensions: `correctness`, `completeness`, `idiomatic`, `crate_knowledge`, and `explanation`.
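
The Overall column appears to be the unweighted mean of the three per-crate scores, rounded to two decimals. The README does not state the aggregation explicitly, so treat this as an inference from the table. The check below reproduces every row under that assumption.

```python
# Scores copied from the leaderboard: (Leptos, Axum, Rig, reported Overall).
LEADERBOARD = {
    "Arcwright": (8.40, 8.28, 3.92, 6.87),
    "Claude Haiku": (7.16, 8.04, 5.00, 6.73),
    "Gemma 4 26B-A4B": (7.72, 8.20, 3.84, 6.59),
    "Gemini": (6.96, 7.40, 3.52, 5.96),
    "Qwen3-Coder 30B-A3B": (7.36, 6.20, 3.68, 5.75),
    "Gemma 4 E4B-it (base)": (5.24, 6.84, 2.92, 5.00),
    "Qwen3 8B": (5.52, 5.28, 3.08, 4.63),
    "Qwen2.5-Coder 7B": (4.28, 4.68, 1.64, 3.53),
}


def overall(leptos, axum, rig):
    """Unweighted mean of the per-crate scores (assumed aggregation)."""
    return (leptos + axum + rig) / 3


for name, (leptos, axum, rig, reported) in LEADERBOARD.items():
    computed = overall(leptos, axum, rig)
    # Published figures are rounded to two decimals, so allow a small tolerance.
    assert abs(computed - reported) < 0.006, name
    print(f"{name:<24} computed={computed:.2f} reported={reported:.2f}")
```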
45
+
46
+ ## Usage
47
+
48
+ ```python
49
+ from transformers import AutoModelForCausalLM, AutoTokenizer
50
+
51
+ model = AutoModelForCausalLM.from_pretrained(
52
+ "y0sif/Arcwright", torch_dtype="auto", device_map="auto"
53
+ )
54
+ tokenizer = AutoTokenizer.from_pretrained("y0sif/Arcwright")
55
+
56
+ msgs = [{"role": "user", "content": [
57
+ {"type": "text", "text": "Write a Leptos counter component with increment/decrement buttons."}
58
+ ]}]
59
+ inputs = tokenizer.apply_chat_template(
60
+ msgs, tokenize=True, add_generation_prompt=True, return_tensors="pt"
61
+ ).to(model.device)
62
+ out = model.generate(inputs, max_new_tokens=1024)
63
+ print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
64
+ ```
65
+
66
+ A lightweight LoRA-only version is available at [`y0sif/Arcwright-LoRA`](https://huggingface.co/y0sif/Arcwright-LoRA); apply it on top of [`unsloth/gemma-4-E4B-it`](https://huggingface.co/unsloth/gemma-4-E4B-it).
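
Applying the adapter could look roughly like the sketch below. It assumes the adapter repository is stored in standard PEFT format and attaches cleanly to the model class used in the Usage snippet; neither is confirmed by the README. The `peft` dependency and the helper name `load_arcwright_from_adapter` are illustrative.

```python
BASE_ID = "unsloth/gemma-4-E4B-it"
ADAPTER_ID = "y0sif/Arcwright-LoRA"


def load_arcwright_from_adapter(base_id=BASE_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the LoRA adapter.

    Calling this downloads weights from the Hugging Face Hub.
    """
    # Imported lazily so that defining the helper does not require the libraries.
    from peft import PeftModel  # assumption: adapter is in PEFT format
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype="auto", device_map="auto"
    )
    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base, adapter_id)
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    return model, tokenizer
```

After `model, tokenizer = load_arcwright_from_adapter()`, generation should work the same way as in the Usage snippet.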
67
+
68
+ ## Training details
69
+
70
+ - **Base**: `unsloth/gemma-4-E4B-it` (4-bit)
71
+ - **Method**: QLoRA, conservative settings (see below)
72
+ - **Dataset**: [`y0sif/Arcwright-v4-Combined`](https://huggingface.co/datasets/y0sif/Arcwright-v4-Combined), 1,007 train / 109 test (1,116 pairs in total), drawn from four sources:
73
+   - Leptos: 334 curated pairs
74
+   - Axum: 317 curated pairs
75
+   - Rig: 158 compile-verified pairs (115 from `examples/` + 43 compile-passing supplements)
76
+   - General Rust: 307 pairs from Strandset-Rust-v1 (replay buffer to prevent catastrophic forgetting)
77
+ - **Training data pipeline**: 3-gate quality pipeline — sub-agent generation → LLM judge (threshold 7.0) → `cargo check` compile verification. Only entries passing all three gates make it into training.
78
+ - **Hyperparameters**: r=8, alpha=16, dropout=0, lr=5e-5, 1 epoch, cosine schedule, bf16, effective batch 32. See [the hyperparameter rationale](https://github.com/y0sif/OxideCoder/blob/main/docs/v4-steps/05-train.md).
79
+ - **Hardware**: Colab Pro, L4 GPU
80
+ - **Training runtime**: ~15 minutes
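
The last two gates of the quality pipeline can be sketched as a simple filter. This is a minimal illustration, not the project's actual script: the `Entry` fields, the `crate_dir` layout, and the choice of `>=` for the 7.0 threshold are all assumptions.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Iterable, List

JUDGE_THRESHOLD = 7.0  # from the README; inclusive comparison is an assumption


@dataclass
class Entry:
    prompt: str
    response: str
    judge_score: float  # score assigned by the LLM judge (gate 2)
    crate_dir: str      # hypothetical: Cargo project holding the response code


def cargo_check(crate_dir: str) -> bool:
    """Gate 3: True if `cargo check` exits successfully in crate_dir."""
    result = subprocess.run(["cargo", "check"], cwd=crate_dir, capture_output=True)
    return result.returncode == 0


def filter_entries(
    entries: Iterable[Entry],
    compiles: Callable[[str], bool] = cargo_check,
) -> List[Entry]:
    """Keep generated entries that clear the judge threshold and compile."""
    kept = []
    for entry in entries:
        if entry.judge_score < JUDGE_THRESHOLD:
            continue  # rejected by the judge gate
        if not compiles(entry.crate_dir):
            continue  # rejected by the compile gate
        kept.append(entry)
    return kept
```

Passing the compile check in as a callable keeps the filter testable without a Rust toolchain.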
81
+
82
+ ## Why it works
83
+
84
+ Three prior training runs (v1-v3) all regressed relative to the base model. v4 fixed the root causes:
85
+
86
+ 1. **Compile-verified data**: every training entry compiles under `cargo check`. No hallucinated APIs.
87
+ 2. **Conservative hyperparameters**: low rank + low LR + single epoch — just enough drift to inject domain knowledge, not enough to overwrite base capabilities.
88
+ 3. **General-Rust replay buffer**: 28% of training mix is crate-agnostic Rust, preventing catastrophic forgetting.
89
+ 4. **Proportional per-crate sizing**: the ratio matches each crate's learnability vs the base model.
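
The 28% replay share can be checked against the per-source counts listed under Training details. The calculation below uses the combined train and test total, which assumes the split preserves the source proportions; the README does not say whether it does.

```python
# Pair counts per source, as listed under "Training details".
SOURCES = {"Leptos": 334, "Axum": 317, "Rig": 158, "General Rust": 307}
TRAIN, TEST = 1007, 109

total = sum(SOURCES.values())
# The per-source counts add up to train + test, not to the train split alone.
assert total == TRAIN + TEST

for name, count in SOURCES.items():
    print(f"{name:<13}{count:>5}  {count / total:6.1%}")

replay_share = SOURCES["General Rust"] / total
print(f"replay buffer share: {replay_share:.1%}")
```

The replay buffer comes out at about 27.5% of all pairs, which matches the rounded 28% quoted above.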
90
+
91
+ ## Limitations
92
+
93
+ - **Rig (AI agent framework) scores 3.92**, below Claude Haiku (5.00). Rig has the smallest training share (158 entries) and the most niche API surface. The model often uses the correct import paths but invents method signatures.
94
+ - Evaluation uses only n=5 prompts per crate, and absolute scores carry roughly ±0.3 of judge variance.
95
+ - Knowledge cutoff reflects the crate versions in the training data (Axum 0.8, Leptos 0.7, Rig 0.13 era).
96
+ - Trained on full sequences (prompt + response) rather than completion-only, owing to an Unsloth/Gemma 4 VLM constraint.
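
The difference between full-sequence and completion-only training comes down to which tokens contribute to the loss. The sketch below uses made-up token ids and the common convention of masking with `-100`, the default ignore index of PyTorch's cross-entropy loss. It illustrates the concept and is not taken from the training code.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def full_sequence_labels(prompt_ids, response_ids):
    """Every token, prompt included, is a training target."""
    return list(prompt_ids) + list(response_ids)


def completion_only_labels(prompt_ids, response_ids):
    """Prompt tokens are masked out; only the response is a training target."""
    return [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)


prompt_ids = [101, 7, 8, 9]   # made-up token ids for the prompt
response_ids = [42, 43, 102]  # made-up token ids for the response

print(full_sequence_labels(prompt_ids, response_ids))
print(completion_only_labels(prompt_ids, response_ids))
```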
97
+
98
+ ## Benchmark
99
+
100
+ Training data, eval prompts, judge rubric, and scripts: [y0sif/OxideCoder](https://github.com/y0sif/OxideCoder).
101
+
102
+ ## License
103
+
104
+ Arcwright inherits the Gemma Terms of Use from the base model.