How to use amarsaikhan/spark-code-A-3b with PEFT:

    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    # Load the base model, then attach the adapter weights on top of it.
    base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-3B-Instruct")
    model = PeftModel.from_pretrained(base_model, "amarsaikhan/spark-code-A-3b")

Update model card
README.md
CHANGED
@@ -80,15 +80,18 @@ print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

 ## Comparison to Other Conditions

-All
+All five adapters share the same base model and seed. The original three (A, C-light, C-reg) used a 200-problem MBPP pool over 3 iterations; the two full-pool adapters (A-v2, C-reg2) used the 311-problem pool over 6 iterations. Each adapter row reports its **published checkpoint** (for A-v2 the iteration-4 peak; for the others the final or last completed iteration), and the _Base_ row is the untrained model (iteration 0, identical across all conditions). Rows are sorted by HumanEval pass@1 in descending order, so conditions above _Base_ beat the baseline and those below regress. Bold marks the best value in each metric column (for GRPO KL, lower is better: less policy drift).

-| Condition | aux_loss_scale | kl_coeff | HumanEval pass@1
-|---|---:|---:|---:|---:|---:|
-|
-|
-| [C-reg (regularized
+| Condition | Pool / iters | aux_loss_scale | kl_coeff | HumanEval pass@1 | MBPP-held pass@5 | GRPO KL |
+|---|---|---:|---:|---:|---:|---:|
+| [A-v2 (exec-only, full)](https://huggingface.co/amarsaikhan/spark-code-A-3b-v2) | 311 / it 4 | 0.00 | 0.02 | **0.816** | 0.710 | 0.0023 |
+| **A (exec-only)** (this card) | 200 / it 3 | 0.00 | 0.01 | 0.805 | 0.690 | **0.0011** |
+| [C-reg (regularized)](https://huggingface.co/amarsaikhan/spark-code-C-reg-3b) | 200 / it 3 | 0.03 | 0.02 | 0.800 | **0.720** | 0.0136 |
+| _Base (untrained Qwen2.5-Coder-3B-Instruct)_ | — / it 0 | — | — | 0.796 | 0.680 | — |
+| [C-reg2 (regularized, full)](https://huggingface.co/amarsaikhan/spark-code-C-reg2-3b) | 311 / it 6 | 0.02 | 0.03 | 0.774 | 0.680 | 0.0957 |
+| [C-light (naive)](https://huggingface.co/amarsaikhan/spark-code-C-light-3b) | 200 / it 3 | 0.10 | 0.01 | 0.773 | 0.680 | 0.0941 |

-
+The exec-only conditions (A, A-v2) have the lowest GRPO KL and the highest HumanEval pass@1; A's full-pool rerun ([A-v2](https://huggingface.co/amarsaikhan/spark-code-A-3b-v2)) has the highest HumanEval pass@1 in the study (0.816). Among the co-evolve runs, C-light regresses below _Base_ on HumanEval with high KL, and C-reg2 drifts over its longer schedule; the short regularized run (C-reg) has the best MBPP-held pass@5 (0.720).

 ## Findings Summary

@@ -98,7 +101,7 @@ Condition A delivers the highest HumanEval pass@1 and the lowest reference-polic

 ## Related Artifacts

-- Sibling adapters: [spark-code-C-light-3b](https://huggingface.co/amarsaikhan/spark-code-C-light-3b) · [spark-code-C-reg-3b](https://huggingface.co/amarsaikhan/spark-code-C-reg-3b)
+- Sibling adapters: [spark-code-C-light-3b](https://huggingface.co/amarsaikhan/spark-code-C-light-3b) · [spark-code-C-reg-3b](https://huggingface.co/amarsaikhan/spark-code-C-reg-3b) · [spark-code-A-3b-v2](https://huggingface.co/amarsaikhan/spark-code-A-3b-v2) · [spark-code-C-reg2-3b](https://huggingface.co/amarsaikhan/spark-code-C-reg2-3b)
 - GitHub repository: https://github.com/amarsaikhanb/spark-code
 - Full per-problem eval data (HumanEval and held-out MBPP JSONs per iteration) lives under `condition_A/eval/` in the repository
 - Interactive demo Space: [SPACES_URL]
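
The new comparison text says that conditions above _Base_ beat the baseline and those below regress. A minimal sketch of that comparison follows, with the HumanEval pass@1 and MBPP-held pass@5 values copied from the added table rows. The names `ROWS` and `deltas_vs_base` are illustrative assumptions and do not come from the repository.

```python
# HumanEval pass@1 and MBPP-held pass@5, copied from the comparison table.
# ROWS and deltas_vs_base are hypothetical names used only for this sketch.
ROWS = {
    "A-v2": (0.816, 0.710),
    "A": (0.805, 0.690),
    "C-reg": (0.800, 0.720),
    "Base": (0.796, 0.680),
    "C-reg2": (0.774, 0.680),
    "C-light": (0.773, 0.680),
}


def deltas_vs_base(rows, base="Base"):
    """Return {condition: (humaneval_delta, mbpp_delta)} relative to the base row."""
    base_he, base_mbpp = rows[base]
    return {
        name: (round(he - base_he, 3), round(mbpp - base_mbpp, 3))
        for name, (he, mbpp) in rows.items()
        if name != base
    }


if __name__ == "__main__":
    for name, (d_he, d_mbpp) in deltas_vs_base(ROWS).items():
        verdict = "beats Base" if d_he > 0 else "regresses"
        print(f"{name:8s} HumanEval {d_he:+.3f}  MBPP-held {d_mbpp:+.3f}  ({verdict})")
```

The differences are small (the A-v2 gain over _Base_ on HumanEval is 0.020), and the table reports no variance, so the ordering is best read as indicative.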