---
license: mit
tags: [code, lora, hypernetwork, peft, recurrent]
---
# Code2LoRA-GRU — streaming hypernetwork
Final checkpoint of the **streaming Code2LoRA-GRU** used in the paper. A
1-layer GRU consumes per-commit diff embeddings one at a time and, after each
commit, emits a rank-16 LoRA adapter for `Qwen/Qwen2.5-Coder-1.5B`. Each update
costs *O(1)* per commit, independent of how many commits came before.
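
The streaming data flow can be sketched with a toy, dependency-free GRU cell. This is only an illustration under stated assumptions, not the released implementation: the class `ToyStreamingGRU`, its dimensions, the random weights, and the single linear head are all hypothetical stand-ins (the real head is `Code2LoRAHead`, and the real adapter has rank 16). Biases are omitted for brevity. The point is that each step touches only the previous hidden state and the new commit embedding.

```python
import math
import random


def _matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def _sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def _rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class ToyStreamingGRU:
    """Hypothetical toy: one GRU cell plus a linear head that emits LoRA factors."""

    def __init__(self, embed_dim, hidden_dim, target_dim, rank, seed=0):
        rng = random.Random(seed)
        self.target_dim, self.rank = target_dim, rank
        # One (input, recurrent) weight pair per gate: update z, reset r, candidate n.
        self.W = {g: _rand_matrix(hidden_dim, embed_dim, rng) for g in "zrn"}
        self.U = {g: _rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}
        # Head maps the hidden state to flattened A (rank x d) and B (d x rank).
        self.head = _rand_matrix(2 * rank * target_dim, hidden_dim, rng)

    def step(self, h, x):
        """Consume one commit embedding x; cost does not depend on history length."""
        z = [_sigmoid(a + b) for a, b in zip(_matvec(self.W["z"], x), _matvec(self.U["z"], h))]
        r = [_sigmoid(a + b) for a, b in zip(_matvec(self.W["r"], x), _matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(_matvec(self.W["n"], x), _matvec(self.U["n"], rh))]
        # Interpolate between the candidate state and the previous state.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def emit_lora(self, h):
        """Return (A, B) so that the weight update would be B @ A."""
        flat = _matvec(self.head, h)
        d, k = self.target_dim, self.rank
        A = [flat[i * d:(i + 1) * d] for i in range(k)]
        B = [flat[k * d + i * k:k * d + (i + 1) * k] for i in range(d)]
        return A, B


# Stream three fake commit embeddings; a fresh adapter is available after each one.
gru = ToyStreamingGRU(embed_dim=8, hidden_dim=6, target_dim=5, rank=2)
h = [0.0] * 6
rng = random.Random(1)
for _ in range(3):
    x = [rng.uniform(-1, 1) for _ in range(8)]
    h = gru.step(h, x)
    A, B = gru.emit_lora(h)
```

Because only `h` is carried forward, the history of earlier commits never needs to be re-encoded when a new commit arrives.
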
## Files
| File | Description |
|---|---|
| `code2lora_gru.pt` | Trained GRU + `Code2LoRAHead` weights (~2.85 GB, fp32). |
| `metrics.jsonl` | Per-step training metrics (loss; validation exact match (EM), EditSim, and CodeBLEU). |
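
A minimal sketch for inspecting `metrics.jsonl` with the standard library, assuming the usual JSON Lines layout of one JSON object per line. The helper names `load_metrics` and `best_record` are hypothetical, and the exact key names in the file are not documented here, so check a record's keys before picking one.

```python
import json


def load_metrics(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def best_record(records, key, higher_is_better=True):
    """Return the record with the best value for `key`, or None if no record has it."""
    scored = [r for r in records if key in r]
    if not scored:
        return None
    pick = max if higher_is_better else min
    return pick(scored, key=lambda r: r[key])
```

For example, `best_record(load_metrics("metrics.jsonl"), "loss", higher_is_better=False)` would return the lowest-loss step, provided the records really use a key named `"loss"` (an assumption).
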
## Training recipe
* 3 epochs of truncated backpropagation through time (BPTT, window K=16) on
  `code2lora/code2lora-data-smartcap` (training Q&A pairs) plus
  `code2lora/code2lora-data-commits` (commit metadata + diff embeddings).
* AdamW with a cosine learning-rate schedule, max sequence length 8192, bf16,
  single H100 80 GB.
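
The windowing behind truncated BPTT can be sketched as follows. This shows only how a commit sequence would be cut into windows of K=16; it assumes consecutive, non-overlapping windows, which the recipe above does not specify, and `tbptt_windows` is a hypothetical helper. The gradient truncation itself needs an autograd framework: in PyTorch, for instance, the hidden state would typically be carried across windows and cut from the graph with `h = h.detach()` after each optimizer step.

```python
def tbptt_windows(sequence, k=16):
    """Yield consecutive, non-overlapping windows of at most k items."""
    if k <= 0:
        raise ValueError("k must be positive")
    for start in range(0, len(sequence), k):
        yield sequence[start:start + k]


# A repository with 40 commits is processed as three windows.
commit_ids = list(range(40))
windows = list(tbptt_windows(commit_ids, k=16))
window_lengths = [len(w) for w in windows]  # [16, 16, 8]
```

With this scheme, gradients flow through at most 16 recurrent steps, while the forward state still carries information from the whole commit history.
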
## Companion model
`code2lora/code2lora-direct` -- the static-snapshot variant.