Code2LoRA-GRU — streaming hypernetwork

Final checkpoint of the streaming Code2LoRA-GRU used in the paper. A single-layer GRU consumes per-commit diff embeddings one at a time and, after each commit, emits a rank-16 LoRA adapter for Qwen/Qwen2.5-Coder-1.5B. Each update costs O(1) per commit because only the GRU hidden state is carried forward.
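The streaming idea can be illustrated with a toy PyTorch sketch. It is not the released architecture: the class name `StreamingLoRAHyperNet`, the linear head, and all dimensions except the rank of 16 are assumptions chosen for illustration, and the diff embeddings are random tensors. It only shows how a recurrent state lets each commit be processed in constant time while still producing a full LoRA (A, B) pair.

```python
import torch
import torch.nn as nn


class StreamingLoRAHyperNet(nn.Module):
    """Toy GRU hypernetwork: one diff embedding in, one LoRA (A, B) pair out."""

    def __init__(self, embed_dim=32, hidden_dim=64, target_in=48, target_out=48, rank=16):
        super().__init__()
        self.rank, self.target_in, self.target_out = rank, target_in, target_out
        self.cell = nn.GRUCell(embed_dim, hidden_dim)  # single recurrent layer
        # Hypothetical head: projects the hidden state to flattened A and B.
        self.head = nn.Linear(hidden_dim, rank * (target_in + target_out))

    def init_state(self, batch_size=1):
        return torch.zeros(batch_size, self.cell.hidden_size)

    def step(self, diff_embedding, state):
        """Consume one commit; the cost does not depend on how many came before."""
        state = self.cell(diff_embedding, state)
        flat = self.head(state)
        a_flat, b_flat = flat.split(
            [self.rank * self.target_in, self.rank * self.target_out], dim=-1
        )
        lora_a = a_flat.reshape(-1, self.rank, self.target_in)    # (batch, r, in)
        lora_b = b_flat.reshape(-1, self.target_out, self.rank)   # (batch, out, r)
        return lora_a, lora_b, state


torch.manual_seed(0)
net = StreamingLoRAHyperNet()
state = net.init_state()
for _ in range(5):                       # five synthetic commits
    diff_embedding = torch.randn(1, 32)  # stand-in for a real diff embedding
    lora_a, lora_b, state = net.step(diff_embedding, state)

delta_w = lora_b[0] @ lora_a[0]          # low-rank weight update, shape (out, in)
```

Only `state` needs to be stored between commits, so the adapter can be refreshed after each new commit without replaying the repository history.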

Files

| File | Description |
| --- | --- |
| `code2lora_gru.pt` | Trained GRU + `Code2LoRAHead` weights (~2.85 GB, fp32). |
| `metrics.jsonl` | Per-step training metrics (loss, val EM/EditSim/CodeBLEU). |
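A JSON Lines file such as `metrics.jsonl` can be inspected with the standard library alone. The sketch below assumes one JSON object per line; the key names `step`, `loss`, and `val_em` are hypothetical, as are the sample values, so check the actual file for the real field names before relying on them.

```python
import json
import tempfile
from pathlib import Path


def read_metrics(path):
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def best_record(records, key, higher_is_better=True):
    """Return the record with the best value for `key`, ignoring records that lack it."""
    scored = [r for r in records if key in r]
    if not scored:
        return None
    pick = max if higher_is_better else min
    return pick(scored, key=lambda r: r[key])


# Synthetic stand-in for metrics.jsonl; key names and values are assumptions.
sample = [
    {"step": 100, "loss": 1.9},
    {"step": 200, "loss": 1.4, "val_em": 0.21},
    {"step": 300, "loss": 1.2, "val_em": 0.27},
]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "metrics.jsonl"
    path.write_text("\n".join(json.dumps(r) for r in sample) + "\n", encoding="utf-8")
    records = read_metrics(path)

best = best_record(records, "val_em")
```

Skipping records that lack the key matters when validation metrics are logged less often than the training loss, which is a common pattern but not confirmed for this file.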

Training recipe

Three epochs of truncated backpropagation through time (BPTT) with window K=16, on code2lora/code2lora-data-smartcap (training Q&A pairs) plus code2lora/code2lora-data-commits (commit metadata and diff embeddings).
Optimizer: AdamW with a cosine learning-rate schedule. Maximum sequence length 8192, bf16 precision, single H100 80 GB GPU.
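Truncated BPTT with a window of K=16 can be sketched as follows. This is a generic illustration, not the training script: the model sizes, the squared-error loss, the learning rate, and the synthetic data are all assumptions. The one detail taken from the recipe is the pattern itself: carry the hidden state across windows, but detach it so gradients flow through at most K steps, with AdamW and a cosine schedule driving the updates.

```python
import torch
import torch.nn as nn

K = 16                                   # truncation window, as in the recipe
torch.manual_seed(0)
cell = nn.GRUCell(8, 12)                 # toy sizes, not the released model's
readout = nn.Linear(12, 1)               # hypothetical stand-in for the real loss path
params = list(cell.parameters()) + list(readout.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-3)   # lr is an assumed value

commits = torch.randn(40, 1, 8)          # 40 synthetic per-commit embeddings
targets = torch.randn(40, 1, 1)          # synthetic regression targets
num_windows = (commits.size(0) + K - 1) // K
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_windows)

state = torch.zeros(1, 12)
updates = 0
for start in range(0, commits.size(0), K):
    state = state.detach()               # cut the graph: gradients stop at the window edge
    window_loss = torch.zeros(())
    for t in range(start, min(start + K, commits.size(0))):
        state = cell(commits[t], state)
        window_loss = window_loss + (readout(state) - targets[t]).pow(2).mean()
    optimizer.zero_grad()
    window_loss.backward()               # backprop through at most K steps
    optimizer.step()
    scheduler.step()                     # cosine decay, one tick per window
    updates += 1
```

Detaching keeps the memory needed for backpropagation bounded by the window length while the forward state still summarizes the full commit history.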

Companion model

code2lora/code2lora-direct -- the static-snapshot variant.
