---
license: mit
tags: [code, lora, hypernetwork, peft, recurrent]
---

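# Code2LoRA-GRU: streaming hypernetwork

Final checkpoint of the streaming Code2LoRA-GRU used in the paper. A
single-layer GRU consumes one diff embedding per commit, folding it into its
hidden state, and emits a rank-16 LoRA adapter for `Qwen/Qwen2.5-Coder-1.5B`.
Because each new commit is a single recurrent step, updating the adapter costs
O(1) per commit rather than growing with the length of the repository history.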
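The recurrence described above can be sketched in NumPy. This is a toy stand-in, not the released model: the sizes (`EMB_DIM`, `HIDDEN`, `D_IN`, `D_OUT`), the class `ToyGRUHypernet`, its linear head, and `apply_lora` are hypothetical, chosen only to show the shape of the computation. Each commit costs one GRU step regardless of how long the history is, and the current hidden state is decoded into rank-16 factors A and B that perturb a frozen base weight as in standard LoRA.

```python
import numpy as np

# Hypothetical sizes for illustration only; the released checkpoint's
# embedding width, hidden size and target modules are not given on this card.
EMB_DIM, HIDDEN, D_IN, D_OUT, RANK = 32, 64, 48, 48, 16

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ToyGRUHypernet:
    """Single-layer GRU over commit-diff embeddings plus a linear head that
    emits one pair of LoRA factors. A stand-in, not the released Code2LoRAHead."""

    def __init__(self):
        # Stacked weights for the reset (r), update (z) and candidate (n) gates.
        self.W_i = rng.normal(0, 0.1, (3 * HIDDEN, EMB_DIM))
        self.W_h = rng.normal(0, 0.1, (3 * HIDDEN, HIDDEN))
        self.b_i = np.zeros(3 * HIDDEN)
        self.b_h = np.zeros(3 * HIDDEN)
        # Hypothetical head: hidden state -> flattened A (RANK x D_IN) and B (D_OUT x RANK).
        self.W_head = rng.normal(0, 0.01, (RANK * D_IN + D_OUT * RANK, HIDDEN))

    def step(self, h, x):
        """One GRU update (PyTorch gate convention). Its cost depends only on
        the layer sizes, not on how many commits came before."""
        i_r, i_z, i_n = np.split(self.W_i @ x + self.b_i, 3)
        h_r, h_z, h_n = np.split(self.W_h @ h + self.b_h, 3)
        r = sigmoid(i_r + h_r)
        z = sigmoid(i_z + h_z)
        n = np.tanh(i_n + r * h_n)
        return (1.0 - z) * n + z * h

    def emit_lora(self, h):
        """Decode the current state into LoRA factors A and B."""
        flat = self.W_head @ h
        A = flat[: RANK * D_IN].reshape(RANK, D_IN)
        B = flat[RANK * D_IN :].reshape(D_OUT, RANK)
        return A, B


def apply_lora(W0, A, B, x, scale=1.0):
    """Standard LoRA forward: W0 x + scale * B (A x); `scale` stands in for alpha/r."""
    return W0 @ x + scale * (B @ (A @ x))


model = ToyGRUHypernet()
h = np.zeros(HIDDEN)
for diff_emb in rng.normal(size=(5, EMB_DIM)):  # five fake commit embeddings
    h = model.step(h, diff_emb)                 # fold one commit into the state
A, B = model.emit_lora(h)                       # adapter for the current repo state

W0 = rng.normal(size=(D_OUT, D_IN))             # stand-in frozen base weight
y = apply_lora(W0, A, B, rng.normal(size=D_IN))
print(A.shape, B.shape, y.shape)  # (16, 48) (48, 16) (48,)
```

A real adapter would target specific projection matrices of `Qwen/Qwen2.5-Coder-1.5B`; which modules the released head targets, and what scaling it uses, are not stated on this card.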

## Files

| File | Description |
|---|---|
| `code2lora_gru.pt` | Trained GRU + `Code2LoRAHead` weights (~2.85 GB, fp32). |
| `metrics.jsonl` | Per-step training metrics: loss, plus validation exact match (EM), edit similarity (EditSim) and CodeBLEU. |
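`metrics.jsonl` is presumably JSON Lines (one JSON object per line). The helpers below are a sketch under that assumption; the key names `step`, `loss` and `val_em` and the sample values are invented placeholders, not the file's actual schema or any real training result, so check a real line before relying on them.

```python
import json
from typing import Iterable, List, Optional


def read_metrics(lines: Iterable[str]) -> List[dict]:
    """Parse JSON Lines: one JSON object per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def best_by(records: List[dict], key: str) -> Optional[dict]:
    """Return the record with the highest value for `key`, skipping records
    that do not log it (e.g. training-only steps without validation)."""
    scored = [r for r in records if key in r]
    return max(scored, key=lambda r: r[key]) if scored else None


# Placeholder lines with hypothetical keys and made-up values.
sample = [
    '{"step": 100, "loss": 2.0}',
    '{"step": 500, "loss": 1.5, "val_em": 0.2}',
    "",
    '{"step": 1000, "loss": 1.0, "val_em": 0.3}',
]
records = read_metrics(sample)
print(best_by(records, "val_em")["step"])  # 1000
```

On the real file, pass an open handle instead: `with open("metrics.jsonl") as f: records = read_metrics(f)`.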

## Training recipe

* 3 epochs of truncated backpropagation through time (BPTT, window K = 16
  commits) on `code2lora/code2lora-data-smartcap` (training-split QnA pairs)
  plus `code2lora/code2lora-data-commits` (commit metadata and diff embeddings).
* AdamW with a cosine learning-rate schedule, max sequence length 8192, bf16,
  on a single 80 GB H100.
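The recipe's two schedule-level ideas can be sketched in plain Python: splitting each repository's commit stream into truncated-BPTT windows of K = 16, and a cosine learning-rate decay. `tbptt_windows`, `cosine_lr` and the commented helpers (`initial_state`, `gru_step`, `window_loss`) are hypothetical names; the base learning rate, any warmup, and the exact loss are not given on this card, so the cosine shown here has no warmup.

```python
import math
from typing import List, Tuple


def tbptt_windows(num_commits: int, k: int = 16) -> List[Tuple[int, int]]:
    """Split a commit sequence into consecutive [start, end) windows of at
    most k commits. Gradients flow only within a window."""
    return [(s, min(s + k, num_commits)) for s in range(0, num_commits, k)]


def cosine_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr at step 0 to min_lr at total_steps (no warmup)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


windows = tbptt_windows(num_commits=40, k=16)
print(windows)  # [(0, 16), (16, 32), (32, 40)]

# Outer loop for one repository, with model/optimizer calls elided:
# h = initial_state()                       # hypothetical helper
# for start, end in windows:
#     h = h.detach()                        # PyTorch: stop gradients at the boundary
#     loss = 0.0
#     for t in range(start, end):
#         h = gru_step(h, diff_embs[t])     # hypothetical helper
#         loss = loss + window_loss(h, t)   # hypothetical, e.g. QnA loss at commit t
#     loss.backward(); opt.step(); opt.zero_grad()
```

Carrying `h` across windows keeps the long-range repository state in the forward pass, while detaching it at each boundary bounds backpropagation memory to K steps.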

## Companion model

`code2lora/code2lora-direct`: the static-snapshot variant.