WallResearch/recurrent-staged-loras-model


# stage_specialized_recurrence1

## Research artifact notice
This upload is a research artifact from `recurrent-staged-loras`; validate behavior before any downstream usage.

## Run metadata
- Base model: `Qwen/Qwen3-8B`
- Baseline family: `stage_specialized_recurrence`
- Recurrence mode: `stage_specialized`
- Adapter settings: `{"latent_refiner": {"adapter_sharing": "per_step", "enabled": true, "hidden_size": 0, "num_steps": 3, "recurrence_mode": "stage_specialized"}, "latent_refiner_adapter": {"alpha": 16, "dropout": 0.0, "enabled": true, "rank": 8, "target_modules": ["refiner_proj"]}, "standard_lora": {"alpha": 32, "dropout": 0.05, "enabled": false, "rank": 16, "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", "down_proj", "gate_proj"]}}`
- Dataset: `metamath_qa` split `train`
- Training seed: `11`
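
The adapter settings above are a single JSON object, which is hard to read inline. The following is a minimal sketch that parses that string and reports which blocks are enabled. The helper name `summarize_adapters` is hypothetical, and the `alpha / rank` scaling factor is the common LoRA convention; this card does not confirm that the training code applies it.

```python
import json

# Adapter settings copied verbatim from the run metadata.
ADAPTER_SETTINGS = """
{"latent_refiner": {"adapter_sharing": "per_step", "enabled": true, "hidden_size": 0,
                    "num_steps": 3, "recurrence_mode": "stage_specialized"},
 "latent_refiner_adapter": {"alpha": 16, "dropout": 0.0, "enabled": true, "rank": 8,
                            "target_modules": ["refiner_proj"]},
 "standard_lora": {"alpha": 32, "dropout": 0.05, "enabled": false, "rank": 16,
                   "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                                      "up_proj", "down_proj", "gate_proj"]}}
"""


def summarize_adapters(settings_json):
    """Return {block_name: {"enabled": bool, "scale": float (if defined)}}.

    "scale" is alpha / rank, the usual LoRA convention (an assumption here).
    Blocks without both "alpha" and "rank" get no "scale" entry.
    """
    settings = json.loads(settings_json)
    summary = {}
    for name, cfg in settings.items():
        entry = {"enabled": bool(cfg.get("enabled", False))}
        if "alpha" in cfg and "rank" in cfg:
            entry["scale"] = cfg["alpha"] / cfg["rank"]
        summary[name] = entry
    return summary


if __name__ == "__main__":
    for block, info in summarize_adapters(ADAPTER_SETTINGS).items():
        print(block, info)
```

In this run only `latent_refiner` and `latent_refiner_adapter` are enabled; the `standard_lora` block is recorded but disabled.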

## Loading
Primary weights are exported as Hugging Face-compatible safetensors, either as a single file or as shards with an index file.
PyTorch checkpoint artifacts (`checkpoint.pt`) are removed once the safetensors export has been written and validated.
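
Because the export may be either a single file or a sharded set, a loader has to check which layout is present. The sketch below shows one way to do that with the standard library only. The filenames `model.safetensors` and `model.safetensors.index.json` (with its `weight_map` key) are the conventional Hugging Face names and are assumed here; this card does not list the actual files. `resolve_weight_files` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path

# Assumed filenames: conventional Hugging Face names, not confirmed by this card.
SINGLE_FILE = "model.safetensors"
INDEX_FILE = "model.safetensors.index.json"


def resolve_weight_files(model_dir):
    """Return the safetensors files to read from a local model directory.

    If a shard index exists, its "weight_map" (tensor name -> shard filename)
    decides the shard list; otherwise the single-file export is expected.
    """
    model_dir = Path(model_dir)
    index_path = model_dir / INDEX_FILE
    if index_path.is_file():
        with index_path.open() as f:
            weight_map = json.load(f)["weight_map"]
        # Many tensors share a shard, so deduplicate and sort for stable order.
        return sorted({model_dir / shard for shard in weight_map.values()})
    single = model_dir / SINGLE_FILE
    if single.is_file():
        return [single]
    raise FileNotFoundError(f"no safetensors export found in {model_dir}")
```

Each returned path could then be read with a safetensors reader such as `safetensors.torch.load_file`. How the refiner weights attach to `Qwen/Qwen3-8B` is not described in this card, so that step is left out.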