# Project Uroboros: RoSTE Stage 2 Adapter
Stage 2 of 2 in the Uroboros continual-learning pipeline.
## Training details
| Setting | Value |
|---|---|
| Base model | ByteDance/Ouro-2.6B-Thinking |
| Datasets | UltraChat-200k → NVARC-augmented-puzzles |
| Quantization | 4-bit NF4 double-quant (BitsAndBytes) |
| LoRA r / alpha | 16 / 32 |
| Bits (weights / activations / KV cache) | 4 / 4 / 4 |
| Rotations used | R3 (Q/K head_dim), R4 (down_proj input, block-Hadamard) |
| Max seq length | 1024 |
| Transformers | 4.54.1 |
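The R3/R4 naming follows the SpinQuant convention for Hadamard rotations. Purely as an illustration (the actual rotations are constructed inside `apply_roste`), a block-Hadamard transform of the kind applied to the `down_proj` input looks roughly like the sketch below; the block size of 128 is a hypothetical choice, not taken from this repo:

```python
import torch

def sylvester_hadamard(n: int) -> torch.Tensor:
    # Build an n x n Hadamard matrix via the Sylvester construction
    # (n must be a power of two).
    h = torch.ones(1, 1)
    while h.shape[0] < n:
        h = torch.cat([torch.cat([h, h], dim=-1),
                       torch.cat([h, -h], dim=-1)], dim=-2)
    return h

def block_hadamard(x: torch.Tensor, block: int = 128) -> torch.Tensor:
    # Rotate each `block`-sized slice of the last dimension by a
    # normalized (orthonormal) Hadamard matrix. This spreads out
    # activation outliers before 4-bit quantization; the inverse
    # rotation is folded into the following weight matrix, so the
    # layer's output is unchanged.
    d = x.size(-1)
    assert d % block == 0
    h = sylvester_hadamard(block).to(x.dtype) / block ** 0.5
    return (x.reshape(*x.shape[:-1], d // block, block) @ h).reshape(x.shape)
```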
## Files
| File | Description |
|---|---|
| `roste_adapter.pt` | Trained `lora_A` + `lora_B` tensors from `apply_roste` |
| `roste_config.json` | Hyperparameters needed to reconstruct the model |
| `tokenizer.*` | Tokenizer files saved from the base model |
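A minimal sketch of fetching these artifacts from the Hub, using the repo id shown on this page (the `huggingface_hub` usage here is illustrative, not part of the project code):

```python
import json
from huggingface_hub import hf_hub_download

repo_id = "Shrikanth19/roste-stage2-adapter"
adapter_path = hf_hub_download(repo_id, "roste_adapter.pt")
config_path = hf_hub_download(repo_id, "roste_config.json")

with open(config_path) as f:
    roste_cfg = json.load(f)  # hyperparameters needed to rebuild the model
```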
## How to load
See the project notebook for the full `apply_roste` implementation. The weight-injection pattern is:
```python
import torch

# Assumes `model` already carries the RoSTE LoRA parameters,
# i.e. apply_roste has been run on it first.
saved = torch.load("roste_adapter.pt", map_location="cpu")
current = dict(model.named_parameters())
with torch.no_grad():
    for name, tensor in saved.items():
        # The shape check guards against a mismatched LoRA config.
        if name in current and current[name].shape == tensor.shape:
            current[name].copy_(tensor.to(current[name].device, current[name].dtype))
```
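Putting it together: a minimal end-to-end sketch, assuming the base model is reloaded with the same NF4 settings as in the training table and that `apply_roste` (from the project notebook) is run to create the `lora_A`/`lora_B` parameters before the injection. The `compute_dtype`, `device_map`, and `trust_remote_code` arguments are assumptions, not confirmed by this repo:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Quantization settings mirroring the training table:
# 4-bit NF4 with double quantization (compute dtype is an assumption).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "ByteDance/Ouro-2.6B-Thinking",
    quantization_config=bnb,
    device_map="auto",
    trust_remote_code=True,  # assumption: the Ouro architecture may be custom
)
tokenizer = AutoTokenizer.from_pretrained("ByteDance/Ouro-2.6B-Thinking")

# apply_roste(model, ...)  # from the project notebook: wraps layers with the
#                          # R3/R4 rotations and creates the lora_A / lora_B
#                          # parameters, after which the injection snippet
#                          # above fills them with the trained values.
```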