# Project Uroboros: RoSTE Stage 2 (Fully Merged)

Fully merged model from Project Uroboros, a two-stage continual-learning pipeline built on ByteDance/Ouro-2.6B-Thinking.
The LoRA adapter weights have been merged into the base model weights. No PEFT or adapter loading is required at inference.
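Merging folds the low-rank update into the dense weights via the standard LoRA identity `W_merged = W + (alpha / r) * B @ A`, so the merged model computes exactly what base-plus-adapter would. A minimal NumPy sketch of that identity (the `r = 16`, `alpha = 32` values come from the architecture table below; the matrix shapes are toy values, not this model's):

```python
import numpy as np

# Standard LoRA merge: W_merged = W + (alpha / r) * B @ A
# r=16, alpha=32 match this model's adapter config; dims are toy values.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 16, 32

W = rng.standard_normal((d_out, d_in))   # base weight
A = rng.standard_normal((r, d_in))       # LoRA down-projection
B = rng.standard_normal((d_out, r))      # LoRA up-projection

W_merged = W + (alpha / r) * B @ A       # fold the adapter into the dense weight

# The merged matrix reproduces the base + adapter forward pass exactly.
x = rng.standard_normal(d_in)
assert np.allclose(W_merged @ x, W @ x + (alpha / r) * B @ (A @ x))
```

Because the merge is exact, inference needs no PEFT wrapper and pays no adapter overhead.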
## Training pipeline
| Stage | Dataset | Purpose |
|---|---|---|
| 1 | UltraChat-200k | General instruction following |
| 2 | NVARC augmented puzzles (3,000 samples) | ARC visual reasoning |
## Architecture details
| Setting | Value |
|---|---|
| Base model | ByteDance/Ouro-2.6B-Thinking |
| Method | RoSTE (Rotation-enhanced Straight-Through Estimator) |
| Quantization during training | 4-bit NF4 double-quant |
| LoRA r / alpha | 16 / 32 |
| Bits (w / a / kv) | 4 / 4 / 4 |
| Rotations used | R3 (Q/K head_dim), R4 (down_proj, block-Hadamard) |
| Max seq length | 1024 |
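The rotations counter activation outliers: multiplying by an orthogonal Hadamard matrix spreads a single large coordinate across the whole block, which shrinks the quantization scale, and the rotation is undone after dequantization. A self-contained NumPy sketch of that idea (toy 4-bit symmetric quantizer for illustration, not the NF4 codebook or RoSTE's actual kernels):

```python
import numpy as np

def hadamard(n):
    """Normalized Hadamard matrix via Sylvester's construction (n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize_dequantize(x, bits=4):
    """Toy symmetric quantizer (stand-in for the NF4 quantizer used in training)."""
    qmax = 2 ** (bits - 1) - 1           # 7 for 4-bit
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.1, size=64)
x[0] = 10.0                              # one outlier dominates the scale

H = hadamard(64)
err_plain = np.sum((quantize_dequantize(x) - x) ** 2)
# Rotate, quantize, then undo the rotation with its transpose (inverse).
err_rot = np.sum((H.T @ quantize_dequantize(H @ x) - x) ** 2)
assert err_rot < err_plain               # rotation spreads the outlier
```

Since the rotations are orthogonal, they change the quantization error but not the full-precision function, which is why R3 (Q/K head dimension) and R4 (down_proj input) can be inserted without altering the model's outputs at full precision.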
## How to load
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Shrikanth19/roste-stage2-merged",
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Shrikanth19/roste-stage2-merged")
model.eval()
```
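Because training was quantization-aware at 4-bit NF4, memory-constrained setups may also load the merged weights in 4-bit at inference via bitsandbytes. This is optional and untested on this checkpoint; the config below simply mirrors the training-time quantization settings from the table above:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Mirrors the training-time settings: NF4 with double quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Shrikanth19/roste-stage2-merged",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```

This requires a CUDA GPU and the `bitsandbytes` package; the fp16 load shown above remains the reference configuration.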
## Inference example
```python
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Your question here"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=1.0,
        top_p=0.7,
        use_cache=False,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```