# Project Uroboros: RoSTE Stage 2 (Fully Merged)

Fully merged model from Project Uroboros, a two-stage continual-learning pipeline built on ByteDance/Ouro-2.6B-Thinking.

The LoRA adapter weights have been merged into the base model weights. No PEFT or adapter loading is required at inference.
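
For context, a merge like this is typically produced with PEFT's `merge_and_unload()`. The sketch below is shown only for reference and is not needed to use this repository, since the published checkpoint already contains the merged weights; the adapter path is a hypothetical placeholder.

```python
# Sketch of how the merge is typically produced with PEFT (not needed at inference);
# "path/to/stage2-lora-adapter" is a hypothetical placeholder for the Stage 2 adapter.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "ByteDance/Ouro-2.6B-Thinking",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
merged = PeftModel.from_pretrained(base, "path/to/stage2-lora-adapter").merge_and_unload()
merged.save_pretrained("roste-stage2-merged")
```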

## Training pipeline

| Stage | Dataset | Purpose |
|-------|---------|---------|
| 1 | UltraChat-200k | General instruction following |
| 2 | NVARC augmented puzzles (3,000 samples) | ARC visual reasoning |
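
If you want to inspect the training data, Stage 1 uses the public UltraChat-200k dataset; the Stage 2 puzzle set is described here only by size, so the file name in the sketch below is a hypothetical placeholder.

```python
from datasets import load_dataset

# Stage 1: general instruction data (public dataset with a "train_sft" split)
stage1 = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

# Stage 2: the 3,000 augmented ARC puzzles; "nvarc_augmented_puzzles.jsonl" is a
# hypothetical local file standing in for the dataset described above
stage2 = load_dataset("json", data_files="nvarc_augmented_puzzles.jsonl", split="train")
```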

## Architecture details

| Setting | Value |
|---------|-------|
| Base model | ByteDance/Ouro-2.6B-Thinking |
| Method | RoSTE (Rotation-enhanced Straight-Through Estimator) |
| Quantization during training | 4-bit NF4, double quantization |
| LoRA r / alpha | 16 / 32 |
| Bits (weights / activations / KV cache) | 4 / 4 / 4 |
| Rotations used | R3 (Q/K head_dim), R4 (down_proj, block-Hadamard) |
| Max sequence length | 1024 |
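
For reference, the quantization and LoRA rows above correspond roughly to a bitsandbytes/PEFT training configuration like the sketch below. The compute dtype and LoRA target modules are assumptions (they are not listed in the table), and the 4-bit activation/KV quantization and the RoSTE rotations are handled by the training code rather than by these configs.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 weight quantization with double quantization, matching the table above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: compute dtype is not stated above
)

# LoRA r=16, alpha=32 as listed above; target modules are an assumption
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```

The R4 row describes a block-Hadamard rotation of the down_proj inputs. A generic sketch of such a rotation (in the spirit of QuaRot/SpinQuant-style methods, not necessarily the exact RoSTE implementation) looks like this:

```python
import torch

def hadamard(n: int) -> torch.Tensor:
    # Sylvester construction of a normalized Hadamard matrix (n must be a power of two),
    # so that H @ H.T == I and the transform is an orthogonal rotation.
    assert n > 0 and (n & (n - 1)) == 0
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H / (n ** 0.5)

def rotate_linear_inputs(weight: torch.Tensor, block: int) -> torch.Tensor:
    # Apply a block-diagonal Hadamard rotation along the input dimension of an nn.Linear
    # weight (shape [out_features, in_features]). Paired with the same rotation on the
    # incoming activations, the layer output is unchanged because each block is orthogonal.
    out_f, in_f = weight.shape
    assert in_f % block == 0
    H = hadamard(block).to(weight.dtype)
    return (weight.reshape(out_f, in_f // block, block) @ H).reshape(out_f, in_f)
```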

## How to load

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Ouro ships custom modeling code, hence trust_remote_code=True
model = AutoModelForCausalLM.from_pretrained(
    "Shrikanth19/roste-stage2-merged",
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Shrikanth19/roste-stage2-merged")
model.eval()
```

## Inference example

```python
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Your question here"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.inference_mode():
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=1.0,
        top_p=0.7,
        use_cache=False,
        pad_token_id=tokenizer.pad_token_id,
    )
# Strip the prompt tokens and decode only the newly generated text
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```