# Project Uroboros: RoSTE Stage 2 (Fully Merged)

Fully merged model from Project Uroboros, a two-stage continual-learning pipeline built on ByteDance/Ouro-2.6B-Thinking.

The LoRA adapter weights have been merged into the base model weights. No PEFT or adapter loading is required at inference.
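
For context, a merge like this is typically produced with PEFT's `merge_and_unload()`. The sketch below is shown only for reference and is not needed to use this repository, since the published checkpoint already contains the merged weights; the adapter path is a hypothetical placeholder.

```python
# Sketch of how the merge is typically produced with PEFT (not needed at inference);
# "path/to/stage2-lora-adapter" is a hypothetical placeholder for the Stage 2 adapter.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "ByteDance/Ouro-2.6B-Thinking",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
merged = PeftModel.from_pretrained(base, "path/to/stage2-lora-adapter").merge_and_unload()
merged.save_pretrained("roste-stage2-merged")
```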

## Training pipeline

| Stage | Dataset | Purpose |
|-------|---------|---------|
| 1 | UltraChat-200k | General instruction following |
| 2 | NVARC augmented puzzles (3,000 samples) | ARC visual reasoning |
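
If you want to inspect the training data, Stage 1 uses the public UltraChat-200k dataset; the Stage 2 puzzle set is described here only by size, so the file name in the sketch below is a hypothetical placeholder.

```python
from datasets import load_dataset

# Stage 1: general instruction data (public dataset with a "train_sft" split)
stage1 = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

# Stage 2: the 3,000 augmented ARC puzzles; "nvarc_augmented_puzzles.jsonl" is a
# hypothetical local file standing in for the dataset described above
stage2 = load_dataset("json", data_files="nvarc_augmented_puzzles.jsonl", split="train")
```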

## Architecture details

| Setting | Value |
|---------|-------|
| Base model | ByteDance/Ouro-2.6B-Thinking |
| Method | RoSTE (Rotation-enhanced Straight-Through Estimator) |
| Quantization during training | 4-bit NF4, double quantization |
| LoRA r / alpha | 16 / 32 |
| Bits (weights / activations / KV cache) | 4 / 4 / 4 |
| Rotations used | R3 (Q/K head_dim), R4 (down_proj, block-Hadamard) |
| Max sequence length | 1024 |
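
For reference, the quantization and LoRA rows above correspond roughly to a bitsandbytes/PEFT training configuration like the sketch below. The compute dtype and LoRA target modules are assumptions (they are not listed in the table), and the 4-bit activation/KV quantization and the RoSTE rotations are handled by the training code rather than by these configs.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 weight quantization with double quantization, matching the table above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: compute dtype is not stated above
)

# LoRA r=16, alpha=32 as listed above; target modules are an assumption
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```

The R4 row describes a block-Hadamard rotation of the down_proj inputs. A generic sketch of such a rotation (in the spirit of QuaRot/SpinQuant-style methods, not necessarily the exact RoSTE implementation) looks like this:

```python
import torch

def hadamard(n: int) -> torch.Tensor:
    # Sylvester construction of a normalized Hadamard matrix (n must be a power of two),
    # so that H @ H.T == I and the transform is an orthogonal rotation.
    assert n > 0 and (n & (n - 1)) == 0
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H / (n ** 0.5)

def rotate_linear_inputs(weight: torch.Tensor, block: int) -> torch.Tensor:
    # Apply a block-diagonal Hadamard rotation along the input dimension of an nn.Linear
    # weight (shape [out_features, in_features]). Paired with the same rotation on the
    # incoming activations, the layer output is unchanged because each block is orthogonal.
    out_f, in_f = weight.shape
    assert in_f % block == 0
    H = hadamard(block).to(weight.dtype)
    return (weight.reshape(out_f, in_f // block, block) @ H).reshape(out_f, in_f)
```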

## How to load

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Ouro ships custom modeling code, hence trust_remote_code=True
model = AutoModelForCausalLM.from_pretrained(
    "Shrikanth19/roste-stage2-merged",
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Shrikanth19/roste-stage2-merged")
model.eval()
```

## Inference example

```python
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Your question here"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.inference_mode():
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=1.0,
        top_p=0.7,
        use_cache=False,
        pad_token_id=tokenizer.pad_token_id,
    )
# Strip the prompt tokens and decode only the newly generated text
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```