
SAGE-OSS-40B

SAGE-OSS-40B is an open-source research release from SAGEA: a 40B-parameter Mixture-of-Experts language model built on the LoopCoder architecture, SAGEA's early research into iterative loop-based reasoning. This model is part of the research lineage that informed the development of the SAGE Actus family.

This is not a production model. It is released for the research community to explore loop reasoning architectures and MoE scaling in open settings.


Model Details

Property          Value
Architecture      SAGELoopCoder (MoE)
Parameters        ~40B
Tensor Type       BF16
Context Length    131,072 tokens
Vocab Size        76,800
Hidden Size       5,120
Layers            80
Attention Heads   40 (GQA: 8 KV heads)
Loop Iterations   2
Loop Window Size  64
RoPE Theta        500,000
License           Apache 2.0

Architecture: LoopCoder

The LoopCoder architecture introduces iterative reasoning loops at the model level. Rather than a single linear forward pass, the model performs loop_num iterative passes over a sliding window of loop_window_size tokens, allowing it to refine representations before producing output.

This was SAGEA's earlier approach to building reasoning capability directly into the model architecture, distinct from chain-of-thought prompting or post-training reasoning techniques.
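As an illustration only (SAGEA's actual implementation lives in the model's remote code, and the names below are hypothetical), the looping scheme can be sketched as repeated refinement passes over fixed-size token windows:

```python
# Hypothetical sketch of loop-based refinement. loop_forward and layer_fn
# are illustrative names, not identifiers from the SAGE-OSS-40B codebase.

def loop_forward(states, layer_fn, loop_num=2, window=64):
    """Run loop_num refinement passes, each over windows of `window` states."""
    for _ in range(loop_num):
        refined = []
        for start in range(0, len(states), window):
            # Each window is refined independently, then passes repeat.
            refined.extend(layer_fn(states[start:start + window]))
        states = refined
    return states

# Trivial stand-in "layer" that nudges each state; with loop_num=2 every
# state is refined twice before the final output is produced.
out = loop_forward([0.0] * 128, lambda chunk: [x + 1.0 for x in chunk])
print(len(out), out[0])  # 128 2.0
```

With loop_num=2 and loop_window_size=64 (the values in the table above), each 64-token window would get two refinement passes before the model emits output.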

Key architectural properties:

  • loop_num: 2, i.e. two iterative reasoning passes
  • loop_window_size: 64, the token window over which looping occurs
  • GQA with 40 attention heads and 8 KV heads for efficiency
  • SiLU activations, RMS norm, no attention or MLP bias
  • RoPE with theta 500,000 for long-context stability
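Two of the configuration values above can be sanity-checked with quick arithmetic. This is an illustrative sketch, not SAGEA code; the head-to-head mapping and the RoPE frequency formula shown are the standard conventions, assumed (not confirmed) to match this model's remote code:

```python
# GQA: 40 query heads share 8 KV heads, so 5 query heads per KV head.
NUM_Q_HEADS, NUM_KV_HEADS = 40, 8
GROUP = NUM_Q_HEADS // NUM_KV_HEADS  # 5

def kv_head_for(q_head):
    """KV head index serving a given query head (standard GQA grouping)."""
    return q_head // GROUP

print(GROUP, kv_head_for(4), kv_head_for(5))  # 5 0 1

# RoPE: head_dim follows from hidden_size / num_heads; theta scales the
# rotation frequencies (larger theta -> slower rotation -> longer context).
head_dim = 5120 // 40  # 128
theta = 500_000.0
inv_freq = [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]
print(head_dim, inv_freq[0])  # 128 1.0
```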

Usage

This model uses a custom architecture, so it must be loaded with trust_remote_code=True.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "sagea-ai/sage-oss-40b"

# The custom SAGELoopCoder architecture requires trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

# Load BF16 weights and let accelerate place them across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "Explain the concept of recursion in programming."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        eos_token_id=[2, 75864, 75869]
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Note: Due to the custom SAGELoopCoderForCausalLM architecture, standard pipeline inference may not work out of the box. Use the snippet above directly.
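As a rough hardware guide (a back-of-envelope estimate, not an official requirement), the BF16 weights alone occupy about 75 GiB, which is why device_map="auto" with multiple GPUs or CPU offload is typically needed:

```python
# Back-of-envelope: weight memory only. The KV cache and activations
# (especially at the 131,072-token context length) add more on top.
params = 40e9          # ~40B parameters
bytes_per_param = 2    # BF16
weight_gib = params * bytes_per_param / 2**30
print(f"~{weight_gib:.0f} GiB")  # ~75 GiB
```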


Limitations

  • Research release: not instruction-tuned or RLHF-aligned
  • Not recommended for production use
  • Evaluated internally; no public benchmark results at release
  • Requires trust_remote_code=True due to the custom architecture

Relation to SAGEA Model Families

SAGE-OSS-40B sits outside the named SAGEA product families (VORA, Celer, Actus). It represents an earlier experimental direction and is released as-is for transparency and community research.

Current SAGEA model families:

  • SAGE Celer (low/mid/high): general-purpose models
  • SAGE Actus: agentic and domain-specialized models

Citation

@misc{sagea2025sageoss,
  title={SAGE-OSS-40B: Open-Source LoopCoder Reasoning Research Model},
  author={SAGEA},
  year={2025},
  url={https://huggingface.co/sagea-ai/sage-oss-40b}
}

About SAGEA

SAGEA is an AI research company based in Nepal, building foundation models and AI infrastructure for South Asia and beyond.
