# SAGE-OSS-40B
SAGE-OSS-40B is an open-source research release from SAGEA: a 40B-parameter Mixture-of-Experts language model built on the LoopCoder architecture, SAGEA's early research into iterative loop-based reasoning. It is part of the research lineage that informed the development of the SAGE Actus family.
This is not a production model. It is released for the research community to explore loop reasoning architectures and MoE scaling in open settings.
## Model Details
| Property | Value |
|---|---|
| Architecture | SAGELoopCoder (MoE) |
| Parameters | ~40B |
| Tensor Type | BF16 |
| Context Length | 131,072 tokens |
| Vocab Size | 76,800 |
| Hidden Size | 5,120 |
| Layers | 80 |
| Attention Heads | 40 (GQA: 8 KV heads) |
| Loop Iterations | 2 |
| Loop Window Size | 64 |
| RoPE Theta | 500,000 |
| License | Apache 2.0 |
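As a rough sanity check on hardware requirements, the table implies the raw weight footprint in BF16. This is a back-of-envelope estimate only; the exact on-disk size depends on the precise parameter count and checkpoint sharding, and as an MoE model the parameters *active* per token are fewer than the parameters that must be resident in memory:

```python
# Back-of-envelope memory estimate for the raw weights.
# Assumes ~40e9 parameters (the "~40B" from the table) stored in
# BF16 (2 bytes each); actual checkpoint size will differ somewhat.
NUM_PARAMS = 40e9
BYTES_PER_PARAM = 2  # BF16 = 16 bits

weight_bytes = NUM_PARAMS * BYTES_PER_PARAM
weight_gib = weight_bytes / 1024**3

print(f"approx. weight footprint: {weight_gib:.1f} GiB")  # ~74.5 GiB
```

Activations, KV cache (especially at the full 131,072-token context), and optimizer state for fine-tuning come on top of this figure.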
## Architecture: LoopCoder
The LoopCoder architecture introduces iterative reasoning loops at the model level. Rather than a single linear forward pass, the model performs `loop_num` iterative passes over a sliding window of `loop_window_size` tokens, allowing it to refine representations before producing output.
This was SAGEA's earlier approach to building reasoning capability directly into the model architecture, as distinct from chain-of-thought prompting or post-training reasoning techniques.
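The control flow this implies can be illustrated with a toy sketch. This is **not** the actual SAGELoopCoder implementation (that lives in the repository's remote code); the `refine` update rule below is a hypothetical stand-in for the model's transformer layers, and it only shows how `loop_num` passes over windows of `loop_window_size` positions revisit the same hidden states before output:

```python
# Toy illustration of loop-based refinement (NOT the real SAGELoopCoder
# code): loop_num refinement passes over windows of loop_window_size
# positions. "refine" is a placeholder for the model's actual layers.
LOOP_NUM = 2
LOOP_WINDOW_SIZE = 64

def refine(window):
    # Placeholder update rule: mix each position with the window mean.
    mean = sum(window) / len(window)
    return [0.5 * x + 0.5 * mean for x in window]

def loop_forward(hidden, loop_num=LOOP_NUM, window=LOOP_WINDOW_SIZE):
    states = list(hidden)
    for _ in range(loop_num):                     # iterative passes
        for start in range(0, len(states), window):
            chunk = states[start:start + window]  # one window
            states[start:start + window] = refine(chunk)
    return states

out = loop_forward([float(i) for i in range(128)])
print(len(out))  # sequence length is unchanged by looping
```

The point of the sketch is the shape of the computation, not the update rule: each position is processed `loop_num` times instead of once, which is where the extra "reasoning" capacity is hypothesized to come from.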
Key architectural properties:
- `loop_num` = 2: two iterative reasoning passes
- `loop_window_size` = 64: the token window over which looping occurs
- GQA: 40 attention heads with 8 KV heads for efficiency
- SiLU activations, RMSNorm, no attention or MLP bias
- RoPE with theta 500,000 for long-context stability
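The effect of the large RoPE base can be sketched numerically. Standard RoPE uses inverse frequencies `theta ** (-2i / head_dim)`; raising theta from the common default of 10,000 to 500,000 slows the rotation of the low-frequency dimensions, which helps positional signals remain distinguishable out to the 131,072-token context. The head dimension of 128 below is an assumption (hidden size 5,120 / 40 heads), not something stated in the card:

```python
import math

# Standard RoPE inverse frequencies: inv_freq[i] = theta ** (-2i / d).
# head_dim = 128 is an assumption (hidden size 5120 / 40 heads).
HEAD_DIM = 128

def rope_inv_freq(theta, head_dim=HEAD_DIM):
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

small = rope_inv_freq(10_000.0)   # common default base
large = rope_inv_freq(500_000.0)  # SAGE-OSS-40B's base

# The slowest-rotating dimension completes one full cycle only after
# 2*pi / inv_freq[-1] positions; a larger theta stretches that period.
period_small = 2 * math.pi / small[-1]
period_large = 2 * math.pi / large[-1]
print(f"{period_small:,.0f} vs {period_large:,.0f} positions")
```

With the larger base, the slowest rotary dimension's period grows well past the model's context length, so distant positions do not wrap around to near-identical rotations.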
## Usage
This model uses a custom architecture, so it must be loaded with `trust_remote_code=True`:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "sagea-ai/sage-oss-40b"

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "Explain the concept of recursion in programming."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        eos_token_id=[2, 75864, 75869]
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Note: Due to the custom `SAGELoopCoderForCausalLM` architecture, standard `pipeline` inference may not work out of the box. Use the snippet above directly.
## Limitations
- Research release: not instruction-tuned or RLHF-aligned
- Not recommended for production use
- Evaluated internally; no public benchmark results at release
- Requires `trust_remote_code=True` due to the custom architecture
## Relation to SAGEA Model Families
SAGE-OSS-40B sits outside the named SAGEA product families (VORA, Celer, Actus). It represents an earlier experimental direction and is released as-is for transparency and community research.
Current SAGEA model families:
- SAGE Celer: general-purpose models (low/mid/high tiers)
- SAGE Actus: agentic and domain-specialized models
## Citation
```bibtex
@misc{sagea2025sageoss,
  title={SAGE-OSS-40B: Open-Source LoopCoder Reasoning Research Model},
  author={SAGEA},
  year={2025},
  url={https://huggingface.co/sagea-ai/sage-oss-40b}
}
```
## About SAGEA
SAGEA is an AI research company based in Nepal, building foundation models and AI infrastructure for South Asia and beyond.