Quasar Foundation Model

Quasar is a family of foundation models developed by SILX AI. This repository hosts the Stage 1 checkpoint, the first major release in the Quasar training stack. Stage 1 was trained on 300 billion tokens using RoPE positional embeddings and has a native context length of approximately 20,000 tokens.

This is a technical, experimental release focused on the core architecture and the mixture-of-experts configuration. It is a standalone model and does not include downstream agent integrations.

Model Overview

Model Name: Quasar 22B (Stage 1)
Organization: SILX AI

Architecture

Total Parameters: 22 Billion
Active Parameters: 2 Billion (MoE)
Total Layers (L): 32
- The first 6 layers are dense computational blocks for feature extraction.
- The remaining 26 layers follow a hybrid 4:2 attention schedule.
Hidden Dimension (d_model): 2048
Routed Experts (N): 64 per MoE layer
Routing Strategy (k): Top-6 experts per token
Shared Experts (N_shared): 2 persistent experts per token
Expert Dimension (d_expert): 1408
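As a rough consistency check on the figures above, the sketch below counts expert parameters only. It assumes, since the card does not say, that each expert is a SwiGLU feed-forward block with three bias-free d_model × d_expert matrices and that all 26 non-dense layers are MoE layers; the constant and function names are illustrative.

```python
# Back-of-the-envelope expert parameter counts for Quasar Stage 1.
# ASSUMPTIONS (not stated in the model card): each expert is a SwiGLU FFN with
# three bias-free d_model x d_expert matrices, and every one of the 26
# non-dense layers is an MoE layer.

D_MODEL = 2048          # hidden dimension
D_EXPERT = 1408         # expert intermediate dimension
N_ROUTED = 64           # routed experts per MoE layer
N_SHARED = 2            # shared experts per MoE layer
TOP_K = 6               # routed experts selected per token
N_MOE_LAYERS = 32 - 6   # assumed: all non-dense layers are MoE layers


def expert_params(d_model: int, d_expert: int) -> int:
    """Parameters in one SwiGLU expert: gate, up and down projections."""
    return 3 * d_model * d_expert


def moe_layer_params(d_model: int, d_expert: int, n_routed: int, n_shared: int) -> int:
    """Expert parameters stored in one MoE layer (routed + shared)."""
    return (n_routed + n_shared) * expert_params(d_model, d_expert)


def active_expert_params(d_model: int, d_expert: int, top_k: int, n_shared: int) -> int:
    """Expert parameters actually used for a single token in one MoE layer."""
    return (top_k + n_shared) * expert_params(d_model, d_expert)


if __name__ == "__main__":
    per_expert = expert_params(D_MODEL, D_EXPERT)
    total = N_MOE_LAYERS * moe_layer_params(D_MODEL, D_EXPERT, N_ROUTED, N_SHARED)
    active = N_MOE_LAYERS * active_expert_params(D_MODEL, D_EXPERT, TOP_K, N_SHARED)
    print(f"per expert:         {per_expert:,}")      # 8,650,752
    print(f"all expert weights: {total / 1e9:.2f}B")  # 14.84B
    print(f"active per token:   {active / 1e9:.2f}B") # 1.80B
```

Under these assumptions the expert weights alone come to about 14.8B parameters, of which about 1.8B (6 routed plus 2 shared experts per layer) are used per token. Attention, router, embedding and dense-layer weights are excluded, so these numbers are not expected to reproduce the quoted 22B and 2B totals exactly.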

Training

Training Tokens: 300 Billion
Positional Encoding: RoPE (Rotary Positional Embeddings)
Native Context Length: ~20,000 tokens
Objective: Causal Language Modeling
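Causal language modeling trains the model to predict each token from the tokens before it. The function below is a generic NumPy sketch of that next-token cross-entropy objective, not SILX AI's training code; it assumes attention is already causally masked, so the logits at position t depend only on tokens 0..t.

```python
import numpy as np


def causal_lm_loss(logits: np.ndarray, tokens: np.ndarray) -> float:
    """Mean next-token cross-entropy for one sequence.

    logits: (seq_len, vocab) scores produced at each position.
    tokens: (seq_len,) integer token ids of the same sequence.
    Position t is scored on predicting token t + 1, so the last row of
    logits and the first token are dropped before computing the loss.
    """
    pred = logits[:-1]   # predictions made at positions 0..T-2
    target = tokens[1:]  # the tokens that actually came next
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = pred - pred.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of each correct next token, averaged.
    nll = -log_probs[np.arange(len(target)), target]
    return float(nll.mean())
```

With all-zero logits over a 10-token vocabulary every prediction is uniform, so the loss is ln 10 ≈ 2.303.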

Technical Notes

Stage 1 Quasar uses a Mixture-of-Experts (MoE) design to scale the total parameter count to 22B while activating only about 2B parameters per token, so per-token compute scales with the active rather than the total parameter count. The first six dense layers handle initial feature extraction; in each MoE layer, a token is routed to 6 of the 64 experts for specialized processing, while 2 shared experts process every token to retain knowledge common to all inputs.
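The routing described above can be sketched as follows for a single MoE layer. The card does not specify the gating function, so this sketch assumes a softmax router whose top-k probabilities are renormalized to sum to 1, and it omits load-balancing losses and capacity limits; router_w and the expert callables are hypothetical stand-ins.

```python
import numpy as np


def moe_forward(x, router_w, routed_experts, shared_experts, top_k):
    """Sketch of a shared + routed MoE layer for a batch of tokens.

    x:              (n_tokens, d_model) float token activations
    router_w:       (d_model, n_routed) router projection (hypothetical)
    routed_experts: list of callables, each mapping (d_model,) -> (d_model,)
    shared_experts: list of callables applied to every token
    top_k:          routed experts per token (6 in Quasar Stage 1)
    """
    logits = x @ router_w                                  # (n_tokens, n_routed)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)             # softmax over experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        # Indices of the top_k highest-probability experts for this token.
        chosen = np.argsort(probs[t])[::-1][:top_k]
        # Renormalize the selected probabilities (an assumed gating choice).
        gates = probs[t, chosen] / probs[t, chosen].sum()
        for g, e in zip(gates, chosen):
            out[t] += g * routed_experts[e](x[t])
        # Shared experts run on every token without gating.
        for expert in shared_experts:
            out[t] += expert(x[t])
    return out
```

The per-token loop is for readability; practical implementations group tokens by expert so that each expert runs one batched matrix multiply. Because only top_k routed experts plus the shared experts run for each token, the unselected experts cost no compute for that token.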

RoPE encodes position by rotating query and key vectors, so attention scores depend on the relative distance between tokens rather than on their absolute positions. This configuration was chosen to explore scaling properties and model stability before experimenting with DroPE (dropped positional embeddings) in later stages.
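A minimal NumPy sketch of RoPE in the common "rotate half" layout, which pairs dimension i with dimension i + head_dim/2. The frequency base of 10000 is the widely used default and is an assumption here; the card does not state Quasar's RoPE settings.

```python
import numpy as np


def rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary positional embeddings.

    x:         (seq_len, head_dim) query or key vectors, head_dim even
    positions: (seq_len,) integer token positions
    base:      frequency base (10000 is a common default, assumed here)
    """
    half = x.shape[-1] // 2
    # One rotation frequency per pair of dimensions: base^(-i / half).
    inv_freq = base ** (-np.arange(half) / half)
    angles = positions[:, None] * inv_freq[None, :]        # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1[i], x2[i]) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Because each pair of dimensions is rotated by an angle proportional to its position, the dot product between a rotated query at position m and a rotated key at position n depends only on n - m. For example, query/key positions (5, 2) and (13, 10) produce the same attention score, and the rotation leaves vector norms unchanged.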

Stage 2

The next stage of Quasar will introduce a base model with a context length of millions of tokens. For Stage 2, see Quasar-V1-Base.
