Quasar Foundation Models
Quasar is a family of foundation models developed by SILX AI. This repository hosts the Stage 1 checkpoint, the first major release in the Quasar training stack. Stage 1 was trained on 300 billion tokens with rotary positional embeddings (RoPE) and has a native context length of approximately 20,000 tokens.
This release is technical and experimental, focusing on the core architecture and mixture-of-experts configuration. It is a standalone model and does not include downstream agent integrations.
Model Overview
Model Name: Quasar 22B (Stage 1)
Organization: SILX AI
Architecture
- Total Parameters: 22 billion
- Active Parameters: 2 billion per token (MoE)
- Total Layers (L): 32
  - The first 6 layers are dense computational blocks for feature extraction.
  - The remaining 26 layers follow a hybrid 4:2 attention schedule.
- Hidden Dimension (d_model): 2048
- Routed Experts (N): 64 per MoE layer
- Routing Strategy (k): Top-6 experts per token
- Shared Experts (N_shared): 2 persistent experts per token
- Expert Dimension (d_expert): 1408
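The figures above can be collected into a small configuration object to check the routing arithmetic. This is a minimal sketch, not the repository's actual config class: the name `QuasarStage1Config` and its field names are hypothetical, and the per-expert parameter estimate assumes a gated feed-forward block with three bias-free weight matrices, which this card does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuasarStage1Config:
    """Hypothetical container for the numbers listed in this model card."""

    num_layers: int = 32
    num_dense_layers: int = 6
    d_model: int = 2048
    num_routed_experts: int = 64
    top_k: int = 6
    num_shared_experts: int = 2
    d_expert: int = 1408

    @property
    def num_hybrid_layers(self) -> int:
        # Layers that follow the hybrid attention schedule.
        return self.num_layers - self.num_dense_layers

    @property
    def experts_active_per_token(self) -> int:
        # Routed experts selected by the router plus the always-on shared experts.
        return self.top_k + self.num_shared_experts

    @property
    def params_per_expert(self) -> int:
        # ASSUMPTION: gate, up, and down projections with no biases.
        return 3 * self.d_model * self.d_expert


cfg = QuasarStage1Config()
print(cfg.num_hybrid_layers)         # 26
print(cfg.experts_active_per_token)  # 8
print(cfg.params_per_expert)         # 8650752
```

Under these assumptions each token touches 8 of the 66 experts in an MoE layer, which is why the active parameter count is far below the total.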
Training
- Training Tokens: 300 Billion
- Positional Encoding: RoPE (Rotary Positional Embeddings)
- Native Context Length: ~20,000 tokens
- Objective: Causal Language Modeling
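For readers unfamiliar with the causal language modeling objective, the sketch below computes the standard next-token cross-entropy in NumPy. It illustrates the general objective only; the function name `causal_lm_loss` is hypothetical and nothing here reflects the actual Quasar training code.

```python
import numpy as np


def causal_lm_loss(logits, token_ids):
    """Mean next-token cross-entropy.

    logits: (seq, vocab) scores produced at each position.
    token_ids: (seq,) integer token ids of the same sequence.
    """
    pred = logits[:-1]        # position t predicts token t + 1
    target = token_ids[1:]
    # Numerically stable log-softmax over the vocabulary axis.
    pred = pred - pred.max(axis=-1, keepdims=True)
    log_probs = pred - np.log(np.exp(pred).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(target.size), target].mean()


vocab = 8
tokens = np.array([1, 4, 2, 7, 0])
uniform_logits = np.zeros((tokens.size, vocab))
loss = causal_lm_loss(uniform_logits, tokens)
print(loss)  # equals log(vocab) for a uniform prediction
```

With uniform logits the loss equals the log of the vocabulary size, a useful sanity check for an untrained model.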
Technical Notes
Stage 1 Quasar uses a Mixture-of-Experts (MoE) design to scale total parameters while keeping inference cost manageable: only about 2 billion of the 22 billion parameters are active for any given token. The model combines dense layers for initial feature extraction with routed experts for specialized processing. Two shared experts process every token to maintain baseline knowledge across all inputs.
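The routing described here can be sketched in NumPy as follows. This is an illustrative toy, not the Quasar implementation: `moe_forward` and `make_expert` are hypothetical names, the experts are plain ReLU MLPs, the gate weights are a softmax over the selected logits only, and the hidden sizes are shrunk for readability. Only the counts (64 routed experts, top-6 routing, 2 shared experts) come from this card.

```python
import numpy as np


def moe_forward(x, router_w, routed_experts, shared_experts, top_k):
    """Route each token to its top_k routed experts and add the shared experts.

    x: (tokens, d_model); router_w: (d_model, n_experts).
    Each expert is a callable mapping (n, d_model) -> (n, d_model).
    """
    logits = x @ router_w                                # (tokens, n_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -top_k:]    # (tokens, top_k)
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    # ASSUMPTION: gate weights are a softmax over the selected logits only.
    weights = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for e, expert in enumerate(routed_experts):
        token_ids, slot = np.nonzero(top_idx == e)       # tokens that chose expert e
        if token_ids.size:
            gate = weights[token_ids, slot][:, None]
            out[token_ids] += gate * expert(x[token_ids])
    for expert in shared_experts:                        # shared experts see every token
        out += expert(x)
    return out, top_idx


def make_expert(rng, d_model, d_expert):
    """Build a toy ReLU MLP expert (illustrative only)."""
    w_in = rng.standard_normal((d_model, d_expert)) * 0.02
    w_out = rng.standard_normal((d_expert, d_model)) * 0.02
    return lambda h: np.maximum(h @ w_in, 0.0) @ w_out


rng = np.random.default_rng(0)
d_model, d_expert = 32, 16   # toy sizes; the card lists 2048 and 1408
routed = [make_expert(rng, d_model, d_expert) for _ in range(64)]
shared = [make_expert(rng, d_model, d_expert) for _ in range(2)]
router_w = rng.standard_normal((d_model, 64))
x = rng.standard_normal((5, d_model))
y, chosen = moe_forward(x, router_w, routed, shared, top_k=6)
print(y.shape, chosen.shape)  # (5, 32) (5, 6)
```

Looping over experts rather than tokens keeps the sketch short; production MoE layers typically batch tokens per expert and add a load-balancing term, neither of which is shown here.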
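RoPE encodes position by rotating query and key vectors, so attention scores depend on the relative offset between tokens rather than on their absolute positions, which is intended to help the model handle long contexts. This configuration was chosen to explore scaling properties and model stability before experimenting with DroPE (dropped positional embeddings) in later stages.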
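As an illustration of how rotary embeddings encode position, the sketch below applies RoPE to a query and a key and checks that their dot product depends only on the distance between the two positions. The function name `rope`, the frequency base of 10000, and the interleaved pairing of dimensions are assumptions taken from common RoPE implementations; this card does not specify them for Quasar.

```python
import numpy as np


def rope(x, positions, base=10000.0):
    """Apply rotary embeddings to x of shape (seq, dim), with dim even.

    ASSUMPTION: base=10000 and interleaved (even, odd) pairs, as in common
    RoPE implementations; not confirmed for Quasar.
    """
    dim = x.shape[-1]
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)     # (dim / 2,)
    angles = positions[:, None] * inv_freq[None, :]      # (seq, dim / 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin            # 2D rotation of each pair
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


rng = np.random.default_rng(0)
q = rng.standard_normal((1, 64))
k = rng.standard_normal((1, 64))


def score(pos_q, pos_k):
    """Attention logit between q at pos_q and k at pos_k."""
    return (rope(q, np.array([pos_q])) @ rope(k, np.array([pos_k])).T).item()


# Same offset (4 tokens apart) at very different absolute positions.
print(np.isclose(score(3, 7), score(103, 107)))  # True
```

Because each pair of dimensions is rotated by an angle proportional to position, shifting both positions by the same amount leaves the score unchanged.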
Stage 2
The next stage of Quasar will introduce a base model with a context length of millions of tokens. For Stage 2, see Quasar-V1-Base.