How to use from SGLang
Use Docker images
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "PrimeIntellect/glm4-moe-tiny" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "PrimeIntellect/glm4-moe-tiny",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
glm4-moe-tiny
A small (~543M parameter) GLM-4 MoE model intended for testing only. It is generally compatible with vLLM and Hugging Face Transformers, but it is meant to be used with prime-rl.
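For a quick check outside prime-rl, the checkpoint could be loaded with Hugging Face Transformers roughly as follows. This is a minimal sketch, not a tested recipe: it assumes an installed Transformers release that includes the `glm4_moe` architecture and that the tokenizer ships a chat template. The helper names `build_chat` and `generate` are illustrative only.

```python
MODEL_ID = "PrimeIntellect/glm4-moe-tiny"


def build_chat(prompt: str) -> list[dict[str, str]]:
    """Wrap a user prompt in the message format used by chat templates."""
    return [{"role": "user", "content": prompt}]


def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Load the checkpoint and generate a short reply.

    The first call downloads the weights from the Hugging Face Hub.
    """
    # Imported lazily so build_chat works without torch/transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Assumption: the tokenizer defines a chat template.
    inputs = tokenizer.apply_chat_template(
        build_chat(prompt),
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, not the prompt.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0, prompt_len:], skip_special_tokens=True)
```

Calling `generate("What is the capital of France?")` should return a short completion. Since this is a test model, the text itself is not expected to be useful.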
It was fine-tuned on PrimeIntellect/Reverse-Text-SFT so that its output distribution is non-trivial, which makes KL divergence measurements during RL meaningful.
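To make the KL divergence remark concrete, here is a small worked sketch of KL(policy || reference) at a single token position. The logits and the four-token vocabulary are invented for illustration, and this is the textbook definition, not necessarily the estimator prime-rl uses.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical next-token logits over a 4-token vocabulary.
reference = softmax([2.0, 0.5, 0.1, -1.0])  # frozen reference model
policy = softmax([1.5, 1.0, 0.1, -1.0])     # policy after some RL updates

kl_policy_ref = kl_divergence(policy, reference)  # positive: the policy drifted
kl_same = kl_divergence(reference, reference)     # zero: identical distributions

print(f"KL(policy || reference) = {kl_policy_ref:.4f}")
print(f"KL(reference || reference) = {kl_same:.4f}")
```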
Quick Start
uv run rl @ configs/ci/integration/rl_moe/glm4_moe.toml
See the Testing MoE at Small Scale guide for full instructions.
Model Details
| Parameter | Value |
|---|---|
| Hidden size | 1024 |
| Layers | 24 |
| Experts | 8 |
| Active experts | 4 |
| Parameters | ~543M |
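As an illustration of what "8 experts, 4 active" means, the sketch below routes one token with generic softmax top-k gating. It only shows the idea of sparse expert selection. The actual GLM-4 MoE router may score and normalise experts differently, and the router logits here are made up.

```python
import math

NUM_EXPERTS = 8     # "Experts" in the table above
ACTIVE_EXPERTS = 4  # "Active experts" in the table above


def route(router_logits: list[float], k: int = ACTIVE_EXPERTS) -> dict[int, float]:
    """Pick the k highest-scoring experts and renormalise their softmax weights."""
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the k most probable experts.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

    # Renormalise so the selected experts' weights sum to 1.
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


# Hypothetical router scores for a single token, one per expert.
logits = [0.3, 2.1, -0.5, 1.7, 0.0, 0.9, -1.2, 1.1]
weights = route(logits)

for expert, weight in sorted(weights.items()):
    print(f"expert {expert}: weight {weight:.3f}")
```

Only the selected experts run for this token, so with these settings half of the expert parameters in each MoE layer are used per token.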
Links
- prime-rl - RL training framework
- PrimeIntellect - Building infrastructure for decentralized AI
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "PrimeIntellect/glm4-moe-tiny" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "PrimeIntellect/glm4-moe-tiny",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
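The curl call above can also be made from Python. The following is a minimal sketch using only the standard library. It assumes the server is listening on `localhost:30000` as in the commands above and that the response follows the usual OpenAI-compatible chat completion shape. The names `build_request` and `chat` are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:30000/v1"
MODEL_ID = "PrimeIntellect/glm4-moe-tiny"


def build_request(prompt: str) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to a running server and return the reply text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-compatible responses carry the reply in
    # choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` returns the model's reply as a string.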