How to use Matrix-Corp/Zenith-32b-p300-V1 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="Matrix-Corp/Zenith-32b-p300-V1")
# Load model directly (the causal-LM class is needed for text generation)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Matrix-Corp/Zenith-32b-p300-V1", dtype="auto")
How to use Matrix-Corp/Zenith-32b-p300-V1 with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Matrix-Corp/Zenith-32b-p300-V1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Matrix-Corp/Zenith-32b-p300-V1",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Use Docker:
docker model run hf.co/Matrix-Corp/Zenith-32b-p300-V1
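The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server is listening on `http://localhost:8000` as in the command above, and the helper names `build_completion_request` and `complete` are illustrative, not part of vLLM or of this repository.

```python
import json
import urllib.request


def build_completion_request(
    prompt,
    model="Matrix-Corp/Zenith-32b-p300-V1",
    base_url="http://localhost:8000",  # assumed: default port of `vllm serve`
    max_tokens=512,
    temperature=0.5,
):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Example (requires the server to be running):
#   print(complete("Once upon a time,"))
```

Passing `base_url="http://localhost:30000"` would point the same helper at a server on a different port.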
How to use Matrix-Corp/Zenith-32b-p300-V1 with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "Matrix-Corp/Zenith-32b-p300-V1" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Matrix-Corp/Zenith-32b-p300-V1",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Use Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "Matrix-Corp/Zenith-32b-p300-V1" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "Matrix-Corp/Zenith-32b-p300-V1",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use Matrix-Corp/Zenith-32b-p300-V1 with Docker Model Runner:
docker model run hf.co/Matrix-Corp/Zenith-32b-p300-V1
Zenith-32b-p300-V1 is a 32B-parameter model optimized for the Tenstorrent p300a and based on DeepSeek-R1-Distill-Qwen-32B.
cd Zenith/V1-Tenstorrent-Blackhole-p300/32B
pip install -r requirements.txt
# LoRA fine-tuning (recommended)
python train.py \
--base_model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
--train_data ./data/train.json \
--use_lora \
--lora_r 16 \
--lora_alpha 32 \
--epochs 3 \
--batch_size 4 \
--gradient_accumulation_steps 8 \
--learning_rate 1e-4 \
--use_ring_attention \
--max_seq_length 32768 \
--tensor_parallel_size 8 \
--pipeline_parallel_size 4 \
--use_noc_optimization \
--mixed_precision bf16
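A few quantities are implied by the flags in the training command but not stated outright. The sketch below works them out under two assumptions that the source does not confirm: that `--batch_size` is the micro-batch per data-parallel replica, and that `train.py` uses the conventional LoRA scaling factor `lora_alpha / lora_r`. The helper `training_layout` and its `data_parallel` argument are hypothetical and are not flags of `train.py`.

```python
def training_layout(
    batch_size,
    gradient_accumulation_steps,
    tensor_parallel_size,
    pipeline_parallel_size,
    lora_r,
    lora_alpha,
    data_parallel=1,  # assumption: no extra data-parallel replicas
):
    """Derive batch size, device count, and LoRA scaling from the CLI values."""
    return {
        # Sequences contributing to each optimizer step.
        "effective_batch_size": batch_size * gradient_accumulation_steps * data_parallel,
        # Devices needed for one model replica times the number of replicas.
        "devices": tensor_parallel_size * pipeline_parallel_size * data_parallel,
        # Conventional LoRA scaling applied to the low-rank update.
        "lora_scaling": lora_alpha / lora_r,
    }


layout = training_layout(
    batch_size=4,
    gradient_accumulation_steps=8,
    tensor_parallel_size=8,
    pipeline_parallel_size=4,
    lora_r=16,
    lora_alpha=32,
)
print(layout)
# {'effective_batch_size': 32, 'devices': 32, 'lora_scaling': 2.0}
```

Under these assumptions the command targets 32 devices and updates the weights once per 32 sequences.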
# Interactive mode
python inference.py --checkpoint ./outputs/checkpoint-final
# Single prompt
python inference.py \
--checkpoint ./outputs/checkpoint-final \
--prompt "Write a Python function to implement quicksort" \
--max_new_tokens 1024
ollama create zenith-32b-p300 -f Modelfile
ollama run zenith-32b-p300 "Explain the difference between supervised and unsupervised learning"
from configs.zenith_config import get_32b_config
config = get_32b_config()
Key Parameters:
hidden_size: 4096
num_layers: 40
num_heads: 32
num_experts: 8 (configurable)
moe_top_k: 2
max_seq_len: 32768
use_ring_attention: True
ring_attention_chunk_size: 8192
ring_attention_overlap: 2048
from data.openthoughts_processor import OpenThoughtsProcessor, OpenThoughtsConfig
ot_config = OpenThoughtsConfig(
dataset_name="open-thoughts/OpenThoughts3-1.2M",
streaming=True,
max_seq_length=32768,
quality_filtering=True,
curriculum_learning=True,
tokenizer=tokenizer,  # assumes `tokenizer` was loaded earlier, e.g. with transformers.AutoTokenizer
)
processor = OpenThoughtsProcessor(ot_config)
Multi-dimensional scoring:
--use_moe --num_experts 8 --moe_top_k 2
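The MoE flags above select 2 of 8 experts per token. As an illustration of what top-k routing usually means, here is a minimal sketch of the common scheme in which the router scores every expert, keeps the `k` highest, and renormalizes their scores with a softmax. Whether Zenith's router works exactly this way is an assumption; `top_k_route` is a hypothetical helper, not a function from this repository.

```python
import math


def top_k_route(router_logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    if not 0 < k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# One token's router scores for 8 experts (made-up values for illustration).
logits = [0.1, 2.0, -1.0, 1.0, 0.0, 0.5, 0.3, -0.2]
routes = top_k_route(logits, k=2)
print([index for index, _ in routes])
# [1, 3]
```

The token's output would then be the weighted sum of the two selected experts' outputs, while the other six experts are skipped for that token.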
--use_eq_adapter --eq_loss_weight 0.05
--use_ring_attention --ring_chunk_size 8192 --ring_overlap 2048
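The ring attention flags above name a chunk size and an overlap, but the text does not say how the two interact. One plausible reading, offered as an assumption rather than a description of the repository's code, is that the sequence is tiled into fixed-size chunks whose neighbours share `overlap` tokens. The hypothetical helper `chunk_spans` below shows how the documented values would tile a 32768-token sequence under that reading.

```python
def chunk_spans(seq_len, chunk_size, overlap):
    """Return (start, end) token spans that cover seq_len with overlapping chunks."""
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    stride = chunk_size - overlap  # new tokens introduced by each chunk
    spans = []
    start = 0
    while True:
        end = min(start + chunk_size, seq_len)
        spans.append((start, end))
        if end >= seq_len:
            break
        start += stride
    return spans


spans = chunk_spans(seq_len=32768, chunk_size=8192, overlap=2048)
print(spans)
# [(0, 8192), (6144, 14336), (12288, 20480), (18432, 26624), (24576, 32768)]
```

With these values the stride is 6144 tokens, so five chunks cover the full 32768-token context.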
python test_model.py
Tests cover:
python -m evaluation.benchmark \
--model_path ./outputs/checkpoint-final \
--benchmarks humaneval mbpp gsm8k math truthfulqa
ollama create zenith-32b-p300 -f Modelfile
ollama run zenith-32b-p300 "Your prompt here"
python -m vllm.entrypoints.openai.api_server \
--model ./outputs/checkpoint-final \
--tensor-parallel-size 2 \
--max-model-len 32768 \
--port 8000
| Configuration | Memory | Speed | Quality (vs. full fine-tuning) |
|---|---|---|---|
| Full fine-tuning, 2K context | ~58GB | 50-80 | Baseline |
| LoRA r=16, 2K context | ~18GB | 80-120 | 98% |
| QLoRA r=8, 2K context | ~10GB | 100-150 | 95% |
| Ring attention, 32K context | +20% | 30-50 | Enables long context |
@misc{zenith-32b-p300-2025,
title={Zenith-32B-p300: A Tenstorrent-Optimized Reasoning Model},
author={Zenith Project},
year={2025}
}
README.md
FINETUNE_GUIDE.md
configs/zenith_config.py
Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B