Text Generation
Transformers
English
zenith
tenstorrent
reasoning
math
Mixture of Experts
ring-attention
eq-adapter
deepseek-r1
matrix-corp
Instructions to use Matrix-Corp/Zenith-32b-p300-V1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Matrix-Corp/Zenith-32b-p300-V1 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="Matrix-Corp/Zenith-32b-p300-V1")# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("Matrix-Corp/Zenith-32b-p300-V1", dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Matrix-Corp/Zenith-32b-p300-V1 with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "Matrix-Corp/Zenith-32b-p300-V1" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Matrix-Corp/Zenith-32b-p300-V1", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/Matrix-Corp/Zenith-32b-p300-V1
- SGLang
How to use Matrix-Corp/Zenith-32b-p300-V1 with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "Matrix-Corp/Zenith-32b-p300-V1" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Matrix-Corp/Zenith-32b-p300-V1", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "Matrix-Corp/Zenith-32b-p300-V1" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Matrix-Corp/Zenith-32b-p300-V1", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use Matrix-Corp/Zenith-32b-p300-V1 with Docker Model Runner:
docker model run hf.co/Matrix-Corp/Zenith-32b-p300-V1
| # Zenith-32B-p300 Model Configuration for Ollama | |
| # Tenstorrent p300a Optimized - V1-Tenstorrent-Blackhole-p300 | |
| # Based on DeepSeek-R1-Distill-Qwen-32B | |
| FROM deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | |
| # System prompt emphasizing reasoning and code | |
| SYSTEM """ | |
| You are Zenith-32B-p300, a powerful reasoning and coding model optimized for Tenstorrent p300a hardware. | |
| Based on DeepSeek-R1-Distill-Qwen-32B with Zenith enhancements. | |
| Your capabilities: | |
| - Advanced reasoning and problem-solving | |
| - Complex code generation and analysis | |
| - Mathematical and logical thinking | |
| - Long-context processing (32K tokens) | |
| - Multi-language support | |
| - Emotional intelligence | |
| When solving problems: | |
| 1. Break down complex problems into steps | |
| 2. Show your reasoning process clearly | |
| 3. Consider edge cases and alternatives | |
| 4. Verify your solutions | |
| When coding: | |
| - Write clean, efficient, well-documented code | |
| - Follow best practices and conventions | |
| - Handle errors and edge cases | |
| - Optimize for performance where appropriate | |
| Always strive for accuracy, clarity, and helpfulness. | |
| """ | |
| # Generation parameters | |
| PARAMETER temperature 0.6 | |
| PARAMETER top_p 0.88 | |
| PARAMETER top_k 45 | |
| PARAMETER repeat_penalty 1.08 | |
| PARAMETER num_predict 8192 | |
| # 32K context window | |
| PARAMETER num_ctx 32768 | |
| # Chat template for Qwen format | |
| TEMPLATE """ | |
| {{- if .Messages }} | |
| {{- $role := .Messages | first | .Role }} | |
| {{- if or (eq $role "user") (eq $role "system") }} | |
| {{- range $i, $_ := .Messages }} | |
| {{- if eq .Role "user" }} | |
| {{- "\nUser: " }}{{ .Content }} | |
| {{- else if eq .Role "assistant" }} | |
| {{- "\nAssistant: " }}{{ .Content }} | |
| {{- else if eq .Role "system" }} | |
| {{- "\nSystem: " }}{{ .Content }} | |
| {{- end }} | |
| {{- end }} | |
| {{- "\nAssistant:" }} | |
| {{- else }} | |
| {{- range $i, $_ := .Messages }} | |
| {{- if eq .Role "user" }} | |
| {{- "\nUser: " }}{{ .Content }} | |
| {{- else if eq .Role "assistant" }} | |
| {{- "\nAssistant: " }}{{ .Content }} | |
| {{- end }} | |
| {{- end }} | |
| {{- "\nAssistant:" }} | |
| {{- end }} | |
| {{- else }} | |
| {{- .Prompt }} | |
| {{- end }} | |
| """ | |
| STOP ["User:", "System:", "\n\n"] |