Text Generation
Transformers
English
zenith
tenstorrent
reasoning
large-model
Mixture of Experts
ring-attention
eq-adapter
deepseek-r1
llama
matrix-corp
Instructions to use Matrix-Corp/Zenith-70b-p300-V1 with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use Matrix-Corp/Zenith-70b-p300-V1 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Matrix-Corp/Zenith-70b-p300-V1")

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Matrix-Corp/Zenith-70b-p300-V1", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Matrix-Corp/Zenith-70b-p300-V1 with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Matrix-Corp/Zenith-70b-p300-V1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Matrix-Corp/Zenith-70b-p300-V1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/Matrix-Corp/Zenith-70b-p300-V1
- SGLang
How to use Matrix-Corp/Zenith-70b-p300-V1 with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Matrix-Corp/Zenith-70b-p300-V1" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Matrix-Corp/Zenith-70b-p300-V1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Matrix-Corp/Zenith-70b-p300-V1" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Matrix-Corp/Zenith-70b-p300-V1",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use Matrix-Corp/Zenith-70b-p300-V1 with Docker Model Runner:
docker model run hf.co/Matrix-Corp/Zenith-70b-p300-V1
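The vLLM and SGLang curl calls above send the same JSON body to an OpenAI-compatible `/v1/completions` endpoint and differ only in port (8000 and 30000). A minimal Python sketch of that request follows, using only the standard library. The helper names `build_completion_request` and `complete` are hypothetical, and the response parsing assumes the server returns the usual OpenAI-style `choices[0].text` field.

```python
import json
import urllib.request

MODEL_ID = "Matrix-Corp/Zenith-70b-p300-V1"


def build_completion_request(base_url, prompt, max_tokens=512, temperature=0.5):
    """Build the POST request that the curl examples send (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, prompt, **kwargs):
    """Send the request and return the generated text.

    Assumes an OpenAI-style response body with choices[0].text.
    Requires a running server; nothing is sent until this is called.
    """
    request = build_completion_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("http://localhost:8000", "Once upon a time,")` would target vLLM, and the same call with port 30000 would target SGLang.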
# Zenith-70B-p300 Model Configuration for Ollama
# Tenstorrent p300a Optimized - V1-Tenstorrent-Blackhole-p300
# Based on DeepSeek-R1-Distill-Llama-70B

FROM deepseek-ai/DeepSeek-R1-Distill-Llama-70B

# System prompt for maximum capability
SYSTEM """
You are Zenith-70B-p300, the flagship model in the Zenith family, optimized for Tenstorrent p300a hardware.
Based on DeepSeek-R1-Distill-Llama-70B with Zenith's advanced features.
Your capabilities include:
- State-of-the-art reasoning and problem-solving
- Advanced code generation across multiple languages
- Complex mathematical and scientific analysis
- Long-context understanding (32K tokens)
- Emotional intelligence and nuanced communication
- Multi-domain expertise
When solving problems:
1. Think deeply and systematically
2. Break complex problems into manageable steps
3. Show your reasoning process
4. Consider multiple perspectives
5. Verify your conclusions
When coding:
- Write production-quality, well-documented code
- Follow language-specific best practices
- Include error handling and edge cases
- Optimize for readability and performance
- Add comprehensive tests when appropriate
Always be accurate, helpful, and thoughtful in your responses.
"""

# Generation parameters optimized for quality
PARAMETER temperature 0.55
PARAMETER top_p 0.88
PARAMETER top_k 45
PARAMETER repeat_penalty 1.08
PARAMETER num_predict 8192

# 32K context window
PARAMETER num_ctx 32768

# Chat template for Llama format
TEMPLATE """
{{- if .Messages }}
{{- range $i, $_ := .Messages }}
{{- if eq .Role "user" }}
{{- "\nUser: " }}{{ .Content }}
{{- else if eq .Role "assistant" }}
{{- "\nAssistant: " }}{{ .Content }}
{{- else if eq .Role "system" }}
{{- "\nSystem: " }}{{ .Content }}
{{- end }}
{{- end }}
{{- "\nAssistant:" }}
{{- else }}
{{- .Prompt }}
{{- end }}
"""
PARAMETER stop "User:"
PARAMETER stop "System:"
PARAMETER stop "\n\n"
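To make the TEMPLATE and the context settings above concrete, here is a rough Python sketch of the prompt string the template appears to produce, plus the prompt budget implied by `num_ctx` and `num_predict`. It is an approximation written by hand, not Ollama's own renderer. The function names are hypothetical, leading and trailing whitespace handling may differ in Ollama, and the budget assumes the context window is shared between prompt tokens and generated tokens.

```python
from typing import Dict, List

# Role labels copied from the TEMPLATE block in the Modelfile.
ROLE_PREFIX = {
    "user": "\nUser: ",
    "assistant": "\nAssistant: ",
    "system": "\nSystem: ",
}

# Values copied from the PARAMETER lines in the Modelfile.
NUM_CTX = 32768
NUM_PREDICT = 8192


def render_prompt(messages: List[Dict[str, str]], prompt: str = "") -> str:
    """Approximate the template: labelled turns, then an open Assistant turn."""
    if not messages:
        # Mirrors the template's else branch, which emits .Prompt unchanged.
        return prompt
    parts = []
    for message in messages:
        prefix = ROLE_PREFIX.get(message["role"])
        if prefix is None:
            # The template has no branch for other roles, so they render nothing.
            continue
        parts.append(prefix + message["content"])
    parts.append("\nAssistant:")
    return "".join(parts)


def prompt_token_budget(num_ctx: int = NUM_CTX, num_predict: int = NUM_PREDICT) -> int:
    """Tokens left for the prompt if a reply uses the full num_predict allowance."""
    return num_ctx - num_predict
```

For example, a system message "Be brief." followed by a user message "Hi" renders as three labelled lines ending in an open `Assistant:` turn. Under the stated assumption, a reply that uses all 8192 predicted tokens leaves 24576 tokens of the 32768-token window for the system prompt and conversation history.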