# Zenith V1

All V1 models of the Zenith series • 4 items
Tenstorrent p300-optimized 28B-parameter model with advanced reasoning capabilities.
```bash
cd Zenith/V1-Tenstorrent-Blackhole-p300/28B
pip install -r requirements.txt
```
```bash
# Full fine-tuning (requires a p300 with all 64 cores)
python train.py \
    --base_model Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled \
    --train_data ./data/train.json \
    --epochs 2 \
    --batch_size 2 \
    --gradient_accumulation_steps 16 \
    --learning_rate 1e-5 \
    --tensor_parallel_size 8 \
    --pipeline_parallel_size 4 \
    --use_noc_optimization \
    --use_ring_attention \
    --max_seq_length 32768 \
    --mixed_precision bf16
```
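With `--batch_size 2` and `--gradient_accumulation_steps 16`, each optimizer step sees an effective batch of 2 × 16 = 32 sequences per data-parallel rank. A minimal sketch of the accumulation pattern in generic PyTorch (illustrative only, not the repo's `train.py`):

```python
import torch
import torch.nn as nn

# Toy stand-ins; train.py wires up the real model and data loader.
model = nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss_fn = nn.MSELoss()

batch_size = 2              # matches --batch_size
accumulation_steps = 16     # matches --gradient_accumulation_steps

optimizer.zero_grad()
for step in range(accumulation_steps):
    x = torch.randn(batch_size, 16)
    y = torch.randn(batch_size, 1)
    loss = loss_fn(model(x), y)
    # Scale each micro-batch loss so accumulated gradients average
    # over the effective batch rather than summing.
    (loss / accumulation_steps).backward()

# One optimizer step per 2 * 16 = 32 sequences.
optimizer.step()
optimizer.zero_grad()
```

Memory stays at the micro-batch size while the gradient statistics match the larger effective batch.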
```bash
# LoRA fine-tuning (recommended)
python train.py \
    --base_model Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled \
    --train_data ./data/train.json \
    --use_lora \
    --lora_r 16 \
    --lora_alpha 32 \
    --epochs 3 \
    --batch_size 4 \
    --gradient_accumulation_steps 8 \
    --learning_rate 1e-4 \
    --use_ring_attention \
    --max_seq_length 32768
```
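With `--lora_r 16` and `--lora_alpha 32`, LoRA freezes the base weight and learns a rank-16 update scaled by alpha / r = 2, which is why it fits where full fine-tuning does not. A NumPy sketch of the forward path (illustrative math, not the repo's implementation):

```python
import numpy as np

d_in, d_out = 3072, 3072      # hidden size from the config below
r, alpha = 16, 32             # --lora_r / --lora_alpha

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in)) * 0.02   # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.02       # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection (init 0)

def lora_forward(x):
    # Base path plus low-rank update scaled by alpha / r.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(4, d_in))
y = lora_forward(x)

# With B initialized to zero, the adapted layer starts identical to the base.
assert np.allclose(y, x @ W.T)

# Trainable parameters: r * (d_in + d_out) instead of d_in * d_out.
lora_params = r * (d_in + d_out)
full_params = d_in * d_out
```

Here only about 98K parameters per adapted matrix train, versus ~9.4M for the full weight.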
```bash
# Interactive mode with reasoning focus
python inference.py --checkpoint ./outputs/checkpoint-final

# Single prompt with long context
python inference.py \
    --checkpoint ./outputs/checkpoint-final \
    --prompt "Analyze the following 10K text and extract key insights..." \
    --max_new_tokens 2048 \
    --temperature 0.55
```
```bash
# Build model
ollama create zenith-28b-p300 -f Modelfile

# Run with reasoning focus
ollama run zenith-28b-p300 "Solve: A train travels at 60 mph for 2.5 hours. How far does it go? Show your reasoning."

# Long context example
ollama run zenith-28b-p300 "Summarize the key points from this document: [paste 30K text]"
```
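`ollama create` reads a Modelfile next to the exported weights. A minimal hypothetical example (the GGUF filename and the parameter values are assumptions for illustration, not files shipped with this repo):

```
FROM ./zenith-28b-p300.gguf
PARAMETER temperature 0.55
PARAMETER num_ctx 32768
SYSTEM "You are a reasoning assistant. Think step by step before answering."
```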
```python
from configs.zenith_config import get_28b_p300_config

config = get_28b_p300_config()
print(config)
```
Key Parameters:
- `hidden_size`: 3072
- `num_layers`: 36
- `num_heads`: 24
- `num_experts`: 8 (configurable, set 0 for dense)
- `moe_top_k`: 2
- `max_seq_len`: 32768
- `use_ring_attention`: True
- `ring_attention_chunk_size`: 8192
- `ring_attention_overlap`: 2048

```python
config.num_experts = 8
config.moe_top_k = 2
config.moe_load_balancing_weight = 0.01
config.moe_capacity_factor = 1.0
config.moe_router_learning_rate = 1e-3
```
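With `num_experts = 8` and `moe_top_k = 2`, each token is routed to its 2 highest-scoring experts and their outputs are combined with renormalized router weights. A NumPy sketch of the routing step (a generic softmax top-k router, assumed for illustration; the repo's router may differ):

```python
import numpy as np

num_experts, top_k, hidden = 8, 2, 32   # num_experts / moe_top_k from the config

rng = np.random.default_rng(0)
router = rng.normal(size=(hidden, num_experts)) * 0.02
experts = [rng.normal(size=(hidden, hidden)) * 0.02 for _ in range(num_experts)]

def moe_layer(x):
    # Router logits -> softmax probabilities over the 8 experts.
    logits = x @ router
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)

    out = np.zeros_like(x)
    for i, tok in enumerate(x):
        top = np.argsort(probs[i])[-top_k:]        # indices of the top-2 experts
        w = probs[i][top] / probs[i][top].sum()    # renormalize gate weights
        for expert_idx, weight in zip(top, w):
            out[i] += weight * (tok @ experts[expert_idx])
    return out

tokens = rng.normal(size=(4, hidden))
y = moe_layer(tokens)
```

Only 2 of 8 expert MLPs run per token, so compute stays near a dense model a quarter of the size; the load-balancing weight above penalizes routers that collapse onto a few experts.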
```python
config.use_eq_adapter = True
config.eq_adapter_hidden_size = 64
config.eq_loss_weight = 0.05
config.emotion_loss_weight = 0.05
config.frustration_loss_weight = 0.05
```
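The `eq_adapter_hidden_size = 64` setting suggests a small bottleneck adapter trained with the auxiliary emotion/frustration losses. A generic residual bottleneck-adapter sketch (the actual EQ adapter architecture is not documented here; this only illustrates the down-project/up-project pattern):

```python
import numpy as np

hidden, adapter_hidden = 3072, 64   # eq_adapter_hidden_size from the config

rng = np.random.default_rng(0)
down = rng.normal(size=(hidden, adapter_hidden)) * 0.02  # project 3072 -> 64
up = np.zeros((adapter_hidden, hidden))                  # project 64 -> 3072 (init 0)

def adapter(x):
    # Residual bottleneck: x + up(relu(down(x)))
    h = np.maximum(x @ down, 0.0)
    return x + h @ up

x = rng.normal(size=(2, hidden))
y = adapter(x)

# Zero-initialized up-projection makes the adapter start as the identity,
# so adding it does not perturb the pretrained model at step 0.
assert np.allclose(y, x)
```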
```python
from data.openthoughts_processor import OpenThoughtsProcessor, OpenThoughtsConfig

# `tokenizer` is a previously loaded tokenizer instance
ot_config = OpenThoughtsConfig(
    dataset_name="open-thoughts/OpenThoughts3-1.2M",
    streaming=True,
    max_seq_length=32768,
    quality_filtering=True,
    curriculum_learning=True,
    augmentation=False,  # disabled for reasoning tasks
    tokenizer=tokenizer,
)
processor = OpenThoughtsProcessor(ot_config)
```
Multi-metric scoring:
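The individual metrics are not listed here; purely as an illustration, a quality score for reasoning data might blend sample length with reasoning-marker density (hypothetical metrics and weights, not the processor's actual ones):

```python
def quality_score(sample: dict) -> float:
    """Hypothetical multi-metric score; the real processor's metrics differ."""
    text = sample["text"]
    # Metric 1: prefer substantive samples, capped at 1.0.
    length_score = min(len(text) / 2000, 1.0)
    # Metric 2: density of step-by-step reasoning markers, capped at 1.0.
    markers = ["step", "therefore", "because", "first"]
    marker_score = min(sum(text.lower().count(m) for m in markers) / 5, 1.0)
    return 0.5 * length_score + 0.5 * marker_score

sample = {"text": "First, compute the distance. Therefore, d = 60 * 2.5 = 150 miles."}
score = quality_score(sample)
assert 0.0 <= score <= 1.0
```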
Enables 32K context without quadratic memory:

```python
config.use_ring_attention = True
config.ring_attention_chunk_size = 8192  # 8K chunks
config.ring_attention_overlap = 2048     # 2K overlap for continuity
```
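Chunking works because attention for a block of queries never needs the full 32K × 32K score matrix in memory at once. The sketch below is a simplified query-chunked attention, not the repo's ring implementation (which also shards K/V across devices), but it shows why peak memory scales with the chunk size:

```python
import numpy as np

seq_len, d, chunk = 64, 16, 8   # scaled-down stand-ins for 32768 / 8192

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(seq_len, d)) for _ in range(3))

def full_attention(Q, K, V):
    # Materializes the whole seq_len x seq_len score matrix at once.
    scores = Q @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ V

def chunked_attention(Q, K, V, chunk):
    # Processes queries chunk-by-chunk: peak score matrix is chunk x seq_len.
    out = np.empty_like(Q)
    for s in range(0, len(Q), chunk):
        scores = Q[s:s + chunk] @ K.T / np.sqrt(d)
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)
        out[s:s + chunk] = w @ V
    return out

# Chunking changes memory, not the result.
assert np.allclose(full_attention(Q, K, V), chunked_attention(Q, K, V, chunk))
```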
For multi-chip p300 setups:

```bash
# Set environment variables for distributed training
export MASTER_ADDR=localhost
export MASTER_PORT=29500
export WORLD_SIZE=2  # 2 chips

# Run training with torchrun
torchrun --nproc_per_node=2 --nnodes=1 train.py ...
```
```bash
--mixed_precision bf16       # recommended on the p300
# or
--mixed_precision fp16       # for hardware without bf16 support
--gradient_checkpointing     # reduces activation memory by ~60%
```
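bf16 keeps fp32's exponent range, so it usually trains stably without the loss scaling fp16 needs. A minimal sketch of the autocast pattern behind `--mixed_precision bf16` in generic PyTorch (not the repo's trainer; CPU device used so it runs anywhere):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)
x = torch.randn(4, 8)

# Matmuls inside the autocast region run in bfloat16;
# the parameters themselves stay in fp32 for stable updates.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)

assert model.weight.dtype == torch.float32
```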
```bash
# Run test suite
python test_model.py

# Includes:
# - Model creation
# - Forward pass
# - p300 optimizations
# - MoE configuration
# - Ring attention
# - EQ adapter
# - Generation
# - Gradient flow
```
```bash
python -m evaluation.benchmark \
    --model_path ./outputs/checkpoint-final \
    --benchmarks humaneval mbpp gsm8k math truthfulqa \
    --output_dir ./eval_results
```
Create custom evaluation script:
```python
from evaluation.eval_datasets import load_benchmark
from evaluation.metrics import compute_metrics

# Load reasoning benchmark
test_data = load_benchmark("gsm8k")

# Generate predictions
predictions = []
for sample in test_data:
    prompt = f"Question: {sample['question']}\nLet's think step by step.\nAnswer:"
    response = generate(model, tokenizer, prompt, max_new_tokens=512)
    predictions.append(response)

# Evaluate
metrics = compute_metrics(predictions, test_data, task="gsm8k")
print(f"Accuracy: {metrics['accuracy']:.2%}")
```
Tips:
- Set `--max_seq_length` appropriate for your task
- Enable `--use_noc_optimization` for chip-to-chip comms

If you run out of memory:
- Reduce `--batch_size` to 1
- Increase `--gradient_accumulation_steps`
- Use `--use_qlora` for 4-bit quantization
- Reduce `--max_seq_length` (try 16384 instead of 32768)
- Enable `--gradient_checkpointing`

Data options:
- Curriculum learning (`--use_curriculum`)
- Quality filtering (`--use_quality_filter`)

Ring attention:
- Ensure `--max_seq_length` is divisible by `--ring_chunk_size`
- Reduce `--ring_overlap` if memory is tight

```bibtex
@misc{zenith-28b-p300-2025,
  title={Zenith-28B-p300: A Tenstorrent-Optimized Reasoning Model with Ring Attention},
  author={Zenith Project},
  year={2025},
  publisher={Zenith Project}
}
```
[Specify license]
Documentation:
- README.md
- FINETUNE_GUIDE.md
- configs/zenith_config.py

Base model: Qwen/Qwen3.5-27B