metadata
language:
  - en
license: mit
base_model: Qwen/Qwen2.5-Coder-7B
tags:
  - zenith
  - tenstorrent
  - code
  - reasoning
  - moe
  - ring-attention
  - eq-adapter
  - matrix-corp
pipeline_tag: text-generation
library_name: transformers
model_type: zenith
hardware:
  - tenstorrent-blackhole-p300a

Zenith-7B V1

Standard GPU-optimized language model with code generation and emotional intelligence capabilities.

Features

  • 7B Parameter Model: Sized for consumer GPUs (8-16GB VRAM; the low end assumes 4-bit quantization)
  • Code Generation: Fine-tuned from the Qwen2.5-Coder-7B base model for programming tasks
  • Emotional Intelligence: EQ adapter for recognizing and responding to emotions
  • OpenThoughts Integration: Trained on high-quality reasoning data
  • LoRA/QLoRA Support: Efficient fine-tuning with 4-bit quantization
  • Ollama Compatible: Ready-to-use Modelfile for easy deployment

Quick Start

Installation

# After cloning the repository, enter the model directory and install dependencies
cd Zenith/V1/7B
pip install -r requirements.txt

Training

# Full fine-tuning
python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data path/to/train.json \
  --epochs 3 \
  --batch_size 4 \
  --learning_rate 2e-5

# LoRA fine-tuning (recommended for most users)
python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data path/to/train.json \
  --use_lora \
  --lora_r 16 \
  --lora_alpha 32 \
  --epochs 3 \
  --batch_size 8

Inference

# Interactive mode
python inference.py --checkpoint ./outputs/checkpoint-final

# Single prompt
python inference.py \
  --checkpoint ./outputs/checkpoint-final \
  --prompt "Write a Python function to reverse a linked list" \
  --max_new_tokens 512

Ollama Deployment

# Build and run with Ollama
ollama create zenith-7b -f Modelfile
ollama run zenith-7b "Explain quantum computing in simple terms"
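
Once the model has been created, it can also be queried from Python through Ollama's local REST API. The sketch below is a minimal example, assuming the Ollama server is running on its default port (11434) and that the model was registered as zenith-7b as in the command above. The helper names build_generate_request and generate are illustrative and not part of this repository.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "zenith-7b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "zenith-7b") -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


# Requires a running Ollama server with the model created:
# print(generate("Explain quantum computing in simple terms"))
```

Setting "stream" to False returns one JSON object instead of a stream of partial responses, which keeps the client code short.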

Project Structure

Zenith/V1/7B/
├── configs/              # Configuration files
│   ├── zenith_config.py  # Model architecture config
│   ├── data_config.py    # Data processing config
│   └── training_config.py # Training hyperparameters
├── data/                 # Data processing modules
│   ├── openthoughts_processor.py
│   ├── quality_filter.py
│   ├── curriculum_sampler.py
│   ├── advanced_tokenizer.py
│   └── preprocessing.py
├── src/                  # Source code
│   ├── models/
│   │   ├── zenith_model.py
│   │   ├── dense_layer.py
│   │   └── moe_layer.py
│   └── utils/
├── scripts/              # Utility scripts
├── tests/                # Test suite
├── train.py              # Main training script
├── inference.py          # Inference and generation
├── test_model.py         # Model validation tests
├── finetune_qwen.py      # Qwen fine-tuning guide
├── Modelfile             # Ollama configuration
├── requirements.txt      # Python dependencies
└── README.md             # This file

Configuration

The model uses a unified configuration system in configs/zenith_config.py:

from configs.zenith_config import get_7b_config

config = get_7b_config()
# Parameters:
# - hidden_size: 4096
# - num_layers: 32
# - num_heads: 32
# - num_experts: 0 (dense only, set >1 for MoE)
# - use_eq_adapter: True (emotional intelligence)
# - max_seq_len: 8192
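
To make the listed parameters concrete, here is a small stand-in for the config object. It is a sketch only: ZenithConfigSketch is a hypothetical class that mirrors the fields listed above, not the actual class in configs/zenith_config.py, and the derived properties (head_dim, is_moe) are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass
class ZenithConfigSketch:
    """Hypothetical stand-in mirroring the fields listed for get_7b_config()."""

    hidden_size: int = 4096
    num_layers: int = 32
    num_heads: int = 32
    num_experts: int = 0          # 0 means dense only; >1 enables MoE
    use_eq_adapter: bool = True
    max_seq_len: int = 8192

    def __post_init__(self) -> None:
        # Attention heads must split the hidden dimension evenly.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")

    @property
    def head_dim(self) -> int:
        """Per-head dimension implied by hidden_size and num_heads."""
        return self.hidden_size // self.num_heads

    @property
    def is_moe(self) -> bool:
        """True when more than one expert is configured."""
        return self.num_experts > 1


config = ZenithConfigSketch()
print(config.head_dim, config.is_moe)  # 128 False
```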

Data Processing

OpenThoughts Integration

The data pipeline supports the OpenThoughts3-1.2M dataset:

from data.openthoughts_processor import OpenThoughtsProcessor, OpenThoughtsConfig

config = OpenThoughtsConfig(
    dataset_name="open-thoughts/OpenThoughts3-1.2M",
    streaming=True,
    quality_filtering=True,
    curriculum_learning=True,
    augmentation=True
)
processor = OpenThoughtsProcessor(config)
dataset = processor.load_dataset()

Quality Filtering

Multi-dimensional quality assessment:

  • Length appropriateness
  • Language detection (English only)
  • Repetition detection
  • Coherence scoring
  • Structure validation
  • Thought quality (for CoT data)
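
As an illustration of how two of these checks could work, the sketch below implements a length check and an n-gram repetition check in plain Python. The function names and every threshold (20 to 4000 words, 30% repeated trigrams) are assumptions chosen for the example; the actual logic in data/quality_filter.py may differ.

```python
from collections import Counter


def length_ok(text: str, min_words: int = 20, max_words: int = 4000) -> bool:
    """Length appropriateness: accept texts within an assumed word-count range."""
    n_words = len(text.split())
    return min_words <= n_words <= max_words


def repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram in the text."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)


def passes_filter(text: str, max_repetition: float = 0.3) -> bool:
    """Combine both checks; a sample must pass each one to be kept."""
    return length_ok(text) and repetition_ratio(text) <= max_repetition
```

A real multi-dimensional filter would typically score each dimension separately and combine the scores, rather than applying hard pass/fail cuts as this sketch does.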

Curriculum Learning

Progressive training stages:

  1. Foundation: High-quality, well-structured samples
  2. Reasoning: Chain-of-thought and problem-solving
  3. Code: Programming and technical content
  4. Full: Complete dataset with all samples
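
The staged progression above can be sketched as a mapping from training progress to a stage name. This is a minimal illustration: the equal 25% stage boundaries, the per-sample "stage" tag, and the function names are assumptions, not the behavior of data/curriculum_sampler.py.

```python
STAGES = ["foundation", "reasoning", "code", "full"]


def stage_for_step(step: int, total_steps: int, boundaries=(0.25, 0.5, 0.75)) -> str:
    """Return the curriculum stage active at a given training step."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    # Each boundary marks the end of the corresponding stage.
    for name, bound in zip(STAGES, boundaries):
        if progress < bound:
            return name
    return STAGES[-1]


def select_samples(samples, stage: str):
    """Keep samples tagged for the stage; the final stage keeps everything."""
    if stage == "full":
        return list(samples)
    return [s for s in samples if s["stage"] == stage]


print(stage_for_step(10, 100))  # foundation
print(stage_for_step(60, 100))  # code
```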

Advanced Features

MoE (Mixture of Experts)

MoE is off by default (num_experts: 0). Enable sparse expert routing with:

python train.py --use_moe --num_experts 8

  • Top-2 routing with load balancing
  • 60% of layers use MoE (middle layers)
  • Shared router groups for efficiency
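
For readers unfamiliar with top-2 routing, the sketch below shows the core idea in plain Python: softmax the router logits, keep the two highest-probability experts, and renormalize their weights. It also shows one way to pick a centered block covering 60% of the layers. Both functions are illustrative assumptions; the load-balancing loss and shared router groups are omitted, and src/models/moe_layer.py may select layers differently.

```python
import math


def top2_route(logits):
    """Return [(expert_index, weight), ...] for the two highest-scoring experts."""
    if len(logits) < 2:
        raise ValueError("top-2 routing needs at least two experts")
    # Numerically stable softmax over the router logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    # Renormalize so the two selected weights sum to 1.
    norm = probs[top2[0]] + probs[top2[1]]
    return [(i, probs[i] / norm) for i in top2]


def moe_layer_indices(num_layers: int = 32, moe_fraction: float = 0.6):
    """Indices of a centered block of layers that would use MoE (assumed layout)."""
    n_moe = round(num_layers * moe_fraction)
    start = (num_layers - n_moe) // 2
    return list(range(start, start + n_moe))


routes = top2_route([0.1, 2.0, -1.0, 1.0])
print([index for index, _ in routes])  # [1, 3]
print(len(moe_layer_indices()))        # 19
```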

EQ Adapter

Emotional intelligence module:

python train.py --use_eq_adapter --eq_loss_weight 0.1

  • Frustration detection (regression)
  • 8-emotion classification
  • Fused with attention mechanism
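
One plausible reading of --eq_loss_weight is that it scales the auxiliary EQ losses before they are added to the language-modeling loss. The sketch below illustrates that reading with a squared-error term for the frustration regression and a cross-entropy term for the 8-emotion classifier. The exact combination used by train.py is not documented here, so the formula and function names are assumptions.

```python
import math


def squared_error(pred: float, target: float) -> float:
    """Regression loss for a single frustration score."""
    return (pred - target) ** 2


def cross_entropy(logits, label: int) -> float:
    """Classification loss for one example, computed with a stable log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[label]


def combined_loss(lm_loss, frustration_pred, frustration_target,
                  emotion_logits, emotion_label, eq_loss_weight=0.1):
    """Assumed form: lm_loss + eq_loss_weight * (regression + classification)."""
    eq_loss = (squared_error(frustration_pred, frustration_target)
               + cross_entropy(emotion_logits, emotion_label))
    return lm_loss + eq_loss_weight * eq_loss


# Uniform logits over 8 emotions give a classification loss of ln(8).
loss = combined_loss(2.0, 0.5, 0.5, [0.0] * 8, emotion_label=3)
```

A small weight such as 0.1 keeps the auxiliary objectives from dominating the language-modeling objective.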

LoRA/QLoRA

Efficient fine-tuning with low-rank adaptation:

# LoRA
python train.py --use_lora --lora_r 16 --lora_alpha 32

# QLoRA (4-bit quantization)
python train.py --use_qlora --use_lora --lora_r 8
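
To see why LoRA is cheap, it helps to count the parameters it adds. For a frozen weight of shape d_out × d_in, standard LoRA trains two low-rank matrices (r × d_in and d_out × r) and scales their product by lora_alpha / r. The sketch below works through the numbers for a square 4096 × 4096 projection, using the hidden_size listed in the Configuration section; which projections train.py actually adapts is not specified here.

```python
def lora_added_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return r * (d_in + d_out)


def lora_scaling(lora_alpha: int, r: int) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return lora_alpha / r


hidden = 4096
frozen = hidden * hidden                       # 16,777,216 frozen weights
added = lora_added_params(hidden, hidden, 16)  # 131,072 trainable weights
print(f"{added / frozen:.2%}")                 # 0.78%
print(lora_scaling(32, 16))                    # 2.0
```

Halving the rank to 8, as in the QLoRA command, halves the added parameters per adapted matrix.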

Testing

Run the test suite:

python test_model.py

Tests include:

  • Model creation and initialization
  • Forward pass and gradient flow
  • Text generation
  • Multi-task outputs (EQ adapter)
  • Loss computation

Requirements

See requirements.txt for full dependencies. Key packages:

  • torch>=2.0.0
  • transformers>=4.35.0
  • datasets>=2.14.0
  • accelerate>=0.24.0
  • peft>=0.6.0 (for LoRA)
  • bitsandbytes>=0.41.0 (for QLoRA)
  • tensorboard>=2.14.0

Performance Tips

  1. Mixed Precision: Use --mixed_precision bf16 for faster training (Ampere+ GPUs)
  2. Gradient Checkpointing: Enabled by default to reduce memory
  3. Batch Size: Adjust based on VRAM (4-8 for 7B full, 16-32 for LoRA)
  4. Sequence Length: Longer sequences use more memory; adjust --max_seq_length
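
A rough estimate of weight memory helps when choosing between these options. The sketch below multiplies the parameter count by bytes per parameter. It assumes exactly 7 billion parameters and counts weights only; activations, optimizer state, gradients, and the KV cache all add to the total.

```python
# Approximate storage cost per parameter for common formats.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "4bit": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory needed for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp32", "bf16", "4bit"):
    print(dtype, weight_memory_gb(7e9, dtype))
# fp32 28.0
# bf16 14.0
# 4bit 3.5
```

Full fine-tuning needs several times the weight memory once gradients and optimizer state are included, which is why LoRA and QLoRA are the practical choices on a single consumer GPU.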

Troubleshooting

Out of Memory

  • Reduce batch size
  • Use gradient accumulation
  • Enable LoRA/QLoRA
  • Use mixed precision
  • Reduce sequence length
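
Reducing the batch size and adding gradient accumulation work together: accumulating gradients over several small batches keeps the effective batch size unchanged while lowering peak memory. The helper below is an illustrative calculation only; the function names are hypothetical, and the corresponding train.py option is not named here.

```python
import math


def effective_batch(micro_batch: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step."""
    return micro_batch * accumulation_steps * num_devices


def accumulation_steps_for(target_batch: int, micro_batch: int, num_devices: int = 1) -> int:
    """Smallest accumulation count that reaches at least the target batch size."""
    return math.ceil(target_batch / (micro_batch * num_devices))


# Dropping the per-device batch from 8 to 2 on one GPU:
steps = accumulation_steps_for(target_batch=8, micro_batch=2)
print(steps, effective_batch(2, steps))  # 4 8
```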

Slow Training

  • Increase batch size if possible
  • Use more gradient accumulation steps
  • Ensure data loading is not the bottleneck
  • Use mixed precision

Poor Quality Outputs

  • Train longer (more epochs)
  • Use higher quality data
  • Adjust learning rate (try 1e-5 to 5e-5)
  • Enable curriculum learning
  • Use quality filtering

Citation

If you use Zenith-7B in your research, please cite:

@misc{zenith-7b-2025,
  title={Zenith-7B: A Hybrid MoE Model for Code and Emotional Intelligence},
  year={2025},
  publisher={Zenith Project}
}

License

MIT, as declared in the model card metadata.

Contact

For issues and questions, please open an issue on the project repository.