---
language:
- en
license: mit
base_model: Qwen/Qwen2.5-Coder-7B
tags:
- zenith
- tenstorrent
- code
- reasoning
- moe
- ring-attention
- eq-adapter
- matrix-corp
pipeline_tag: text-generation
library_name: transformers
model_type: zenith
hardware:
- tenstorrent-blackhole-p300a
---
# Zenith-7B V1

Standard GPU-optimized language model with code generation and emotional intelligence capabilities.
## Features

- **7B Parameter Model**: Efficient for consumer GPUs (8-16 GB VRAM)
- **Code Generation**: Fine-tuned on the Qwen2.5-Coder base for strong programming ability
- **Emotional Intelligence**: EQ adapter for recognizing and responding to emotions
- **OpenThoughts Integration**: Trained on high-quality reasoning data
- **LoRA/QLoRA Support**: Efficient fine-tuning with 4-bit quantization
- **Ollama Compatible**: Ready-to-use Modelfile for easy deployment
## Quick Start

### Installation

```bash
# Clone and setup
cd Zenith/V1/7B
pip install -r requirements.txt
```
### Training

```bash
# Full fine-tuning
python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data path/to/train.json \
  --epochs 3 \
  --batch_size 4 \
  --learning_rate 2e-5
```

```bash
# LoRA fine-tuning (recommended for most users)
python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data path/to/train.json \
  --use_lora \
  --lora_r 16 \
  --lora_alpha 32 \
  --epochs 3 \
  --batch_size 8
```
### Inference

```bash
# Interactive mode
python inference.py --checkpoint ./outputs/checkpoint-final

# Single prompt
python inference.py \
  --checkpoint ./outputs/checkpoint-final \
  --prompt "Write a Python function to reverse a linked list" \
  --max_new_tokens 512
```
### Ollama Deployment

```bash
# Build and run with Ollama
ollama create zenith-7b -f Modelfile
ollama run zenith-7b "Explain quantum computing in simple terms"
```
## Project Structure

```text
Zenith/V1/7B/
├── configs/                  # Configuration files
│   ├── zenith_config.py      # Model architecture config
│   ├── data_config.py        # Data processing config
│   └── training_config.py    # Training hyperparameters
├── data/                     # Data processing modules
│   ├── openthoughts_processor.py
│   ├── quality_filter.py
│   ├── curriculum_sampler.py
│   ├── advanced_tokenizer.py
│   └── preprocessing.py
├── src/                      # Source code
│   ├── models/
│   │   ├── zenith_model.py
│   │   ├── dense_layer.py
│   │   └── moe_layer.py
│   └── utils/
├── scripts/                  # Utility scripts
├── tests/                    # Test suite
├── train.py                  # Main training script
├── inference.py              # Inference and generation
├── test_model.py             # Model validation tests
├── finetune_qwen.py          # Qwen fine-tuning guide
├── Modelfile                 # Ollama configuration
├── requirements.txt          # Python dependencies
└── README.md                 # This file
```
## Configuration

The model uses a unified configuration system in `configs/zenith_config.py`:

```python
from configs.zenith_config import get_7b_config

config = get_7b_config()
# Parameters:
# - hidden_size: 4096
# - num_layers: 32
# - num_heads: 32
# - num_experts: 0 (dense only, set >1 for MoE)
# - use_eq_adapter: True (emotional intelligence)
# - max_seq_len: 8192
```
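For experiments outside the repo, the parameters above can be mirrored in a minimal standalone config. This is an illustrative sketch: the field names follow the list above, not the actual `ZenithConfig` class, and `head_dim` is a derived convenience added here.

```python
from dataclasses import dataclass


@dataclass
class ZenithConfigSketch:
    # Mirrors the 7B defaults listed above (illustrative only).
    hidden_size: int = 4096
    num_layers: int = 32
    num_heads: int = 32
    num_experts: int = 0         # 0 = dense; >1 enables MoE layers
    use_eq_adapter: bool = True  # emotional-intelligence head
    max_seq_len: int = 8192

    @property
    def head_dim(self) -> int:
        # Per-head dimension follows from hidden_size / num_heads.
        return self.hidden_size // self.num_heads


cfg = ZenithConfigSketch()
print(cfg.head_dim)  # 4096 / 32 = 128
```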
## Data Processing

### OpenThoughts Integration

The data pipeline supports the OpenThoughts3-1.2M dataset:

```python
from data.openthoughts_processor import OpenThoughtsProcessor, OpenThoughtsConfig

config = OpenThoughtsConfig(
    dataset_name="open-thoughts/OpenThoughts3-1.2M",
    streaming=True,
    quality_filtering=True,
    curriculum_learning=True,
    augmentation=True,
)

processor = OpenThoughtsProcessor(config)
dataset = processor.load_dataset()
```
### Quality Filtering

Multi-dimensional quality assessment:
- Length appropriateness
- Language detection (English only)
- Repetition detection
- Coherence scoring
- Structure validation
- Thought quality (for CoT data)
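Two of these checks are easy to sketch in plain Python. These are illustrative heuristics, not the repo's `quality_filter.py`; the thresholds are made up for the example.

```python
def length_ok(text: str, min_chars: int = 32, max_chars: int = 8000) -> bool:
    """Length appropriateness: reject samples that are too short or too long."""
    return min_chars <= len(text) <= max_chars


def repetition_score(text: str, n: int = 3) -> float:
    """Fraction of repeated word n-grams; higher means more repetitive."""
    words = text.split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return 1.0 - len(set(ngrams)) / len(ngrams)


sample = "the cat sat on the mat and the cat sat on the mat"
print(length_ok(sample), round(repetition_score(sample), 2))
```

A production filter would combine several such scores into a single keep/drop decision, as the bullet list above suggests.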
### Curriculum Learning

Progressive training stages:

1. **Foundation**: High-quality, well-structured samples
2. **Reasoning**: Chain-of-thought and problem-solving
3. **Code**: Programming and technical content
4. **Full**: Complete dataset with all samples
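One simple way to realize such staging is to gate the sample pool by training progress. This is a sketch of the idea only; the stage boundaries below are assumed equal-width fractions, not the values used by the repo's `curriculum_sampler.py`.

```python
def stage_for_progress(progress: float) -> str:
    """Map training progress in [0, 1] to a curriculum stage (assumed boundaries)."""
    if progress < 0.25:
        return "foundation"  # high-quality, well-structured samples
    if progress < 0.5:
        return "reasoning"   # chain-of-thought and problem-solving
    if progress < 0.75:
        return "code"        # programming and technical content
    return "full"            # complete dataset


# At 60% of training we would be sampling from the code stage:
print(stage_for_progress(0.6))
```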
## Advanced Features

### MoE (Mixture of Experts)

Enable sparse activation for better performance:

```bash
python train.py --use_moe --num_experts 8
```

- Top-2 routing with load balancing
- 60% of layers use MoE (middle layers)
- Shared router groups for efficiency
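Top-2 routing means each token is dispatched only to the two highest-scoring experts, with their softmax gate values renormalized to sum to one. A minimal dependency-free sketch of the idea (not the repo's `moe_layer.py`, which operates on tensors and adds a load-balancing loss):

```python
import math


def top2_route(logits: list[float]) -> list[tuple[int, float]]:
    """Pick the two highest-scoring experts and renormalize their gates."""
    # Softmax over all expert logits.
    probs = [math.exp(x) for x in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Keep the top-2 experts and renormalize their probabilities.
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = sum(probs[i] for i in top2)
    return [(i, probs[i] / norm) for i in top2]


# One token's router logits over 4 experts:
print(top2_route([2.0, 0.5, 1.5, -1.0]))
```

The token's output is then the gate-weighted sum of the two selected experts' outputs, so only 2 of the 8 experts run per token.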
### EQ Adapter

Emotional intelligence module:

```bash
python train.py --use_eq_adapter --eq_loss_weight 0.1
```

- Frustration detection (regression)
- 8-emotion classification
- Fused with attention mechanism
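With `--eq_loss_weight 0.1`, the adapter's auxiliary losses are presumably folded into the language-modeling loss as a weighted sum. A sketch of that assumed combination (the actual formula lives in the training code):

```python
def combined_loss(lm_loss: float,
                  frustration_loss: float,
                  emotion_loss: float,
                  eq_loss_weight: float = 0.1) -> float:
    """LM loss plus weighted EQ-adapter auxiliary losses (assumed formulation)."""
    return lm_loss + eq_loss_weight * (frustration_loss + emotion_loss)


# e.g. 2.30 + 0.1 * (0.40 + 1.10)
print(round(combined_loss(2.30, 0.40, 1.10), 2))
```

A small weight like 0.1 keeps the EQ heads from dominating the language-modeling objective.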
### LoRA/QLoRA

Efficient fine-tuning with low-rank adaptation:

```bash
# LoRA
python train.py --use_lora --lora_r 16 --lora_alpha 32

# QLoRA (4-bit quantization)
python train.py --use_qlora --use_lora --lora_r 8
```
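The reason LoRA is cheap: instead of updating a full `d_out x d_in` weight, it learns two rank-`r` factors, so trainable parameters per layer shrink from `d_out * d_in` to `r * (d_in + d_out)`. A back-of-the-envelope sketch using the 7B config's 4096 hidden size:

```python
def lora_param_ratio(d_in: int, d_out: int, r: int) -> float:
    """Trainable-parameter fraction of a LoRA update vs. the full weight."""
    full = d_in * d_out
    lora = r * (d_in + d_out)  # A is r x d_in, B is d_out x r
    return lora / full


# A 4096x4096 projection with --lora_r 16 trains well under 1% of the weight:
print(f"{lora_param_ratio(4096, 4096, 16):.4%}")
```

QLoRA adds 4-bit quantization of the frozen base weights on top, which is why it fits in even less VRAM.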
## Testing

Run the test suite:

```bash
python test_model.py
```
Tests include:
- Model creation and initialization
- Forward pass and gradient flow
- Text generation
- Multi-task outputs (EQ adapter)
- Loss computation
## Requirements

See `requirements.txt` for full dependencies. Key packages:

- `torch>=2.0.0`
- `transformers>=4.35.0`
- `datasets>=2.14.0`
- `accelerate>=0.24.0`
- `peft>=0.6.0` (for LoRA)
- `bitsandbytes>=0.41.0` (for QLoRA)
- `tensorboard>=2.14.0`
## Performance Tips

- **Mixed Precision**: Use `--mixed_precision bf16` for faster training (Ampere+ GPUs)
- **Gradient Checkpointing**: Enabled by default to reduce memory
- **Batch Size**: Adjust based on VRAM (4-8 for full 7B fine-tuning, 16-32 for LoRA)
- **Sequence Length**: Longer sequences use more memory; adjust `--max_seq_length`
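When VRAM limits the per-device batch size, gradient accumulation recovers the same effective batch: the optimizer steps once per `per_device * accum_steps * num_devices` samples. Simple arithmetic, sketched:

```python
def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step."""
    return per_device * accum_steps * num_devices


# Fit a "batch of 32" on one card by accumulating 8 micro-batches of 4:
print(effective_batch_size(4, 8, 1))  # 32
```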
## Troubleshooting

### Out of Memory

- Reduce batch size
- Use gradient accumulation
- Enable LoRA/QLoRA
- Use mixed precision
- Reduce sequence length
### Slow Training

- Increase batch size if possible
- Use more gradient accumulation steps
- Ensure data loading is not the bottleneck
- Use mixed precision
### Poor Quality Outputs

- Train longer (more epochs)
- Use higher quality data
- Adjust learning rate (try 1e-5 to 5e-5)
- Enable curriculum learning
- Use quality filtering
## Citation

If you use Zenith-7B in your research, please cite:

```bibtex
@misc{zenith-7b-2025,
  title={Zenith-7B: A Hybrid MoE Model for Code and Emotional Intelligence},
  year={2025},
  publisher={Zenith Project}
}
```
## License

This model is released under the MIT license (see the `license` field in the metadata above).
## Contact

For issues and questions, please open an issue on the project repository.