---
language:
  - en
license: mit
base_model: Qwen/Qwen2.5-Coder-7B
tags:
  - zenith
  - tenstorrent
  - code
  - reasoning
  - moe
  - ring-attention
  - eq-adapter
  - matrix-corp
pipeline_tag: text-generation
library_name: transformers
model_type: zenith
hardware:
  - tenstorrent-blackhole-p300a
---

# Zenith-7B V1

Zenith-7B V1 is a 7B-parameter, GPU-optimized language model with code generation and emotional intelligence capabilities.

## Features

- **7B Parameter Model**: Sized for consumer GPUs with 8-16GB VRAM (the low end assumes quantized weights, since 16-bit weights alone take roughly 14 GB)
- **Code Generation**: Fine-tuned from the Qwen2.5-Coder-7B base model for programming tasks
- **Emotional Intelligence**: EQ adapter for recognizing and responding to emotions
- **OpenThoughts Integration**: Trained on high-quality reasoning data
- **LoRA/QLoRA Support**: Efficient fine-tuning with 4-bit quantization
- **Ollama Compatible**: Ready-to-use Modelfile for easy deployment

## Quick Start

### Installation

```bash
# From the repository root, enter the model directory and install dependencies
cd Zenith/V1/7B
pip install -r requirements.txt
```

### Training

```bash
# Full fine-tuning
python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data path/to/train.json \
  --epochs 3 \
  --batch_size 4 \
  --learning_rate 2e-5

# LoRA fine-tuning (recommended for most users)
python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data path/to/train.json \
  --use_lora \
  --lora_r 16 \
  --lora_alpha 32 \
  --epochs 3 \
  --batch_size 8
```

### Inference

```bash
# Interactive mode
python inference.py --checkpoint ./outputs/checkpoint-final

# Single prompt
python inference.py \
  --checkpoint ./outputs/checkpoint-final \
  --prompt "Write a Python function to reverse a linked list" \
  --max_new_tokens 512
```

### Ollama Deployment

```bash
# Build and run with Ollama
ollama create zenith-7b -f Modelfile
ollama run zenith-7b "Explain quantum computing in simple terms"
```

## Project Structure

```
Zenith/V1/7B/
├── configs/              # Configuration files
│   ├── zenith_config.py  # Model architecture config
│   ├── data_config.py    # Data processing config
│   └── training_config.py # Training hyperparameters
├── data/                 # Data processing modules
│   ├── openthoughts_processor.py
│   ├── quality_filter.py
│   ├── curriculum_sampler.py
│   ├── advanced_tokenizer.py
│   └── preprocessing.py
├── src/                  # Source code
│   ├── models/
│   │   ├── zenith_model.py
│   │   ├── dense_layer.py
│   │   └── moe_layer.py
│   └── utils/
├── scripts/              # Utility scripts
├── tests/                # Test suite
├── train.py              # Main training script
├── inference.py          # Inference and generation
├── test_model.py         # Model validation tests
├── finetune_qwen.py      # Qwen fine-tuning guide
├── Modelfile             # Ollama configuration
├── requirements.txt      # Python dependencies
└── README.md             # This file
```

## Configuration

The model uses a unified configuration system in `configs/zenith_config.py`:

```python
from configs.zenith_config import get_7b_config

config = get_7b_config()
# Parameters:
# - hidden_size: 4096
# - num_layers: 32
# - num_heads: 32
# - num_experts: 0 (dense only, set >1 for MoE)
# - use_eq_adapter: True (emotional intelligence)
# - max_seq_len: 8192
```
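
The listed values imply a per-head dimension of `hidden_size / num_heads`. The sketch below makes that relationship explicit with a hypothetical dataclass; `ZenithConfigSketch`, `head_dim`, and `is_moe` are illustrative names and are not taken from `configs/zenith_config.py`, whose actual fields may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZenithConfigSketch:
    """Hypothetical stand-in that mirrors the parameters listed above."""

    hidden_size: int = 4096
    num_layers: int = 32
    num_heads: int = 32
    num_experts: int = 0
    use_eq_adapter: bool = True
    max_seq_len: int = 8192

    def __post_init__(self) -> None:
        # Attention heads split the hidden dimension evenly
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")

    @property
    def head_dim(self) -> int:
        return self.hidden_size // self.num_heads

    @property
    def is_moe(self) -> bool:
        # Per the comment above: 0 means dense, >1 enables MoE
        return self.num_experts > 1


cfg = ZenithConfigSketch()
print(cfg.head_dim, cfg.is_moe)  # 128 False
```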

## Data Processing

### OpenThoughts Integration

The data pipeline supports the OpenThoughts3-1.2M dataset:

```python
from data.openthoughts_processor import OpenThoughtsProcessor, OpenThoughtsConfig

config = OpenThoughtsConfig(
    dataset_name="open-thoughts/OpenThoughts3-1.2M",
    streaming=True,
    quality_filtering=True,
    curriculum_learning=True,
    augmentation=True
)
processor = OpenThoughtsProcessor(config)
dataset = processor.load_dataset()
```

### Quality Filtering

Multi-dimensional quality assessment:
- Length appropriateness
- Language detection (English only)
- Repetition detection
- Coherence scoring
- Structure validation
- Thought quality (for CoT data)
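
Two of these checks, length and repetition, can be sketched with the standard library alone. This is an illustration of the idea, not the logic in `data/quality_filter.py`; the function names and every threshold below are assumptions chosen for the example.

```python
from collections import Counter


def repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that repeat an earlier n-gram in the text."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


def passes_basic_filters(
    text: str,
    min_words: int = 5,            # assumed lower bound
    max_words: int = 4096,         # assumed upper bound
    max_repetition: float = 0.3,   # assumed repetition threshold
) -> bool:
    """Length check first (cheap), then the repetition check."""
    n_words = len(text.split())
    if not (min_words <= n_words <= max_words):
        return False
    return repetition_ratio(text) <= max_repetition


print(passes_basic_filters("the cat sat on the mat today"))  # True
print(passes_basic_filters("spam " * 8))                     # False
```

Running the cheap length check before the n-gram pass keeps the filter fast on streaming data, where most rejected samples fail on length alone.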

### Curriculum Learning

Progressive training stages:
1. **Foundation**: High-quality, well-structured samples
2. **Reasoning**: Chain-of-thought and problem-solving
3. **Code**: Programming and technical content
4. **Full**: Complete dataset with all samples
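
One simple way to schedule these stages is to map training progress to a stage name. The sketch below assumes equal-length stages; the boundaries and the `stage_for_step` helper are hypothetical and may not match `data/curriculum_sampler.py`.

```python
from bisect import bisect_right

# Stage names follow the list above; the boundaries are illustrative assumptions
STAGES = ["foundation", "reasoning", "code", "full"]
STAGE_ENDS = [0.25, 0.50, 0.75]  # training-progress fraction at which each stage ends


def stage_for_step(step: int, total_steps: int) -> str:
    """Return the curriculum stage that is active at a given training step."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return STAGES[bisect_right(STAGE_ENDS, progress)]


for step in (0, 250, 500, 750, 1000):
    print(step, stage_for_step(step, total_steps=1000))
```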

## Advanced Features

### MoE (Mixture of Experts)

Enable sparse expert activation, which adds parameter capacity while each token is processed by only a subset of experts:

```bash
python train.py --use_moe --num_experts 8
```

- Top-2 routing with load balancing
- 60% of layers use MoE (middle layers)
- Shared router groups for efficiency
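
Top-2 routing for a single token can be sketched in plain Python: take the router softmax, keep the two most probable experts, and renormalize their weights. The README does not say which balancing loss is used, so the auxiliary loss below uses the Switch-Transformer form (number of experts times the sum of token fraction times mean router probability) as a stand-in. All function names are hypothetical and this is not the code in `src/models/moe_layer.py`.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_route(logits):
    """Return [(expert_index, weight), (expert_index, weight)] for one token."""
    probs = softmax(logits)
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = probs[top2[0]] + probs[top2[1]]
    return [(i, probs[i] / norm) for i in top2]


def load_balance_loss(batch_logits):
    """Assumed Switch-style auxiliary loss: E * sum_i(f_i * P_i)."""
    num_experts = len(batch_logits[0])
    num_tokens = len(batch_logits)
    frac_tokens = [0.0] * num_experts  # f_i: share of tokens whose top-1 expert is i
    mean_probs = [0.0] * num_experts   # P_i: mean router probability of expert i
    for logits in batch_logits:
        probs = softmax(logits)
        top1 = max(range(num_experts), key=lambda i: probs[i])
        frac_tokens[top1] += 1.0 / num_tokens
        for i, p in enumerate(probs):
            mean_probs[i] += p / num_tokens
    return num_experts * sum(f * p for f, p in zip(frac_tokens, mean_probs))


print(top2_route([2.0, 1.0, 0.0, -1.0]))
```

The loss grows when the router sends most tokens to a few experts with high confidence, which is the behavior load balancing is meant to discourage.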

### EQ Adapter

Emotional intelligence module:

```bash
python train.py --use_eq_adapter --eq_loss_weight 0.1
```

- Frustration detection (regression)
- 8-emotion classification
- Fused with attention mechanism
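
The `--eq_loss_weight` flag suggests the EQ objectives are added to the language-modeling loss as a weighted auxiliary term. A minimal sketch of that combination follows, assuming mean squared error for the frustration score and cross-entropy over the 8 emotion classes; the exact formulation in `train.py` is not documented here and may differ, and all function names are hypothetical.

```python
import math


def mse(pred: float, target: float) -> float:
    return (pred - target) ** 2


def cross_entropy(logits, target_index: int) -> float:
    """Negative log-probability of the target class under softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


def combined_loss(lm_loss, frustration_pred, frustration_target,
                  emotion_logits, emotion_target, eq_loss_weight=0.1):
    """Assumed form: lm_loss + eq_loss_weight * (regression + classification)."""
    if len(emotion_logits) != 8:
        raise ValueError("expected logits for 8 emotion classes")
    eq_loss = (mse(frustration_pred, frustration_target)
               + cross_entropy(emotion_logits, emotion_target))
    return lm_loss + eq_loss_weight * eq_loss


# Uniform emotion logits give a cross-entropy of ln(8)
total = combined_loss(2.0, 0.5, 0.5, [0.0] * 8, emotion_target=3)
print(round(total, 4))
```

A small weight such as 0.1 keeps the auxiliary heads from dominating the language-modeling gradient.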

### LoRA/QLoRA

Efficient fine-tuning with low-rank adaptation:

```bash
# LoRA
python train.py --use_lora --lora_r 16 --lora_alpha 32

# QLoRA (4-bit quantization)
python train.py --use_qlora --use_lora --lora_r 8
```
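
To see why LoRA is cheap, note that a rank-`r` adapter on a `d_out x d_in` weight adds two small matrices (`r x d_in` and `d_out x r`), and the update is scaled by `lora_alpha / lora_r`. The sketch below works through the numbers for the flags shown above, assuming a square 4096 x 4096 projection taken from the `hidden_size` listed under Configuration; the helper names are illustrative.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by one LoRA adapter (A: r x d_in, B: d_out x r)."""
    return r * (d_in + d_out)


def lora_scaling(alpha: float, r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / r


hidden = 4096  # assumption: a square projection sized by hidden_size
full = hidden * hidden
added = lora_param_count(hidden, hidden, r=16)

print(added)                        # 131072
print(f"{100 * added / full:.2f}%")  # 0.78%
print(lora_scaling(32, 16))         # 2.0
```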

## Testing

Run the test suite:

```bash
python test_model.py
```

Tests include:
- Model creation and initialization
- Forward pass and gradient flow
- Text generation
- Multi-task outputs (EQ adapter)
- Loss computation

## Requirements

See `requirements.txt` for full dependencies. Key packages:

- torch>=2.0.0
- transformers>=4.35.0
- datasets>=2.14.0
- accelerate>=0.24.0
- peft>=0.6.0 (for LoRA)
- bitsandbytes>=0.41.0 (for QLoRA)
- tensorboard>=2.14.0

## Performance Tips

1. **Mixed Precision**: Use `--mixed_precision bf16` for faster training (Ampere+ GPUs)
2. **Gradient Checkpointing**: Enabled by default to reduce memory
3. **Batch Size**: Adjust based on VRAM (4-8 for 7B full, 16-32 for LoRA)
4. **Sequence Length**: Longer sequences use more memory; adjust `--max_seq_length`
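
When choosing a precision, a back-of-the-envelope estimate of weight memory helps. The sketch below covers weights only, so activations, gradients, optimizer state, and quantization overhead come on top. It assumes a nominal 7 billion parameters; the exact count for this model may differ.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


NOMINAL_PARAMS = 7e9  # assumption: nominal "7B"

for label, bits in [("fp32", 32), ("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gib(NOMINAL_PARAMS, bits):.1f} GiB")
```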

## Troubleshooting

### Out of Memory
- Reduce batch size
- Use gradient accumulation
- Enable LoRA/QLoRA
- Use mixed precision
- Reduce sequence length
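
Reducing the batch size and adding gradient accumulation go together: the effective batch size is the per-device batch times the accumulation steps times the number of devices. A small helper (hypothetical, not part of `train.py`) shows how many accumulation steps keep the effective batch constant after shrinking the per-device batch.

```python
import math


def accumulation_steps(target_batch: int, per_device_batch: int,
                       num_devices: int = 1) -> int:
    """Accumulation steps needed to reach at least the target effective batch size."""
    if min(target_batch, per_device_batch, num_devices) <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(target_batch / (per_device_batch * num_devices))


# Halving the per-device batch doubles the accumulation steps
print(accumulation_steps(32, per_device_batch=4))  # 8
print(accumulation_steps(32, per_device_batch=2))  # 16
```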

### Slow Training
- Increase batch size if possible
- In multi-GPU runs, use more gradient accumulation steps so gradients are synchronized less often
- Ensure data loading is not the bottleneck
- Use mixed precision

### Poor Quality Outputs
- Train longer (more epochs)
- Use higher quality data
- Adjust learning rate (try 1e-5 to 5e-5)
- Enable curriculum learning
- Use quality filtering

## Citation

If you use Zenith-7B in your research, please cite:

```bibtex
@misc{zenith-7b-2025,
  title={Zenith-7B: A Hybrid MoE Model for Code and Emotional Intelligence},
  year={2025},
  publisher={Zenith Project}
}
```

## License

Released under the MIT License, as declared in the model card metadata.

## Contact

For issues and questions, please open an issue on the project repository.