---
language:
- en
license: mit
base_model: Qwen/Qwen2.5-Coder-7B
tags:
- zenith
- tenstorrent
- code
- reasoning
- moe
- ring-attention
- eq-adapter
- matrix-corp
pipeline_tag: text-generation
library_name: transformers
model_type: zenith
hardware:
- tenstorrent-blackhole-p300a
---

# Zenith-7B V1

A GPU-optimized language model built on Qwen2.5-Coder-7B, with code generation and emotional-intelligence capabilities.

## Features

- **7B Parameter Model**: Efficient on consumer GPUs (8-16 GB VRAM)
- **Code Generation**: Fine-tuned from the Qwen2.5-Coder base for strong programming ability
- **Emotional Intelligence**: EQ adapter for recognizing and responding to emotions
- **OpenThoughts Integration**: Trained on high-quality reasoning data
- **LoRA/QLoRA Support**: Efficient fine-tuning with 4-bit quantization
- **Ollama Compatible**: Ready-to-use Modelfile for easy deployment

## Quick Start

### Installation

```bash
# Clone the repository, then install dependencies
cd Zenith/V1/7B
pip install -r requirements.txt
```

### Training

```bash
# Full fine-tuning
python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data path/to/train.json \
  --epochs 3 \
  --batch_size 4 \
  --learning_rate 2e-5

# LoRA fine-tuning (recommended for most users)
python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data path/to/train.json \
  --use_lora \
  --lora_r 16 \
  --lora_alpha 32 \
  --epochs 3 \
  --batch_size 8
```

### Inference

```bash
# Interactive mode
python inference.py --checkpoint ./outputs/checkpoint-final

# Single prompt
python inference.py \
  --checkpoint ./outputs/checkpoint-final \
  --prompt "Write a Python function to reverse a linked list" \
  --max_new_tokens 512
```

### Ollama Deployment

```bash
# Build and run with Ollama
ollama create zenith-7b -f Modelfile
ollama run zenith-7b "Explain quantum computing in simple terms"
```

## Project Structure

```
Zenith/V1/7B/
├── configs/                 # Configuration files
│   ├── zenith_config.py     # Model architecture config
│   ├── data_config.py       # Data processing config
│   └── training_config.py   # Training hyperparameters
├── data/                    # Data processing modules
│   ├── openthoughts_processor.py
│   ├── quality_filter.py
│   ├── curriculum_sampler.py
│   ├── advanced_tokenizer.py
│   └── preprocessing.py
├── src/                     # Source code
│   ├── models/
│   │   ├── zenith_model.py
│   │   ├── dense_layer.py
│   │   └── moe_layer.py
│   └── utils/
├── scripts/                 # Utility scripts
├── tests/                   # Test suite
├── train.py                 # Main training script
├── inference.py             # Inference and generation
├── test_model.py            # Model validation tests
├── finetune_qwen.py         # Qwen fine-tuning guide
├── Modelfile                # Ollama configuration
├── requirements.txt         # Python dependencies
└── README.md                # This file
```

## Configuration

The model uses a unified configuration system in `configs/zenith_config.py`:

```python
from configs.zenith_config import get_7b_config

config = get_7b_config()
# Parameters:
# - hidden_size: 4096
# - num_layers: 32
# - num_heads: 32
# - num_experts: 0 (dense only; set >1 for MoE)
# - use_eq_adapter: True (emotional intelligence)
# - max_seq_len: 8192
```

## Data Processing

### OpenThoughts Integration

The data pipeline supports the OpenThoughts3-1.2M dataset:

```python
from data.openthoughts_processor import OpenThoughtsProcessor, OpenThoughtsConfig

config = OpenThoughtsConfig(
    dataset_name="open-thoughts/OpenThoughts3-1.2M",
    streaming=True,
    quality_filtering=True,
    curriculum_learning=True,
    augmentation=True,
)
processor = OpenThoughtsProcessor(config)
dataset = processor.load_dataset()
```

### Quality Filtering

Multi-dimensional quality assessment:

- Length appropriateness
- Language detection (English only)
- Repetition detection
- Coherence scoring
- Structure validation
- Thought quality (for CoT data)

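The actual scoring logic lives in `data/quality_filter.py`; as a rough sketch of how several such checks can be combined into a single gate (function name, thresholds, and heuristics here are illustrative, not the project's real API):

```python
def passes_quality_checks(text: str,
                          min_chars: int = 50,
                          max_chars: int = 20_000,
                          max_repeat_ratio: float = 0.3) -> bool:
    """Toy multi-dimensional quality gate: every check must pass."""
    # Length appropriateness
    if not (min_chars <= len(text) <= max_chars):
        return False
    # Crude English-only heuristic: most characters should be ASCII
    if sum(c.isascii() for c in text) / len(text) < 0.9:
        return False
    # Repetition detection: fraction of duplicated non-empty lines
    lines = [l for l in text.splitlines() if l.strip()]
    if lines:
        repeat_ratio = 1 - len(set(lines)) / len(lines)
        if repeat_ratio > max_repeat_ratio:
            return False
    # Structure validation: code fences must be balanced
    if text.count("```") % 2 != 0:
        return False
    return True
```

A real filter would add coherence and thought-quality scoring on top of these cheap structural checks.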
### Curriculum Learning

Progressive training stages:

1. **Foundation**: High-quality, well-structured samples
2. **Reasoning**: Chain-of-thought and problem-solving
3. **Code**: Programming and technical content
4. **Full**: Complete dataset with all samples

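The staged schedule is implemented in `data/curriculum_sampler.py`; a minimal sketch of the idea (stage names follow the list above, but the equal-length phases and the sample-tagging scheme are hypothetical):

```python
STAGES = ["foundation", "reasoning", "code", "full"]

def stage_for_step(step: int, total_steps: int) -> str:
    """Map a training step to a curriculum stage (equal-length phases)."""
    phase = min(step * len(STAGES) // total_steps, len(STAGES) - 1)
    return STAGES[phase]

def select_samples(samples, stage):
    """Keep only samples whose tag is admitted at the current stage."""
    admitted = {
        "foundation": {"high_quality"},
        "reasoning": {"high_quality", "cot"},
        "code": {"high_quality", "cot", "code"},
        "full": {"high_quality", "cot", "code", "other"},
    }[stage]
    return [s for s in samples if s["tag"] in admitted]
```

Each stage widens the admitted pool, so training starts on clean foundational data and ends on the full dataset.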
## Advanced Features

### MoE (Mixture of Experts)

Enable sparse expert activation for higher capacity at similar per-token compute:

```bash
python train.py --use_moe --num_experts 8
```

- Top-2 routing with load balancing
- 60% of layers use MoE (the middle layers)
- Shared router groups for efficiency

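For intuition, the top-2 routing step can be sketched in plain Python (the real implementation in `src/models/moe_layer.py` operates on batched tensors and adds a load-balancing auxiliary loss; this scalar version only shows the routing math):

```python
import math

def top2_route(logits):
    """Pick the two highest-scoring experts and renormalize their gates."""
    # Softmax over all expert logits (numerically stabilized)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the top-2 experts
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    # Renormalize so the two gate weights sum to 1
    denom = probs[top2[0]] + probs[top2[1]]
    gates = [probs[i] / denom for i in top2]
    return top2, gates
```

Each token's output is then the gate-weighted sum of just its two selected experts, which is what makes the activation sparse.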
### EQ Adapter

Emotional intelligence module:

```bash
python train.py --use_eq_adapter --eq_loss_weight 0.1
```

- Frustration detection (regression head)
- 8-emotion classification
- Fused with the attention mechanism

### LoRA/QLoRA

Efficient fine-tuning with low-rank adaptation:

```bash
# LoRA
python train.py --use_lora --lora_r 16 --lora_alpha 32

# QLoRA (LoRA on a 4-bit quantized base model)
python train.py --use_qlora --use_lora --lora_r 8
```

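Under the hood, LoRA leaves the base weight frozen and learns a low-rank update `B @ A` scaled by `lora_alpha / lora_r`. A minimal numeric sketch of the forward pass (pure Python lists standing in for tensors; not the project's training code):

```python
def lora_forward(x, W, A, B, alpha, r):
    """y = (W + (alpha/r) * B @ A) @ x: frozen W plus a low-rank update.

    W is d_out x d_in (frozen); A is r x d_in and B is d_out x r
    (trainable), so only r * (d_in + d_out) extra parameters are learned.
    """
    scale = alpha / r
    d_out, d_in = len(W), len(W[0])
    # Low-rank path computed cheaply as B @ (A @ x)
    Ax = [sum(A[k][j] * x[j] for j in range(d_in)) for k in range(r)]
    y = []
    for i in range(d_out):
        base = sum(W[i][j] * x[j] for j in range(d_in))      # frozen path
        low_rank = sum(B[i][k] * Ax[k] for k in range(r))    # adapter path
        y.append(base + scale * low_rank)
    return y
```

With `--lora_r 16` on a 4096-wide layer, that is 16 × (4096 + 4096) ≈ 131K trainable parameters per adapted matrix instead of ~16.8M; QLoRA applies the same update on top of a 4-bit quantized `W`.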
## Testing

Run the test suite:

```bash
python test_model.py
```

Tests include:

- Model creation and initialization
- Forward pass and gradient flow
- Text generation
- Multi-task outputs (EQ adapter)
- Loss computation

## Requirements

See `requirements.txt` for full dependencies. Key packages:

- `torch>=2.0.0`
- `transformers>=4.35.0`
- `datasets>=2.14.0`
- `accelerate>=0.24.0`
- `peft>=0.6.0` (for LoRA)
- `bitsandbytes>=0.41.0` (for QLoRA)
- `tensorboard>=2.14.0`

## Performance Tips

1. **Mixed Precision**: Use `--mixed_precision bf16` for faster training (Ampere+ GPUs)
2. **Gradient Checkpointing**: Enabled by default to reduce memory
3. **Batch Size**: Adjust to your VRAM (4-8 for full 7B fine-tuning, 16-32 for LoRA)
4. **Sequence Length**: Longer sequences use more memory; adjust `--max_seq_length`

## Troubleshooting

### Out of Memory

- Reduce batch size
- Use gradient accumulation
- Enable LoRA/QLoRA
- Use mixed precision
- Reduce sequence length

### Slow Training

- Increase batch size if possible
- Use more gradient accumulation steps
- Ensure data loading is not the bottleneck
- Use mixed precision

### Poor Quality Outputs

- Train longer (more epochs)
- Use higher-quality data
- Adjust the learning rate (try 1e-5 to 5e-5)
- Enable curriculum learning
- Use quality filtering

## Citation

If you use Zenith-7B in your research, please cite:

```bibtex
@misc{zenith-7b-2025,
  title={Zenith-7B: A Hybrid MoE Model for Code and Emotional Intelligence},
  year={2025},
  publisher={Zenith Project}
}
```

## License

MIT (see the `license` field in the model card metadata above).

## Contact

For issues and questions, please open an issue on the project repository.