---
language:
- en
license: mit
base_model: Qwen/Qwen2.5-Coder-7B
tags:
- zenith
- tenstorrent
- code
- reasoning
- moe
- ring-attention
- eq-adapter
- matrix-corp
pipeline_tag: text-generation
library_name: transformers
model_type: zenith
hardware:
- tenstorrent-blackhole-p300a
---

# Zenith-7B V1

A GPU-optimized language model built on Qwen2.5-Coder-7B, with code generation and emotional-intelligence capabilities.

## Features

- **7B Parameter Model**: Efficient on consumer GPUs (8-16 GB VRAM)
- **Code Generation**: Fine-tuned from the Qwen2.5-Coder base for strong programming ability
- **Emotional Intelligence**: EQ adapter for recognizing and responding to emotions
- **OpenThoughts Integration**: Trained on high-quality reasoning data
- **LoRA/QLoRA Support**: Efficient fine-tuning with 4-bit quantization
- **Ollama Compatible**: Ready-to-use Modelfile for easy deployment

## Quick Start

### Installation

```bash
# Clone the repository, then install dependencies
cd Zenith/V1/7B
pip install -r requirements.txt
```

### Training

```bash
# Full fine-tuning
python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data path/to/train.json \
  --epochs 3 \
  --batch_size 4 \
  --learning_rate 2e-5

# LoRA fine-tuning (recommended for most users)
python train.py \
  --base_model Qwen/Qwen2.5-Coder-7B \
  --train_data path/to/train.json \
  --use_lora \
  --lora_r 16 \
  --lora_alpha 32 \
  --epochs 3 \
  --batch_size 8
```

### Inference

```bash
# Interactive mode
python inference.py --checkpoint ./outputs/checkpoint-final

# Single prompt
python inference.py \
  --checkpoint ./outputs/checkpoint-final \
  --prompt "Write a Python function to reverse a linked list" \
  --max_new_tokens 512
```

### Ollama Deployment

```bash
# Build and run with Ollama
ollama create zenith-7b -f Modelfile
ollama run zenith-7b "Explain quantum computing in simple terms"
```

## Project Structure

```
Zenith/V1/7B/
├── configs/                 # Configuration files
│   ├── zenith_config.py     # Model architecture config
│   ├── data_config.py       # Data processing config
│   └── training_config.py   # Training hyperparameters
├── data/                    # Data processing modules
│   ├── openthoughts_processor.py
│   ├── quality_filter.py
│   ├── curriculum_sampler.py
│   ├── advanced_tokenizer.py
│   └── preprocessing.py
├── src/                     # Source code
│   ├── models/
│   │   ├── zenith_model.py
│   │   ├── dense_layer.py
│   │   └── moe_layer.py
│   └── utils/
├── scripts/                 # Utility scripts
├── tests/                   # Test suite
├── train.py                 # Main training script
├── inference.py             # Inference and generation
├── test_model.py            # Model validation tests
├── finetune_qwen.py         # Qwen fine-tuning guide
├── Modelfile                # Ollama configuration
├── requirements.txt         # Python dependencies
└── README.md                # This file
```

## Configuration

The model uses a unified configuration system in `configs/zenith_config.py`:

```python
from configs.zenith_config import get_7b_config

config = get_7b_config()
# Parameters:
# - hidden_size: 4096
# - num_layers: 32
# - num_heads: 32
# - num_experts: 0 (dense only; set >1 for MoE)
# - use_eq_adapter: True (emotional intelligence)
# - max_seq_len: 8192
```

## Data Processing

### OpenThoughts Integration

The data pipeline supports the OpenThoughts3-1.2M dataset:

```python
from data.openthoughts_processor import OpenThoughtsProcessor, OpenThoughtsConfig

config = OpenThoughtsConfig(
    dataset_name="open-thoughts/OpenThoughts3-1.2M",
    streaming=True,
    quality_filtering=True,
    curriculum_learning=True,
    augmentation=True,
)
processor = OpenThoughtsProcessor(config)
dataset = processor.load_dataset()
```

### Quality Filtering

Multi-dimensional quality assessment:

- Length appropriateness
- Language detection (English only)
- Repetition detection
- Coherence scoring
- Structure validation
- Thought quality (for CoT data)

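The actual scoring logic lives in `data/quality_filter.py`; as a rough sketch of how several such checks can be combined into a single gate (function name, thresholds, and heuristics here are illustrative, not the project's real API):

```python
def passes_quality_checks(text: str,
                          min_chars: int = 50,
                          max_chars: int = 20_000,
                          max_repeat_ratio: float = 0.3) -> bool:
    """Toy multi-dimensional quality gate: every check must pass."""
    # Length appropriateness
    if not (min_chars <= len(text) <= max_chars):
        return False
    # Crude English-only heuristic: most characters should be ASCII
    if sum(c.isascii() for c in text) / len(text) < 0.9:
        return False
    # Repetition detection: fraction of duplicated non-empty lines
    lines = [l for l in text.splitlines() if l.strip()]
    if lines:
        repeat_ratio = 1 - len(set(lines)) / len(lines)
        if repeat_ratio > max_repeat_ratio:
            return False
    # Structure validation: code fences must be balanced
    if text.count("```") % 2 != 0:
        return False
    return True
```

A real filter would add coherence and thought-quality scoring on top of these cheap structural checks.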
### Curriculum Learning

Progressive training stages:

1. **Foundation**: High-quality, well-structured samples
2. **Reasoning**: Chain-of-thought and problem-solving
3. **Code**: Programming and technical content
4. **Full**: Complete dataset with all samples

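The staged schedule is implemented in `data/curriculum_sampler.py`; a minimal sketch of the idea (stage names follow the list above, but the equal-length phases and the sample-tagging scheme are hypothetical):

```python
STAGES = ["foundation", "reasoning", "code", "full"]

def stage_for_step(step: int, total_steps: int) -> str:
    """Map a training step to a curriculum stage (equal-length phases)."""
    phase = min(step * len(STAGES) // total_steps, len(STAGES) - 1)
    return STAGES[phase]

def select_samples(samples, stage):
    """Keep only samples whose tag is admitted at the current stage."""
    admitted = {
        "foundation": {"high_quality"},
        "reasoning": {"high_quality", "cot"},
        "code": {"high_quality", "cot", "code"},
        "full": {"high_quality", "cot", "code", "other"},
    }[stage]
    return [s for s in samples if s["tag"] in admitted]
```

Each stage widens the admitted pool, so training starts on clean foundational data and ends on the full dataset.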
## Advanced Features

### MoE (Mixture of Experts)

Enable sparse expert activation for higher capacity at similar per-token compute:

```bash
python train.py --use_moe --num_experts 8
```

- Top-2 routing with load balancing
- 60% of layers use MoE (the middle layers)
- Shared router groups for efficiency

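For intuition, the top-2 routing step can be sketched in plain Python (the real implementation in `src/models/moe_layer.py` operates on batched tensors and adds a load-balancing auxiliary loss; this scalar version only shows the routing math):

```python
import math

def top2_route(logits):
    """Pick the two highest-scoring experts and renormalize their gates."""
    # Softmax over all expert logits (numerically stabilized)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the top-2 experts
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    # Renormalize so the two gate weights sum to 1
    denom = probs[top2[0]] + probs[top2[1]]
    gates = [probs[i] / denom for i in top2]
    return top2, gates
```

Each token's output is then the gate-weighted sum of just its two selected experts, which is what makes the activation sparse.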
### EQ Adapter

Emotional intelligence module:

```bash
python train.py --use_eq_adapter --eq_loss_weight 0.1
```

- Frustration detection (regression head)
- 8-emotion classification
- Fused with the attention mechanism

### LoRA/QLoRA

Efficient fine-tuning with low-rank adaptation:

```bash
# LoRA
python train.py --use_lora --lora_r 16 --lora_alpha 32

# QLoRA (LoRA on a 4-bit quantized base model)
python train.py --use_qlora --use_lora --lora_r 8
```

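Under the hood, LoRA leaves the base weight frozen and learns a low-rank update `B @ A` scaled by `lora_alpha / lora_r`. A minimal numeric sketch of the forward pass (pure Python lists standing in for tensors; not the project's training code):

```python
def lora_forward(x, W, A, B, alpha, r):
    """y = (W + (alpha/r) * B @ A) @ x: frozen W plus a low-rank update.

    W is d_out x d_in (frozen); A is r x d_in and B is d_out x r
    (trainable), so only r * (d_in + d_out) extra parameters are learned.
    """
    scale = alpha / r
    d_out, d_in = len(W), len(W[0])
    # Low-rank path computed cheaply as B @ (A @ x)
    Ax = [sum(A[k][j] * x[j] for j in range(d_in)) for k in range(r)]
    y = []
    for i in range(d_out):
        base = sum(W[i][j] * x[j] for j in range(d_in))      # frozen path
        low_rank = sum(B[i][k] * Ax[k] for k in range(r))    # adapter path
        y.append(base + scale * low_rank)
    return y
```

With `--lora_r 16` on a 4096-wide layer, that is 16 × (4096 + 4096) ≈ 131K trainable parameters per adapted matrix instead of ~16.8M; QLoRA applies the same update on top of a 4-bit quantized `W`.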
## Testing

Run the test suite:

```bash
python test_model.py
```

Tests include:

- Model creation and initialization
- Forward pass and gradient flow
- Text generation
- Multi-task outputs (EQ adapter)
- Loss computation

## Requirements

See `requirements.txt` for full dependencies. Key packages:

- `torch>=2.0.0`
- `transformers>=4.35.0`
- `datasets>=2.14.0`
- `accelerate>=0.24.0`
- `peft>=0.6.0` (for LoRA)
- `bitsandbytes>=0.41.0` (for QLoRA)
- `tensorboard>=2.14.0`

## Performance Tips

1. **Mixed Precision**: Use `--mixed_precision bf16` for faster training (Ampere+ GPUs)
2. **Gradient Checkpointing**: Enabled by default to reduce memory
3. **Batch Size**: Adjust to your VRAM (4-8 for full 7B fine-tuning, 16-32 for LoRA)
4. **Sequence Length**: Longer sequences use more memory; adjust `--max_seq_length`

## Troubleshooting

### Out of Memory

- Reduce batch size
- Use gradient accumulation
- Enable LoRA/QLoRA
- Use mixed precision
- Reduce sequence length

### Slow Training

- Increase batch size if possible
- Use more gradient accumulation steps
- Ensure data loading is not the bottleneck
- Use mixed precision

### Poor Quality Outputs

- Train longer (more epochs)
- Use higher-quality data
- Adjust the learning rate (try 1e-5 to 5e-5)
- Enable curriculum learning
- Use quality filtering

## Citation

If you use Zenith-7B in your research, please cite:

```bibtex
@misc{zenith-7b-2025,
  title={Zenith-7B: A Hybrid MoE Model for Code and Emotional Intelligence},
  year={2025},
  publisher={Zenith Project}
}
```

## License

MIT (see the `license` field in the model card metadata above).

## Contact

For issues and questions, please open an issue on the project repository.