---
title: Phi-4 Unsloth Training
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.17.0
app_file: app.py
pinned: false
license: mit
---

# Phi-4 Unsloth Optimized Training

This space is dedicated to training Microsoft's Phi-4 model using Unsloth optimizations for enhanced performance and efficiency. The training process uses 4-bit quantization and advanced memory optimizations.

## Installation

This Hugging Face Space automatically installs dependencies from requirements.txt. The key packages are listed under Essential Dependencies below.

### Installation Process

The project uses a single consolidated requirements file that maintains the proper installation order of dependencies:

1. **All Dependencies (requirements.txt)**:
   - Contains all required packages in the correct installation order
   - Install with: `pip install -r requirements.txt`
   - The file is organized into clear sections:
     - Base dependencies (installed first)
     - Main dependencies
     - Optional dependencies (commented out by default)

2. **Flash Attention** (Optional):
   - Enables faster attention computation
   - Install with: `pip install flash-attn==2.5.2 --no-build-isolation`
   - Or uncomment the flash-attn line in requirements.txt

3. **Automated Installation**:
   - For convenience, you can use the included script:
     - Basic install: `python install_requirements.py`
     - With flash-attn: `python install_requirements.py --flash`

This approach simplifies dependency management while maintaining the proper installation order.

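For reference, a minimal sketch of what such a helper script can look like; the actual `install_requirements.py` may differ, and only the `--flash` flag and the flash-attn pin above are taken from this document:

```python
# Hypothetical sketch of an install helper; the real install_requirements.py may differ.
import argparse
import subprocess
import sys


def pip_install(*args: str) -> None:
    """Run pip inside the current interpreter's environment."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", *args])


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Install training dependencies.")
    parser.add_argument("--flash", action="store_true", help="Also install flash-attn")
    args = parser.parse_args()

    # Base and main dependencies, in the order listed in requirements.txt
    pip_install("-r", "requirements.txt")

    # Optional: flash-attn is built without build isolation
    if args.flash:
        pip_install("flash-attn==2.5.2", "--no-build-isolation")
```
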
### Essential Dependencies

- **unsloth** (>=2024.3): Required for optimized 4-bit training
- **peft** (>=0.9.0): Required for parameter-efficient fine-tuning
- **transformers** (>=4.36.0): For model architecture and tokenization
- **einops**: Required by Unsloth for tensor manipulation
- **sentencepiece**: Required for tokenization

### Optional Dependencies

- **flash-attn**: Optional for faster attention computation (not included by default as it can cause build issues)

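To confirm these packages are present in your environment, a quick check like the following works (package names only; the version floors above are what this project expects):

```python
# Sanity check: verify the essential dependencies are installed and report their versions.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("unsloth", "peft", "transformers", "einops", "sentencepiece"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED")
```
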
## Features

- 4-bit quantization using Unsloth
- Optimized training pipeline
- Cognitive dataset integration
- Advanced memory management
- Gradient checkpointing
- Sequential data processing

## Configuration Files

- `transformers_config.json`: Model and training parameters
- `hardware_config.json`: Hardware-specific optimizations
- `dataset_config.json`: Dataset processing settings
- `requirements.txt`: Required dependencies

## Training Process

The training utilizes the following optimizations:

- Unsloth's 4-bit quantization
- Custom chat templates for Phi-4
- Paper-order preservation
- Efficient memory usage
- Gradient accumulation

## Dataset

Training uses the cognitive dataset with:

- Maintained paper order
- Proper metadata handling
- Optimized sequence length
- Efficient batching

## Hardware Requirements

- GPU: A10G or better
- VRAM: 24GB minimum
- RAM: 32GB recommended

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Phase 1: Domain Adaptation (Unsupervised)

This directory contains the code and configuration for domain adaptation of the phi-4-unsloth-bnb-4bit model to the cognitive science domain. This phase produces our domain-adapted model: [George-API/phi-4-research-assistant](https://huggingface.co/George-API/phi-4-research-assistant).

## Overview

Domain adaptation is the first phase of our training process, where we expose the model to a large corpus of cognitive science texts to help it learn domain-specific vocabulary, concepts, and patterns. This phase prepares the model for the more focused supervised fine-tuning in Phase 2.

## Files

### Core Training Files

- `run_transformers_training.py`: Main script for domain adaptation
- `transformers_config.json`: Model and training parameters
- `hardware_config.json`: Hardware-specific optimizations
- `dataset_config.json`: Dataset loading and processing settings
- `requirements.txt`: Required Python packages

### Analysis & Utilities

- `check_tokenization.py`: Script to analyze token distributions
- `update_space.py`: Hugging Face Space update utility
- `.env`: Environment variables (API tokens, etc.)

## Setup

1. **Environment Setup**:
   ```bash
   python -m venv venv
   source venv/bin/activate  # or `venv\Scripts\activate` on Windows
   pip install -r requirements.txt
   ```

2. **Environment Variables**:
   Create a `.env` file (a snippet showing how the token can be loaded follows this list):
   ```
   HUGGINGFACE_TOKEN=your_token_here
   ```

3. **Verify Setup**:
   ```bash
   python check_tokenization.py  # Ensures the tokenizer works
   ```

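How the token is picked up depends on the training scripts; a typical pattern, assuming `python-dotenv` is installed, looks like this:

```python
# Illustrative only: load HUGGINGFACE_TOKEN from .env and authenticate with the Hub.
import os

from dotenv import load_dotenv        # assumes python-dotenv is available
from huggingface_hub import login

load_dotenv()                         # reads variables from the local .env file
login(token=os.environ["HUGGINGFACE_TOKEN"])
```
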
## How It Works

1. **Data Loading**: Loads pre-tokenized data from the Hugging Face dataset
2. **Sequential Processing**: Processes data in order, maintaining the integrity of research papers
3. **Efficient Training**: Uses the pre-quantized Unsloth 4-bit model for memory-efficient, faster training
4. **Checkpointing**: Saves regular checkpoints and pushes them to the Hub
5. **Monitoring**: Logs detailed metrics and statistics during training
6. **Model Publishing**: Pushes the trained model to the Hugging Face Hub

## Key Features

### Memory-Efficient Training

The training setup is optimized for A10G GPUs:

- Uses a pre-quantized 4-bit model (no additional quantization needed)
- Gradient checkpointing for memory efficiency
- Flash attention for faster training
- bfloat16 mixed precision training
- Optimized batch sizes for maximum throughput

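As an illustration of the loading step (a sketch only; the authoritative arguments live in `run_transformers_training.py` and `transformers_config.json`):

```python
# Sketch: load the pre-quantized Unsloth 4-bit Phi-4 model.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/phi-4-unsloth-bnb-4bit",  # weights are already 4-bit quantized
    max_seq_length=2048,                          # matches max_seq_length in the config
    load_in_4bit=True,                            # keep the pre-quantized format
)

# Gradient checkpointing trades extra compute for a large cut in activation memory.
model.gradient_checkpointing_enable()
```
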
### Sequential Processing

The training script ensures that chunks from the same research paper are processed together by:

- Sorting the dataset by ID
- Using a SequentialSampler to maintain order
- Processing chunks sequentially (average 1,673 tokens per chunk)

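A minimal sketch of how this ordering can be enforced (the dataset name and the `id` column are placeholders, not the project's actual values):

```python
# Sketch: keep chunks from the same paper adjacent and feed them to training in order.
from datasets import load_dataset
from torch.utils.data import DataLoader, SequentialSampler

dataset = load_dataset("your-org/your-pretokenized-dataset", split="train")  # placeholder
dataset = dataset.sort("id")  # assumes an "id" column encoding paper/chunk order

loader = DataLoader(
    dataset,
    batch_size=16,
    sampler=SequentialSampler(dataset),  # no shuffling, so paper order is preserved
    collate_fn=lambda batch: batch,      # the real script uses SimpleDataCollator here
)
```
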
### Data Collator

The `SimpleDataCollator` class:

- Preserves the pre-tokenized data format
- Processes each entry independently
- Provides detailed logging of processing statistics
- Handles errors gracefully

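The actual implementation lives in `run_transformers_training.py`; the sketch below only illustrates the idea, and the padding logic and field names are assumptions:

```python
# Sketch of a collator for already-tokenized examples (not the project's exact class).
import logging

import torch

logger = logging.getLogger(__name__)


class SimpleDataCollator:
    def __init__(self, pad_token_id: int):
        self.pad_token_id = pad_token_id
        self.stats = {"processed": 0, "skipped": 0}

    def __call__(self, features):
        sequences = []
        for feature in features:
            try:
                sequences.append(torch.tensor(feature["input_ids"]))  # assumed field name
                self.stats["processed"] += 1
            except (KeyError, TypeError) as err:
                # Skip malformed entries instead of crashing the whole run.
                self.stats["skipped"] += 1
                logger.warning("Skipping malformed example: %s", err)

        input_ids = torch.nn.utils.rnn.pad_sequence(
            sequences, batch_first=True, padding_value=self.pad_token_id
        )
        labels = input_ids.clone()
        labels[input_ids == self.pad_token_id] = -100  # ignore padding in the loss
        return {
            "input_ids": input_ids,
            "attention_mask": (input_ids != self.pad_token_id).long(),
            "labels": labels,
        }
```
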
### Checkpointing

The training process:

- Saves checkpoints every 200 steps
- Pushes to the Hub on every save
- Keeps up to 5 recent checkpoints
- Automatically resumes from the latest checkpoint if interrupted

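In `transformers` terms this corresponds roughly to settings like the following (a sketch; the authoritative values are in `transformers_config.json`):

```python
# Sketch: checkpointing-related settings expressed as transformers TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./checkpoints",
    save_strategy="steps",
    save_steps=200,             # checkpoint every 200 steps
    save_total_limit=5,         # keep only the 5 most recent checkpoints
    push_to_hub=True,           # push each save to the Hugging Face Hub
    hub_strategy="checkpoint",  # push the last checkpoint so training can resume
)

# Resuming picks up the latest checkpoint found in output_dir:
# trainer.train(resume_from_checkpoint=True)
```
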
## Hardware Requirements

This training setup is optimized for:

- 2x NVIDIA A10G GPUs (24GB VRAM each)
- 92GB system RAM
- CUDA 11.8 or higher

Memory breakdown per GPU:

- Model (4-bit): ~3.5GB
- Optimizer states: ~1GB
- Batch memory: ~2GB
- Peak usage: 18-20GB
- Safe headroom: 4-6GB

## Configuration

Key parameters in `transformers_config.json`:

- `model_name`: unsloth/phi-4-unsloth-bnb-4bit
- `learning_rate`: 2e-5
- `num_train_epochs`: 3
- `per_device_train_batch_size`: 16
- `gradient_accumulation_steps`: 4
- `effective_batch_size`: 128 (16 * 4 * 2 GPUs)
- `max_seq_length`: 2048
- `lr_scheduler_type`: "cosine"
- `warmup_ratio`: 0.03
- `neftune_noise_alpha`: 5

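A small snippet like this can be used to inspect the active values (the flat key layout is an assumption about how `transformers_config.json` is organized):

```python
# Sketch: load and print key fields from the training configuration.
import json

with open("transformers_config.json") as f:
    config = json.load(f)

for key in ("model_name", "learning_rate", "num_train_epochs",
            "per_device_train_batch_size", "gradient_accumulation_steps",
            "max_seq_length", "lr_scheduler_type", "warmup_ratio"):
    print(f"{key}: {config.get(key)}")
```
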
The configuration is optimized for:

- Maximum memory efficiency with the pre-quantized model
- Stable training with a cosine learning rate schedule
- Effective gradient updates with accumulation
- Regular checkpointing and Hub updates

## Running Domain Adaptation

To start domain adaptation:

```bash
python run_transformers_training.py
```

The script will:

1. Load the pre-quantized model and dataset
2. Apply the optimized training parameters
3. Process the data sequentially
4. Train the model for 3 epochs
5. Save and push checkpoints to the Hub regularly

## Using the Model

After training, you can use the domain-adapted model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the domain-adapted model
model_name = "George-API/phi-4-research-assistant"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="bfloat16",
)

# Generate text
input_text = "The hippocampus is involved in"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Chat Format Example

Phi-4 works best with its native chat template:

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="George-API/phi-4-research-assistant",
    model_kwargs={"torch_dtype": "bfloat16"},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an expert in cognitive science."},
    {"role": "user", "content": "Explain the role of the hippocampus in memory formation."},
]

outputs = generator(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1]["content"])  # print only the assistant's reply
```

## Expected Outcomes

After domain adaptation, the model should:

- Have a better understanding of cognitive science terminology
- Show improved performance on domain-specific tasks
- Be ready for supervised fine-tuning in Phase 2

## Next Steps

After completing domain adaptation:

1. Evaluate the model's performance on cognitive science texts
2. Proceed to Phase 2 (Supervised Fine-Tuning)
3. Use TensorBoard to analyze training metrics