# Phi-2 QLoRA Fine-tuned Assistant

This is a fine-tuned version of Microsoft's Phi-2 model using QLoRA (Quantized Low-Rank Adaptation). The model has been trained to provide helpful responses for a range of tasks, including coding, writing, and general assistance.

## Model Details

- **Base Model**: Microsoft Phi-2 (2.7B parameters)
- **Fine-tuning Method**: QLoRA (4-bit quantization with low-rank adapters)
- **Training Data**: Custom dataset focused on programming and professional communication
- **Hardware Used**: NVIDIA RTX 4090 (24GB VRAM)
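A QLoRA setup along these lines matches the method described above; the rank, dropout, and target modules below are illustrative assumptions, not the exact hyperparameters used to train this checkpoint.

```python
# Illustrative QLoRA configuration (assumed values, not this model's
# actual training hyperparameters).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization of the frozen base model, as in the QLoRA paper.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    device_map="auto",
)

# Trainable low-rank adapters; the target module names are assumptions
# based on Phi-2's attention layers.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```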
## Usage

You can interact with the model through the Gradio interface by visiting the "Spaces" tab of this repository.

### Local Installation

To run the model locally:

1. Clone this repository
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the Gradio app:
   ```bash
   python gradio_app.py
   ```
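If you would rather call the model directly than go through the Gradio app, loading along these lines should work. `USER/REPO` is a placeholder for this repository's id, and the 4-bit loading configuration is an assumption consistent with the QLoRA setup above.

```python
# Sketch: load the 4-bit base model and apply this repo's LoRA adapter.
# "USER/REPO" is a placeholder -- substitute this repository's actual id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
base = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "USER/REPO")  # placeholder adapter id

inputs = tokenizer(
    "Instruct: Explain recursion briefly.\nOutput:", return_tensors="pt"
).to(model.device)
output = model.generate(**inputs, max_length=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```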
### Parameters

- **Max Length**: Maximum length of the generated response (64-1024 tokens)
- **Temperature**: Randomness of generation; lower values are more deterministic (0.1-1.0)
- **Top P**: Nucleus-sampling cutoff controlling the diversity of responses (0.1-1.0)
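As a rough sketch of how these sliders map onto `transformers` generation arguments (the function name and defaults here are illustrative, not the app's actual code):

```python
# Hypothetical helper: map the UI sliders to model.generate(...) kwargs,
# clamping each value to its documented range.
def build_generation_kwargs(max_length=256, temperature=0.7, top_p=0.9):
    return {
        "max_length": max(64, min(1024, max_length)),
        "temperature": max(0.1, min(1.0, temperature)),
        "top_p": max(0.1, min(1.0, top_p)),
        # Sampling must be enabled for temperature/top_p to take effect.
        "do_sample": True,
    }

print(build_generation_kwargs(max_length=2000))  # max_length is clamped to 1024
```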
## Example Prompts

1. "Write a Python function to calculate the factorial of a number"
2. "Explain the concept of machine learning in simple terms"
3. "Write a professional email requesting a meeting with a client"
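Prompts like these are typically wrapped in an instruction template before generation. The `Instruct:`/`Output:` format below follows the common Phi-2 convention and is an assumption about how this model was trained; verify it against the actual training data format.

```python
# Assumed instruction template (check against the real training format).
def format_prompt(instruction: str) -> str:
    return f"Instruct: {instruction}\nOutput:"

prompt = format_prompt("Write a Python function to calculate the factorial of a number")
print(prompt)
```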
## Limitations

- The model works best with English-language input
- Response quality may vary with the complexity of the prompt
- Maximum context length is limited to 2048 tokens

## License

This model is subject to the Microsoft Phi-2 license terms and conditions.

## Acknowledgments

- Microsoft for the Phi-2 base model
- Hugging Face for the `transformers` library and model hosting
- The QLoRA paper authors for the quantization technique