Instructions to use harshagale/llm-upload with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use harshagale/llm-upload with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("NousResearch/Llama-2-7b-chat-hf") model = PeftModel.from_pretrained(base_model, "harshagale/llm-upload") - Notebooks
- Google Colab
- Kaggle
| license: apache-2.0 | |
| base_model: NousResearch/Llama-2-7b-chat-hf | |
| tags: | |
| - loRA | |
| - qloRA | |
| - peft | |
| - causal-lm | |
| - text-generation | |
| - fine-tuned | |
| datasets: | |
| - mlabonne/guanaco-llama2-1k | |
| pipeline_tag: text-generation | |
| language: | |
| - en | |
| # Llama-2-7b-chat-hf Fine-Tuned with QLoRA | |
| This model is a fine-tuned version of `NousResearch/Llama-2-7b-chat-hf` using Parameter-Efficient Fine-Tuning (PEFT) via **QLoRA** (4-bit quantization). It was trained on the `mlabonne/guanaco-llama2-1k` dataset. | |
| > **Note:** This repository contains **only the adapter weights**. To use this model, you need to load the base model (`NousResearch/Llama-2-7b-chat-hf`) and apply these LoRA adapters on top of it. | |
| ## Model Details | |
| - **Developed by:** Harsh Agale | |
| - **Base Model:** `NousResearch/Llama-2-7b-chat-hf` | |
| - **Method:** QLoRA (4-bit Quantization + LoRA) | |
| - **Language(s):** English | |
| - **License:** Apache 2.0 | |
| - **Task:** Causal Language Modeling / Text Generation | |
| ## Training Hyperparameters | |
| The model was trained using the following configuration: | |
| * **Quantization:** 4-bit NormalFloat (`nf4`) with double quantization | |
| * **Compute Dtype:** `float16` | |
| * **LoRA Rank (r):** 8 | |
| * **LoRA Alpha:** 16 | |
| * **Target Modules:** `q_proj`, `v_proj` | |
| * **LoRA Dropout:** 0.05 | |
| * **Learning Rate:** 2e-4 | |
| * **Optimizer:** `paged_adamw_8bit` | |
| * **Batch Size:** 1 (with 4 Gradient Accumulation Steps) | |
| * **Epochs:** 1 | |
| ## Project Purpose | |
| This project was created to learn and experiment with: | |
| - QLoRA fine-tuning | |
| - PEFT adapters | |
| - 4-bit quantization | |
| - Efficient LLM training | |
| - Hugging Face ecosystem | |
| ## Limitations | |
| - Trained on a small dataset | |
| - May produce hallucinated responses | |
| - Intended for educational and research purposes | |
| ## How to Load and Use This Model | |
| You can easily load this model and its adapters using the `transformers` and `peft` libraries: | |
| ```python | |
| import torch | |
| from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig | |
| from peft import PeftModel | |
| model_id = "NousResearch/Llama-2-7b-chat-hf" | |
| adapter_id = "harshagale/llm-upload" | |
| # 1. You must use the same 4-bit config to load the base model | |
| bnb_config = BitsAndBytesConfig( | |
| load_in_4bit=True, | |
| bnb_4bit_quant_type="nf4", | |
| bnb_4bit_compute_dtype=torch.float16, | |
| bnb_4bit_use_double_quant=True | |
| ) | |
| # 2. Load the base tokenizer and configure the padding token | |
| tokenizer = AutoTokenizer.from_pretrained(model_id) | |
| tokenizer.pad_token = tokenizer.eos_token | |
| # 3. Load the quantized base model | |
| base_model = AutoModelForCausalLM.from_pretrained( | |
| model_id, | |
| quantization_config=bnb_config, | |
| device_map="auto" | |
| ) | |
| # 4. Merge the PEFT adapter weights onto the base model | |
| model = PeftModel.from_pretrained(base_model, adapter_id) | |
| # 5. Quick inference test | |
| prompt = "Human: Tell me a joke.\nAssistant:" | |
| inputs = tokenizer(prompt, return_tensors="pt").to("cuda") | |
| with torch.no_grad(): | |
| outputs = model.generate(**inputs, max_new_tokens=50) | |
| print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |