| --- |
| license: apache-2.0 |
| --- |
| |
| # Model Card for Phi-2 LoRA Fine-Tune on Nemotron-Personas-India (4-bit NF4) |
|
|
| ## Model Details |
|
|
| This project fine-tunes Microsoft's Phi-2 language model on the Nemotron-Personas-India dataset using LoRA, a parameter-efficient fine-tuning method. The base model is loaded in 4-bit NF4 precision through BitsAndBytes, which reduces memory use enough to train and run inference on limited hardware. |
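
As a rough illustration of why 4-bit loading matters on small GPUs, the sketch below estimates weight memory for a model of about 2.7 billion parameters, such as Phi-2. It assumes every weight is quantized (in practice embeddings and normalization layers typically stay in higher precision) and a quantization block size of 64 with 32-bit scaling constants. Activations, LoRA weights, and optimizer state are not counted, so treat the numbers as a lower bound rather than a measurement.

```python
# Back-of-the-envelope weight-memory estimate; not a measurement.
PHI2_PARAMS = 2.7e9  # approximate parameter count of Phi-2


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# NF4 stores 4 bits per weight plus one scaling constant per block of 64 weights.
# A 32-bit constant per block costs 32 / 64 = 0.5 extra bits per weight.
NF4_BITS = 4 + 32 / 64
# Double quantization stores those constants in 8 bits, with a second-level
# 32-bit constant per 256 blocks: 8 / 64 + 32 / (64 * 256) ≈ 0.127 extra bits.
NF4_DOUBLE_QUANT_BITS = 4 + 8 / 64 + 32 / (64 * 256)

fp16_gb = weight_memory_gb(PHI2_PARAMS, 16)
nf4_gb = weight_memory_gb(PHI2_PARAMS, NF4_BITS)
nf4_dq_gb = weight_memory_gb(PHI2_PARAMS, NF4_DOUBLE_QUANT_BITS)

print(f"FP16 weights:        {fp16_gb:.2f} GB")
print(f"NF4 weights:         {nf4_gb:.2f} GB")
print(f"NF4 + double quant:  {nf4_dq_gb:.2f} GB")
```

Under these assumptions the weights shrink from roughly 5.4 GB in FP16 to about 1.4 GB with NF4 and double quantization.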
|
|
| ### Model Description |
|
|
|
|
| - **Developed by:** Sachin Singh |
| - **Model type:** Causal Language Model |
| - **Base model:** Phi-2 |
| - **Language(s):** English |
| - **Quantization:** 4-bit NF4 (BitsAndBytes) |
| - **Fine-tuning method:** LoRA (PEFT) |
| - **Dataset:** NVIDIA Nemotron-Personas-India (`en_IN` split) |
|
|
| ### Model Sources |
|
|
| - **Base Model:** microsoft/phi-2 |
| - **Dataset:** nvidia/Nemotron-Personas-India |
|
|
| ## Uses |
|
|
| ### Direct Use |
|
|
| This model is intended for: |
|
|
| - Persona-conditioned text generation |
| - Instruction-following experiments |
| - Low-memory LLM deployment research |
| - Quantization benchmarking |
| - LoRA fine-tuning demonstrations |
| - LLM performance analytics studies |
|
|
| ### Downstream Use |
|
|
| The fine-tuned model can serve as a foundation for: |
|
|
| - Persona-based conversational agents |
| - Lightweight chatbot deployments |
| - LLM optimization research |
| - Quantization and efficiency studies |
|
|
| ### Out-of-Scope Use |
|
|
| This model is not intended for: |
|
|
| - Medical advice |
| - Legal advice |
| - Financial decision making |
| - Safety-critical systems |
| - High-risk automated decision systems |
|
|
| ## Bias, Risks, and Limitations |
|
|
| The model inherits limitations from: |
|
|
| - The Phi-2 base model |
| - The Nemotron-Personas-India dataset |
| - Quantization-induced approximation errors |
| - Limited fine-tuning duration |
|
|
| Generated responses may contain inaccuracies, hallucinations, biases, or incomplete information. |
|
|
|
|
| ## How to Get Started with the Model |
|
|
| ```python |
| import torch |
| from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig |
| |
| # Base checkpoint; this snippet loads the quantized base model only. |
| model_id = "microsoft/phi-2" |
| |
| # 4-bit NF4 quantization with FP16 compute and double quantization. |
| bnb_config = BitsAndBytesConfig( |
|     load_in_4bit=True, |
|     bnb_4bit_quant_type="nf4", |
|     bnb_4bit_compute_dtype=torch.float16, |
|     bnb_4bit_use_double_quant=True, |
| ) |
| |
| tokenizer = AutoTokenizer.from_pretrained(model_id) |
| |
| # device_map="auto" places the weights on the available GPU(s). |
| model = AutoModelForCausalLM.from_pretrained( |
|     model_id, |
|     quantization_config=bnb_config, |
|     device_map="auto", |
| ) |
| ``` |
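
The snippet above loads only the quantized base model. To run the fine-tuned model, the LoRA adapter has to be attached on top of it. The sketch below shows one way to do that with PEFT. `adapter_path` is a placeholder, because this card does not state where the adapter weights are published, and the `Instruct:`/`Output:` prompt layout is the QA format documented for the Phi-2 base model, which may differ from the template used during fine-tuning.

```python
def format_prompt(instruction: str) -> str:
    """Wrap an instruction in Phi-2's documented 'Instruct/Output' layout."""
    return f"Instruct: {instruction.strip()}\nOutput:"


def generate_with_adapter(model, tokenizer, adapter_path: str,
                          instruction: str, max_new_tokens: int = 128) -> str:
    """Attach a LoRA adapter to the 4-bit base model and generate a reply.

    `adapter_path` is hypothetical: a local directory or Hub repository
    that contains the adapter saved after fine-tuning.
    """
    # Imported lazily so the prompt helper works without PEFT installed.
    from peft import PeftModel

    peft_model = PeftModel.from_pretrained(model, adapter_path)
    peft_model.eval()

    # Tokenize the prompt and move it to the device that holds the base model.
    inputs = tokenizer(format_prompt(instruction), return_tensors="pt")
    inputs = inputs.to(model.device)
    output_ids = peft_model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With `model` and `tokenizer` from the previous snippet, a call could look like `generate_with_adapter(model, tokenizer, "path/to/lora-adapter", "Describe a typical day for a school teacher in Pune.")`.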
|
|
| ## Training Details |
|
|
| ### Training Data |
|
|
| The model is fine-tuned using: |
|
|
| - Dataset: `nvidia/Nemotron-Personas-India` |
| - Split: `en_IN` |
| - Sample Size: 5,000 records |
|
|
| Persona records are transformed into instruction-response training examples before fine-tuning. |
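
The card does not spell out how records become training examples, so the following is only a sketch of one plausible transformation. The field names (`persona`, `occupation`, `state`), the prompt wording, and the shuffling seed are assumptions for illustration and should be checked against the actual dataset columns and the notebook.

```python
from typing import Dict


def persona_to_example(record: Dict[str, str]) -> Dict[str, str]:
    """Turn one persona record into an instruction-response pair.

    The keys read here are hypothetical; adjust them to the real columns.
    """
    occupation = record.get("occupation", "person")
    state = record.get("state", "India")
    instruction = f"Describe the persona of a {occupation} from {state}."
    response = record.get("persona", "").strip()
    # Single text field in the Instruct/Output layout of Phi-2 (assumed template).
    text = f"Instruct: {instruction}\nOutput: {response}"
    return {"instruction": instruction, "response": response, "text": text}


# Made-up record for demonstration only; it is not taken from the dataset.
sample = {
    "occupation": "school teacher",
    "state": "Maharashtra",
    "persona": "A patient educator who enjoys local theatre. ",
}
example = persona_to_example(sample)
print(example["text"])

# With the `datasets` library, the same function could be mapped over a
# 5,000-record sample (dataset and split names taken from this card):
#   from datasets import load_dataset
#   ds = load_dataset("nvidia/Nemotron-Personas-India", split="en_IN")
#   ds = ds.shuffle(seed=0).select(range(5000)).map(persona_to_example)
```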
|
|
|
|
| #### Training Hyperparameters |
|
|
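| - Fine-tuning Method: LoRA |
| - Quantization: 4-bit NF4 (`bnb_4bit_quant_type="nf4"`) |
| - Epochs: 1 |
| - Compute dtype: FP16 (`bnb_4bit_compute_dtype=torch.float16`) |
| - Double Quantization: Enabled (`bnb_4bit_use_double_quant=True`) |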
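
For orientation, the listed settings could map onto a PEFT and Transformers configuration roughly as follows. Only the single epoch and FP16 come from this card. The LoRA rank, alpha, dropout, target modules, batch size, learning rate, and output path below are illustrative placeholders, not the values used in this project.

```python
# Values marked "assumed" are placeholders; the card does not report them.
LORA_KWARGS = {
    "r": 16,               # assumed rank
    "lora_alpha": 32,      # assumed scaling factor
    "lora_dropout": 0.05,  # assumed
    "bias": "none",
    "task_type": "CAUSAL_LM",
    # Linear layer names in the Transformers Phi implementation (assumed targets).
    "target_modules": ["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"],
}

TRAINING_KWARGS = {
    "output_dir": "phi2-lora-out",     # hypothetical path
    "num_train_epochs": 1,             # from this card
    "fp16": True,                      # from this card (FP16 compute)
    "per_device_train_batch_size": 2,  # assumed
    "learning_rate": 2e-4,             # assumed
}


def build_peft_model(quantized_model):
    """Wrap a 4-bit base model with LoRA adapters for training."""
    # Imported lazily so the settings above can be inspected without PEFT.
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    quantized_model = prepare_model_for_kbit_training(quantized_model)
    return get_peft_model(quantized_model, LoraConfig(**LORA_KWARGS))


def build_training_args():
    """Create TrainingArguments from the settings above."""
    from transformers import TrainingArguments

    return TrainingArguments(**TRAINING_KWARGS)
```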
|
|
|
|
| #### Summary |
|
|
| The project evaluates the trade-offs between model efficiency and generation capability when applying 4-bit quantization and LoRA fine-tuning to Phi-2. |
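
One simple way to quantify the efficiency side of that trade-off is to time generation and report tokens per second. The helper below is a generic sketch, not the project's evaluation code. `generate_fn` is a hypothetical callable that runs one generation and returns the number of new tokens it produced.

```python
import time
from typing import Callable, Dict


def measure_throughput(generate_fn: Callable[[], int], n_runs: int = 3) -> Dict[str, float]:
    """Time `generate_fn` over several runs and summarize throughput."""
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(n_runs):
        start = time.perf_counter()
        total_tokens += generate_fn()
        total_seconds += time.perf_counter() - start
    return {
        "runs": n_runs,
        "total_tokens": total_tokens,
        "mean_latency_s": total_seconds / n_runs,
        "tokens_per_s": total_tokens / total_seconds if total_seconds > 0 else 0.0,
    }


# Stand-in workload so the sketch runs without a model or GPU.
def fake_generate() -> int:
    time.sleep(0.01)
    return 20


stats = measure_throughput(fake_generate, n_runs=3)
print(stats)
```

With a real model, `generate_fn` would call `model.generate(...)` and return the number of newly generated tokens. On a CUDA device, `torch.cuda.max_memory_allocated()` can be read afterwards to record peak memory alongside the timing.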
|
|
|
|
| ### Model Architecture and Objective |
|
|
| - Architecture: Phi-2 Transformer |
| - Objective: Causal Language Modeling |
| - Adaptation Method: LoRA |
| - Quantization Method: BitsAndBytes NF4 4-bit Quantization |
|
|
| ### Compute Infrastructure |
|
|
| 2× NVIDIA T4 GPUs |
|
|
|
|
| ## Citation |
|
|
| ```bibtex |
| @misc{phi2, |
| title={Phi-2: The surprising power of small language models}, |
| author={Microsoft Research} |
| } |
| ``` |
|
|
| ### Dataset |
|
|
| ```bibtex |
| @misc{nemotron_personas_india, |
| title={Nemotron Personas India Dataset}, |
| author={NVIDIA} |
| } |
| ``` |
|
|
| ## Model Card Authors |
|
|
| Sachin Singh |
|
|
| ## Model in Notebook |
|
|
| [Kaggle notebook: 4-bit quantization with Phi-2 with more analytics](https://www.kaggle.com/code/shreyasraghav/4-bit-quantization-with-phi-2-with-more-analytics) |