---
license: apache-2.0
---

# Model Card for Model ID

## Model Details

This project fine-tunes Microsoft's Phi-2 language model on the Nemotron-Personas-India dataset using LoRA, a parameter-efficient fine-tuning (PEFT) method. The base model is loaded with 4-bit NF4 quantization through bitsandbytes, which cuts memory consumption enough to support both training and inference on limited hardware.

### Model Description

- **Developed by:** Sachin Singh
- **Model type:** Causal language model
- **Base model:** Phi-2
- **Language(s):** English
- **Quantization:** 4-bit NF4 (bitsandbytes)
- **Fine-tuning method:** LoRA (PEFT)
- **Dataset:** NVIDIA Nemotron-Personas-India (`en_IN` split)

### Model Sources

- **Base model:** `microsoft/phi-2`
- **Dataset:** `nvidia/Nemotron-Personas-India`

## Uses

### Direct Use

This model is intended for:

- Persona-conditioned text generation
- Instruction-following experiments
- Research on low-memory LLM deployment
- Quantization benchmarking
- LoRA fine-tuning demonstrations
- Studies of LLM performance analytics

### Downstream Use

The fine-tuned model can serve as a starting point for:

- Persona-based conversational agents
- Lightweight chatbot deployments
- LLM optimization research
- Quantization and efficiency studies

### Out-of-Scope Use

This model is not intended for:

- Medical advice
- Legal advice
- Financial decision-making
- Safety-critical systems
- High-risk automated decision systems

## Bias, Risks, and Limitations

The model's limitations stem from:

- The Phi-2 base model
- The Nemotron-Personas-India dataset
- Approximation errors introduced by 4-bit quantization
- Limited fine-tuning (a single epoch on a 5,000-record sample)

Generated responses may contain inaccuracies, hallucinations, biases, or incomplete information.
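To put the memory savings mentioned under Model Details into rough numbers, the sketch below estimates weight storage for Phi-2 at several precisions. It assumes Phi-2's published size of about 2.7 billion parameters and counts weights only; quantization constants, modules left unquantized (such as the output head), activations, and LoRA optimizer state are ignored, so actual memory use will be higher.

```python
# Rough, weight-only memory estimate for Phi-2 at different precisions.
# Assumption: ~2.7B parameters (Microsoft's published figure for Phi-2).
# Ignores quantization constants, unquantized modules, activations, and optimizer state.

PHI2_PARAMS = 2.7e9


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for label, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("nf4", 4)]:
        print(f"{label:>5}: {weight_memory_gb(PHI2_PARAMS, bits):.2f} GB")
```

Running it prints 10.80 GB (fp32), 5.40 GB (fp16), 2.70 GB (int8), and 1.35 GB (nf4): a 4× reduction from fp16 to NF4 before overheads. Double quantization, enabled in this project, further compresses the per-block quantization constants that this estimate leaves out.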
## How to Get Started with the Model

The snippet below loads the Phi-2 base model in 4-bit NF4 precision with the quantization settings listed under Training Hyperparameters.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "microsoft/phi-2"

# 4-bit NF4 quantization with FP16 compute and double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" places the quantized weights on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Note: this loads only the quantized base model. The fine-tuned LoRA adapter
# must be attached separately, e.g. with peft.PeftModel.from_pretrained(model, <adapter path>).
```

## Training Details

### Training Data

The model is fine-tuned on:

- **Dataset:** `nvidia/Nemotron-Personas-India`
- **Split:** `en_IN`
- **Sample size:** 5,000 records

Persona records are transformed into instruction-response training examples before fine-tuning.

### Training Procedure

#### Training Hyperparameters

- **Fine-tuning method:** LoRA
- **Quantization:** 4-bit NF4
- **Epochs:** 1
- **Compute dtype:** FP16
- **Double quantization:** Enabled

#### Summary

The project evaluates the trade-offs between model efficiency and generation quality when 4-bit quantization and LoRA fine-tuning are applied to Phi-2.

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** Phi-2 Transformer
- **Objective:** Causal language modeling
- **Adaptation method:** LoRA
- **Quantization method:** bitsandbytes 4-bit NF4

### Compute Infrastructure

- **Hardware:** 2 × NVIDIA T4 GPUs (16 GB each)

## Citation

### Base Model

```bibtex
@misc{phi2,
  title  = {Phi-2: The surprising power of small language models},
  author = {{Microsoft Research}}
}
```

### Dataset

```bibtex
@misc{nemotron_personas_india,
  title  = {Nemotron Personas India Dataset},
  author = {{NVIDIA}}
}
```

## Model Card Authors

Sachin Singh

## Notebook

The accompanying notebook is available on Kaggle: [4-bit Quantization with Phi-2 with More Analytics](https://www.kaggle.com/code/shreyasraghav/4-bit-quantization-with-phi-2-with-more-analytics)
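The Training Data section states that persona records are converted into instruction-response examples, but the exact template is not documented in this card. The sketch below shows one plausible conversion; the field names (`persona`, `occupation`, `state`), the instruction wording, and the `### Instruction:` / `### Response:` layout are all hypothetical and may not match the actual notebook or the dataset schema. The default end-of-sequence string `<|endoftext|>` is Phi-2's EOS token, but passing `tokenizer.eos_token` is safer.

```python
# Hypothetical sketch: convert one persona record into an instruction-response
# pair and render it as a single training string for causal-LM fine-tuning.
# The field names and prompt template are assumptions, not the documented schema.
from typing import Dict

PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"


def persona_to_example(record: Dict[str, str]) -> Dict[str, str]:
    """Build an instruction-response pair from a persona record (hypothetical fields)."""
    occupation = record.get("occupation", "unknown")
    state = record.get("state", "unknown")
    instruction = (
        "Write a short persona description. "
        f"Occupation: {occupation}. State: {state}."
    )
    # The persona text itself becomes the target response.
    response = record["persona"].strip()
    return {"instruction": instruction, "response": response}


def format_for_training(example: Dict[str, str], eos_token: str = "<|endoftext|>") -> str:
    """Fill the prompt template and append an end-of-sequence marker.

    In practice, pass tokenizer.eos_token rather than relying on the default.
    """
    return PROMPT_TEMPLATE.format(**example) + eos_token


if __name__ == "__main__":
    # Made-up record for illustration only; not taken from the dataset.
    sample = {
        "persona": "  A retired schoolteacher who volunteers at the local library. ",
        "occupation": "retired teacher",
        "state": "Kerala",
    }
    print(format_for_training(persona_to_example(sample)))
```

Rendering the instruction and response as one string lets a standard causal-LM trainer learn from the whole sequence; a variant could mask the instruction tokens so that loss is computed only on the response.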