---
license: apache-2.0
---
# Model Card for Phi-2 LoRA Fine-Tuned on Nemotron-Personas-India
## Model Details
This project fine-tunes Microsoft's Phi-2 language model on the Nemotron-Personas-India dataset using LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning (PEFT) method. The base model is loaded with 4-bit NF4 quantization through bitsandbytes, which reduces memory consumption enough for both training and inference to run on limited hardware.
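To put the memory saving in rough numbers, the sketch below estimates weight storage only. It assumes all of Phi-2's roughly 2.7 billion parameters are quantized, a 4-bit block size of 64 (the bitsandbytes default, to our knowledge) and, for double quantization, 8-bit block constants re-quantized in groups of 256 as described in the QLoRA paper. In practice some layers (such as embeddings and the output head) typically stay in higher precision, and activations, LoRA weights and optimizer state add to the total, so treat these figures as lower bounds.

```python
# Rough weight-memory estimate for Phi-2 at different precisions.
# Assumptions: ~2.7e9 parameters, all quantized; block size 64; double-quant group size 256.

PHI2_PARAMS = 2.7e9


def nf4_bits_per_param(block_size: int = 64, double_quant: bool = False,
                       dq_block_size: int = 256) -> float:
    """4-bit code per weight plus the amortized cost of the per-block scaling constants."""
    if double_quant:
        # 8-bit absmax per block, plus one FP32 constant per dq_block_size blocks
        overhead = 8 / block_size + 32 / (block_size * dq_block_size)
    else:
        # one FP32 absmax constant per block
        overhead = 32 / block_size
    return 4 + overhead


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Storage for the weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for label, bits in [
    ("FP16", 16.0),
    ("NF4", nf4_bits_per_param()),
    ("NF4 + double quant", nf4_bits_per_param(double_quant=True)),
]:
    print(f"{label:<20} {bits:6.3f} bits/param  ~{weight_memory_gb(PHI2_PARAMS, bits):.2f} GB")
```

Under these assumptions this prints about 5.40 GB for FP16, 1.52 GB for NF4 and 1.39 GB for NF4 with double quantization, i.e. double quantization trims the constant overhead from 0.5 to about 0.127 bits per parameter.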
### Model Description
- **Developed by:** Sachin Singh
- **Model type:** Causal Language Model
- **Base model:** Phi-2
- **Language(s):** English
- **Quantization:** 4-bit NF4 (BitsAndBytes)
- **Fine-tuning method:** LoRA (PEFT)
- **Dataset:** NVIDIA Nemotron-Personas-India (`en_IN` split)
### Model Sources
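- **Base Model:** [microsoft/phi-2](https://huggingface.co/microsoft/phi-2)
- **Dataset:** [nvidia/Nemotron-Personas-India](https://huggingface.co/datasets/nvidia/Nemotron-Personas-India)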
### Direct Use
This model is intended for:
- Persona-conditioned text generation
- Instruction-following experiments
- Low-memory LLM deployment research
- Quantization benchmarking
- LoRA fine-tuning demonstrations
- LLM performance analytics studies
### Downstream Use
The fine-tuned model can serve as a foundation for:
- Persona-based conversational agents
- Lightweight chatbot deployments
- LLM optimization research
- Quantization and efficiency studies
### Out-of-Scope Use
This model is not intended for:
- Medical advice
- Legal advice
- Financial decision-making
- Safety-critical systems
- High-risk automated decision systems
## Bias, Risks, and Limitations
The model's limitations stem from:
- The Phi-2 base model
- The Nemotron-Personas-India dataset
- Quantization-induced approximation errors
- Limited fine-tuning (one epoch on 5,000 records)
Generated responses may contain inaccuracies, hallucinations, biases, or incomplete information.
## How to Get Started with the Model
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/phi-2"

# 4-bit NF4 weights with double quantization; matrix multiplies run in FP16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Loads the quantized base model only (no fine-tuned LoRA weights).
# device_map="auto" places the model on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```
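The snippet above loads only the quantized base model. Assuming the trained LoRA adapter was saved with PEFT (this card does not give its location), it could be attached and queried roughly as follows. `ADAPTER_PATH`, the prompt template and the generation settings are hypothetical placeholders; the template must match whatever format was actually used during fine-tuning.

```python
# Sketch: attach a saved LoRA adapter to the 4-bit Phi-2 base model and generate.
# ADAPTER_PATH and PROMPT_TEMPLATE are hypothetical placeholders, not published artifacts.

ADAPTER_PATH = "path/to/lora-adapter"  # hypothetical: local folder or Hub repo holding the adapter
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"  # hypothetical template


def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the (assumed) fine-tuning prompt template."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


def load_finetuned(adapter_path: str = ADAPTER_PATH):
    """Load 4-bit NF4 Phi-2 and attach the LoRA adapter (heavy imports kept local)."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
    )
    tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
    base = AutoModelForCausalLM.from_pretrained(
        "microsoft/phi-2", quantization_config=bnb_config, device_map="auto"
    )
    model = PeftModel.from_pretrained(base, adapter_path)
    model.eval()
    return model, tokenizer


def generate(model, tokenizer, instruction: str, max_new_tokens: int = 128) -> str:
    """Greedy-decode a response and return only the newly generated text."""
    import torch

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,  # use EOS as the padding id during generation
        )
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Usage: `model, tokenizer = load_finetuned("your/adapter")`, then `generate(model, tokenizer, "Describe a schoolteacher from Kerala.")`. Decoding only the tokens after the prompt avoids having to strip the template from the output.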
## Training Details
### Training Data
The model is fine-tuned using:
- Dataset: `nvidia/Nemotron-Personas-India`
- Split: `en_IN`
- Sample Size: 5,000 records
Persona records are transformed into instruction-response training examples before fine-tuning.
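The card does not spell out this transformation. As a hedged illustration, one way to turn a persona record into an instruction-response pair is sketched below. The field names (`persona`, `occupation`, `state`), the instruction wording, the prompt template and the random sampling with a fixed seed are all assumptions (the card does not say how the 5,000 records were chosen); check the dataset card for the actual schema.

```python
# Sketch: convert Nemotron-Personas-India records into instruction-response examples.
# Field names, instruction wording, template and seed are assumptions for illustration.

PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"  # hypothetical template


def persona_to_example(record: dict) -> dict:
    """Map one persona record (a dict) to prompt/response/text fields for causal-LM training."""
    instruction = (
        "Describe a person from India with the following profile.\n"
        f"Occupation: {record.get('occupation', 'unknown')}\n"
        f"State: {record.get('state', 'unknown')}"
    )
    prompt = PROMPT_TEMPLATE.format(instruction=instruction)
    response = record["persona"].strip()
    # "text" is the single sequence the causal LM is trained on: prompt followed by response
    return {"prompt": prompt, "response": response, "text": prompt + response}


def load_training_sample(n: int = 5000, seed: int = 42):
    """Draw n records from the en_IN split and map them (requires the `datasets` package)."""
    from datasets import load_dataset

    ds = load_dataset("nvidia/Nemotron-Personas-India", split="en_IN")
    ds = ds.shuffle(seed=seed).select(range(n))
    return ds.map(persona_to_example)
```

Missing optional fields fall back to `"unknown"`, so a schema mismatch degrades the instruction rather than raising; a missing `persona` field still raises a `KeyError`.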
#### Training Hyperparameters
- Fine-tuning Method: LoRA
- Quantization: 4-bit NF4
- Epochs: 1
- Compute dtype: FP16
- Double Quantization: Enabled
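The list above omits the LoRA settings. A minimal PEFT setup consistent with it might look like the sketch below; the rank, alpha, dropout, target modules, batch size and learning rate are assumptions, not values reported for this model. The `target_modules` names follow the attention projections of the Hugging Face `transformers` Phi implementation as we understand it; verify them with `model.named_modules()`. The helper at the top counts the added trainable parameters: LoRA gives each adapted linear layer with input size d_in and output size d_out two matrices, A (r × d_in) and B (d_out × r), i.e. r·(d_in + d_out) weights.

```python
# Sketch of a LoRA + 4-bit training setup. r, alpha, dropout, target modules,
# batch size and learning rate are illustrative assumptions, not reported values.

PHI2_HIDDEN = 2560  # Phi-2 hidden size (from its published config)
PHI2_LAYERS = 32    # Phi-2 decoder layers


def lora_param_count(shapes, r: int) -> int:
    """Trainable LoRA params: each (d_in, d_out) layer gets A (r x d_in) and B (d_out x r)."""
    return sum(r * (d_in + d_out) for d_in, d_out in shapes)


def attach_lora(model, r: int = 16, alpha: int = 32, dropout: float = 0.05):
    """Wrap a 4-bit-loaded causal LM with LoRA adapters (requires `peft`)."""
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model = prepare_model_for_kbit_training(model)
    config = LoraConfig(
        r=r,
        lora_alpha=alpha,
        lora_dropout=dropout,
        target_modules=["q_proj", "k_proj", "v_proj", "dense"],  # Phi attention projections
        bias="none",
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, config)


def make_training_args(output_dir: str = "phi2-nemotron-lora"):
    """One epoch with FP16 mixed precision (assumed to match the reported FP16 setting)."""
    from transformers import TrainingArguments

    return TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=1,
        fp16=True,
        per_device_train_batch_size=4,  # assumption
        gradient_accumulation_steps=4,  # assumption
        learning_rate=2e-4,             # assumption
        logging_steps=25,
    )
```

With r = 16 on the four attention projections of all 32 layers, `lora_param_count` gives 32 × 4 × 16 × (2560 + 2560) = 10,485,760 trainable parameters, roughly 0.4% of Phi-2's ~2.7 billion. After `attach_lora`, `model.print_trainable_parameters()` reports the actual figure.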
#### Summary
The project evaluates the trade-offs between model efficiency and generation capability when applying 4-bit quantization and LoRA fine-tuning to Phi-2.
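The card does not list the metrics behind this comparison. A minimal way to measure two efficiency quantities involved, decoding throughput and peak GPU memory, is sketched below; the function names and settings are hypothetical and not taken from the original notebook. Note that `torch.cuda.max_memory_allocated()` reports the current CUDA device only, so on a multi-GPU `device_map` it does not cover the whole model.

```python
# Sketch: measure greedy-decoding throughput and peak GPU memory for one prompt.
# Function names and defaults are hypothetical; torch is imported lazily.
import time


def tokens_per_second(n_new_tokens: int, seconds: float) -> float:
    """Throughput of generated tokens; infinite if the timer recorded no elapsed time."""
    return n_new_tokens / seconds if seconds > 0 else float("inf")


def benchmark_generation(model, tokenizer, prompt: str, max_new_tokens: int = 64) -> dict:
    """Time one generate() call and read peak memory on the current CUDA device."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()  # make sure earlier GPU work does not leak into the timing
    start = time.perf_counter()
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    n_new = out.shape[1] - inputs["input_ids"].shape[1]
    return {
        "new_tokens": n_new,
        "seconds": elapsed,
        "tokens_per_second": tokens_per_second(n_new, elapsed),
        "peak_memory_gb": torch.cuda.max_memory_allocated() / 1e9,
    }
```

Running the same function on the FP16 and 4-bit variants of the model gives a like-for-like view of the efficiency side of the trade-off; generation quality needs a separate evaluation.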
### Model Architecture and Objective
- Architecture: Phi-2 Transformer
- Objective: Causal Language Modeling
- Adaptation Method: LoRA
- Quantization Method: BitsAndBytes NF4 4-bit Quantization
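NF4 (4-bit NormalFloat), introduced with QLoRA, stores each weight as one of 16 levels placed at quantiles of a standard normal distribution, rescaled per block by the block's absolute maximum. The sketch below rebuilds an NF4-like codebook (7 negative levels, an exact zero and 8 positive levels) with Python's `statistics.NormalDist` and applies blockwise absmax quantization to a toy block. The offset 0.9677 is an assumption chosen to approximate the bitsandbytes table; this illustrates the idea and is not the library's kernel.

```python
# Illustrative NF4-style quantization in pure Python (not the bitsandbytes implementation).
from statistics import NormalDist


def nf4_like_codebook(offset: float = 0.9677) -> list[float]:
    """16 levels in [-1, 1]: normal quantiles for 8 positive and 7 negative values, plus 0."""
    nd = NormalDist()

    def upper_quantiles(n: int) -> list[float]:
        # n evenly spaced probabilities in (0.5, offset], mapped through the inverse normal CDF
        step = (offset - 0.5) / n
        return [nd.inv_cdf(0.5 + step * (i + 1)) for i in range(n)]

    levels = sorted([-q for q in upper_quantiles(7)] + [0.0] + upper_quantiles(8))
    scale = max(abs(v) for v in levels)
    return [v / scale for v in levels]


def quantize_block(values: list[float], codebook: list[float]) -> tuple[list[int], float]:
    """Absmax-scale one block into [-1, 1] and store the index of the nearest level (0..15)."""
    absmax = max(abs(v) for v in values) or 1.0
    codes = [
        min(range(len(codebook)), key=lambda i: abs(v / absmax - codebook[i]))
        for v in values
    ]
    return codes, absmax


def dequantize_block(codes: list[int], absmax: float, codebook: list[float]) -> list[float]:
    """Recover approximate weights: codebook level times the block's absmax."""
    return [codebook[c] * absmax for c in codes]


codebook = nf4_like_codebook()
block = [0.0, 0.5, -1.2, 2.0]  # toy weights; real blocks hold more values (e.g. 64)
codes, absmax = quantize_block(block, codebook)
print(codes, absmax, dequantize_block(codes, absmax, codebook))
```

Because the levels are denser near zero, where most weights of a roughly normal layer fall, NF4 spends its 16 codes where they reduce error most; the exact zero level keeps zero-valued weights exact.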
### Compute Infrastructure
- Hardware: 2 × NVIDIA T4 GPUs (16 GB each)
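FP16 rather than BF16 as the compute dtype fits this hardware: the T4 (Turing, compute capability 7.5) lacks native BF16 support, which arrived with compute capability 8.0 (Ampere). A small helper, a sketch rather than part of the original code, can make that choice explicit; checking the compute capability avoids relying on `torch.cuda.is_bf16_supported()`, which in some PyTorch versions can also count emulated BF16.

```python
# Sketch: choose bnb_4bit_compute_dtype from the GPU's compute capability.


def pick_compute_dtype_name(capability: tuple[int, int]) -> str:
    """Prefer bfloat16 on GPUs with native BF16 (compute capability >= 8.0), else float16."""
    major, _minor = capability
    return "bfloat16" if major >= 8 else "float16"


def detect_compute_dtype():
    """Return the torch dtype for GPU 0 (requires torch with CUDA available)."""
    import torch

    name = pick_compute_dtype_name(torch.cuda.get_device_capability(0))
    return getattr(torch, name)
```

On a T4, `detect_compute_dtype()` returns `torch.float16`, matching the configuration used in this card.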
## Citation
### Base Model
```bibtex
@misc{phi2,
  title={Phi-2: The surprising power of small language models},
  author={{Microsoft Research}}
}
```
### Dataset
```bibtex
@misc{nemotron_personas_india,
  title={Nemotron Personas India Dataset},
  author={{NVIDIA}}
}
```
## Model Card Authors
Sachin Singh
## Notebook
Kaggle notebook: [4-bit Quantization with Phi-2 with More Analytics](https://www.kaggle.com/code/shreyasraghav/4-bit-quantization-with-phi-2-with-more-analytics)