---
license: apache-2.0
---

# Model Card for Phi-2 (4-bit NF4 + LoRA) Fine-Tuned on Nemotron-Personas-India

## Model Details

This project fine-tunes Microsoft's Phi-2 language model with LoRA, a parameter-efficient fine-tuning (PEFT) method, on the Nemotron-Personas-India dataset. The base model is loaded with 4-bit NF4 quantization through BitsAndBytes, which reduces memory consumption enough to train and run inference on limited hardware.
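To give a feel for the memory saving, the sketch below estimates weight storage for FP16 versus 4-bit NF4 with and without double quantization. It is a back-of-the-envelope calculation, not a measurement from this project: the 2.7 billion parameter count is the approximate figure reported for Phi-2, and the block sizes (64 weights per quantization block, 256 blocks per second-level group) are assumed from the QLoRA description of NF4. In practice only the linear layers are quantized, and activations, LoRA weights, and optimizer state add to the total.

```python
# Rough weight-memory estimate. All constants below are assumptions,
# not values measured in this project.
PHI2_PARAMS = 2.7e9  # approximate parameter count reported for Phi-2


def bits_per_param(double_quant: bool, block_size: int = 64,
                   dq_block_size: int = 256) -> float:
    """Effective bits per weight: 4-bit codes plus quantization constants."""
    if double_quant:
        # 8-bit constant per block, plus one FP32 constant per group of blocks
        overhead = 8 / block_size + 32 / (block_size * dq_block_size)
    else:
        # one FP32 scaling constant per block
        overhead = 32 / block_size
    return 4 + overhead


def weight_gib(n_params: float, bits: float) -> float:
    """Convert a parameter count and bits-per-parameter into GiB."""
    return n_params * bits / 8 / 2**30


if __name__ == "__main__":
    settings = [
        ("FP16", 16.0),
        ("NF4", bits_per_param(double_quant=False)),
        ("NF4 + double quant", bits_per_param(double_quant=True)),
    ]
    for label, bits in settings:
        print(f"{label:>20}: {weight_gib(PHI2_PARAMS, bits):.2f} GiB")
```

Under these assumptions the weights shrink from roughly 5 GiB in FP16 to roughly 1.3 GiB in NF4 with double quantization.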

### Model Description


- **Developed by:** Sachin Singh
- **Model type:** Causal Language Model
- **Base model:** Phi-2
- **Language(s):** English
- **Quantization:** 4-bit NF4 (BitsAndBytes)
- **Fine-tuning method:** LoRA (PEFT)
- **Dataset:** NVIDIA Nemotron-Personas-India (`en_IN` split)

### Model Sources

- **Base Model:** microsoft/phi-2
- **Dataset:** nvidia/Nemotron-Personas-India


### Direct Use

This model is intended for:

- Persona-conditioned text generation
- Instruction-following experiments
- Low-memory LLM deployment research
- Quantization benchmarking
- LoRA fine-tuning demonstrations
- LLM performance analytics studies

### Downstream Use

The fine-tuned model can serve as a foundation for:

- Persona-based conversational agents
- Lightweight chatbot deployments
- LLM optimization research
- Quantization and efficiency studies

### Out-of-Scope Use

This model is not intended for:

- Medical advice
- Legal advice
- Financial decision making
- Safety-critical systems
- High-risk automated decision systems

## Bias, Risks, and Limitations

The model inherits limitations from:

- The Phi-2 base model
- The Nemotron-Personas-India dataset
- Quantization-induced approximation errors
- Limited fine-tuning duration

Generated responses may contain inaccuracies, hallucinations, biases, or incomplete information.


## How to Get Started with the Model

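```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Base checkpoint; this snippet loads the quantized base model only.
model_id = "microsoft/phi-2"

# 4-bit NF4 weights, FP16 compute, and double quantization of the
# quantization constants, matching the settings listed in this card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" places the layers on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```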
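The snippet above loads only the quantized base model. A minimal sketch of attaching a LoRA adapter with PEFT and generating text might look like the following. The adapter location `ADAPTER_PATH` is a hypothetical placeholder, since this card does not state where the fine-tuned weights are published, and the `Instruct: ...\nOutput:` prompt is the QA-style format from the Phi-2 model card, assumed here rather than confirmed as the format used in training. Heavy imports are deferred into the functions so the helpers can be imported on a machine without a GPU stack.

```python
# Hypothetical adapter location; replace with the real path or repo id.
ADAPTER_PATH = "path/to/lora-adapter"


def build_prompt(instruction: str) -> str:
    """Wrap an instruction in Phi-2's QA-style prompt format (assumed)."""
    return f"Instruct: {instruction}\nOutput:"


def load_finetuned(adapter_path: str = ADAPTER_PATH):
    """Load the 4-bit base model and attach a LoRA adapter on top of it."""
    import torch
    from peft import PeftModel
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              BitsAndBytesConfig)

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
    )
    tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
    base = AutoModelForCausalLM.from_pretrained(
        "microsoft/phi-2",
        quantization_config=bnb_config,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(base, adapter_path)
    model.eval()
    return tokenizer, model


def generate(tokenizer, model, instruction: str,
             max_new_tokens: int = 128) -> str:
    """Generate a completion and return only the newly produced text."""
    import torch

    inputs = tokenizer(build_prompt(instruction),
                       return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With a real adapter available, calling `tokenizer, model = load_finetuned()` and then `generate(tokenizer, model, "Introduce yourself as a school teacher from Kerala.")` would produce a persona-conditioned response.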

## Training Details

### Training Data

The model is fine-tuned using:

- Dataset: `nvidia/Nemotron-Personas-India`
- Split: `en_IN`
- Sample Size: 5,000 records

Persona records are transformed into instruction-response training examples before fine-tuning.
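As an illustration of that transformation, the sketch below turns one persona record into a single training string. The field names (`age`, `occupation`, `state`, `persona`) and the instruction wording are hypothetical stand-ins, since the card does not list the columns or template actually used. The `Instruct: ...\nOutput: ...` layout follows the QA-style format from the Phi-2 model card and is likewise an assumption.

```python
def to_training_text(record: dict) -> str:
    """Format one persona record as an instruction-response training string.

    The record keys used here are hypothetical; adapt them to the real
    dataset columns.
    """
    instruction = (
        f"Write a short persona for a {record['age']}-year-old "
        f"{record['occupation']} from {record['state']}."
    )
    response = record["persona"].strip()
    # QA-style prompt format from the Phi-2 model card (assumed for training).
    return f"Instruct: {instruction}\nOutput: {response}"


if __name__ == "__main__":
    example = {
        "age": 34,
        "occupation": "teacher",
        "state": "Kerala",
        "persona": "Anita teaches maths at a government school.",
    }
    print(to_training_text(example))
```

With the `datasets` library, a function like this would typically be applied to the 5,000 sampled records through `Dataset.map`.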


#### Training Hyperparameters

- Fine-tuning Method: LoRA
- Quantization: 4-bit NF4
- Epochs: 1
- Compute Type: FP16
- Double Quantization: Enabled
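The LoRA rank and target modules are not listed above, so the following sketch only illustrates how small the trainable footprint tends to be. It counts the parameters LoRA adds when each adapted weight matrix of shape `d_out × d_in` receives two low-rank factors of rank `r`. The rank of 16 and the choice of four square attention projections per layer are hypothetical; the hidden size of 2560 and the 32 layers are the published Phi-2 dimensions.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


HIDDEN_SIZE = 2560        # Phi-2 hidden size
NUM_LAYERS = 32           # Phi-2 transformer layers
RANK = 16                 # hypothetical LoRA rank
TARGETS_PER_LAYER = 4     # hypothetical: four square attention projections

per_matrix = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
total_trainable = NUM_LAYERS * TARGETS_PER_LAYER * per_matrix

if __name__ == "__main__":
    print(f"LoRA parameters per matrix: {per_matrix:,}")
    print(f"Total trainable parameters: {total_trainable:,}")
    print(f"Share of a 2.7B-parameter base: {total_trainable / 2.7e9:.2%}")
```

Under these assumed settings the adapter stays well under 1% of the base model's parameters, which is what makes a one-epoch run on modest GPUs practical.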


#### Summary

The project evaluates the trade-offs between model efficiency and generation capability when applying 4-bit quantization and LoRA fine-tuning to Phi-2.
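The card does not say which metrics were used for this comparison. As one plausible way to quantify the trade-off, the sketch below pairs a quality measure (perplexity, computed from the mean per-token negative log-likelihood) with an efficiency measure (generated tokens per second). The helper names are illustrative and not taken from the notebook.

```python
import math
import time


def perplexity(mean_nll: float) -> float:
    """Perplexity from the mean per-token negative log-likelihood (in nats)."""
    return math.exp(mean_nll)


def tokens_per_second(n_new_tokens: int, elapsed_s: float) -> float:
    """Generation throughput for one call."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return n_new_tokens / elapsed_s


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Toy numbers only, to show how the helpers fit together.
    print(f"perplexity at loss 2.0: {perplexity(2.0):.2f}")
    print(f"throughput: {tokens_per_second(128, 4.0):.1f} tokens/s")
```

Comparing these two numbers for the FP16 and 4-bit configurations on the same prompts gives a simple efficiency-versus-quality picture.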


### Model Architecture and Objective

- Architecture: Phi-2 Transformer
- Objective: Causal Language Modeling
- Adaptation Method: LoRA
- Quantization Method: BitsAndBytes NF4 4-bit Quantization

### Compute Infrastructure

2 × NVIDIA T4 GPUs


## Citation

```bibtex
@misc{phi2,
  title={Phi-2: The surprising power of small language models},
  author={Microsoft Research}
}
```

### Dataset

```bibtex
@misc{nemotron_personas_india,
  title={Nemotron Personas India Dataset},
  author={NVIDIA}
}
```

## Model Card Authors 

Sachin Singh

## Model in Notebook

[Kaggle notebook](https://www.kaggle.com/code/shreyasraghav/4-bit-quantization-with-phi-2-with-more-analytics)