---
license: apache-2.0
---

# Model Card for Phi-2 Fine-Tuned with LoRA on Nemotron-Personas-India

## Model Details

This project fine-tunes Microsoft's Phi-2 language model with LoRA, a parameter-efficient fine-tuning (PEFT) method, on the Nemotron-Personas-India dataset. The base model is loaded with 4-bit NF4 quantization through BitsAndBytes, which reduces memory consumption so that training and inference remain feasible on limited hardware.
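To make the memory saving concrete, the following is a rough back-of-the-envelope sketch rather than a measurement from this project. It assumes roughly 2.7 billion parameters for Phi-2, a quantization block size of 64 weights, and a second-level block size of 256 for double quantization. These values are assumptions for illustration and are not taken from the training notebook. Layers that stay in higher precision, activations, and LoRA optimizer state are ignored, so real usage will be higher.

```python
# Rough weight-memory estimate for FP16 vs. 4-bit NF4 storage.
# All constants below are assumptions for illustration, not measured values.
PHI2_PARAMS = 2.7e9        # approximate parameter count assumed for Phi-2
BLOCK_SIZE = 64            # assumed number of weights sharing one scaling constant
NESTED_BLOCK_SIZE = 256    # assumed number of constants per second-level block


def constant_overhead_bits(double_quant: bool) -> float:
    """Extra bits per weight spent on quantization constants."""
    if not double_quant:
        # One 32-bit constant per block of weights.
        return 32 / BLOCK_SIZE
    # Constants stored in 8 bits, plus one 32-bit constant per nested block.
    return 8 / BLOCK_SIZE + 32 / (BLOCK_SIZE * NESTED_BLOCK_SIZE)


def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


fp16_gib = weight_memory_gib(PHI2_PARAMS, 16)
nf4_gib = weight_memory_gib(PHI2_PARAMS, 4 + constant_overhead_bits(double_quant=True))

print(f"FP16 weights: {fp16_gib:.2f} GiB")
print(f"NF4 weights (double quantization): {nf4_gib:.2f} GiB")
```

Under these assumptions the quantized weights take roughly a quarter of the FP16 footprint, which is what makes a single 16 GB GPU workable for this setup.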

### Model Description


- **Developed by:** Sachin Singh
- **Model type:** Causal Language Model
- **Base model:** Phi-2
- **Language(s):** English
- **Quantization:** 4-bit NF4 (BitsAndBytes)
- **Fine-tuning method:** LoRA (PEFT)
- **Dataset:** NVIDIA Nemotron-Personas-India (`en_IN` split)

### Model Sources

- **Base Model:** microsoft/phi-2
- **Dataset:** nvidia/Nemotron-Personas-India


### Direct Use

This model is intended for:

- Persona-conditioned text generation
- Instruction-following experiments
- Low-memory LLM deployment research
- Quantization benchmarking
- LoRA fine-tuning demonstrations
- LLM performance analytics studies

### Downstream Use

The fine-tuned model can serve as a foundation for:

- Persona-based conversational agents
- Lightweight chatbot deployments
- LLM optimization research
- Quantization and efficiency studies

### Out-of-Scope Use

This model is not intended for:

- Medical advice
- Legal advice
- Financial decision making
- Safety-critical systems
- High-risk automated decision systems

## Bias, Risks, and Limitations

The model inherits limitations from:

- The Phi-2 base model
- The Nemotron-Personas-India dataset
- Quantization-induced approximation errors
- Limited fine-tuning duration (1 epoch on 5,000 records)

Generated responses may contain inaccuracies, hallucinations, biases, or incomplete information.


## How to Get Started with the Model

```python
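import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Base checkpoint; this snippet loads Phi-2 itself, without any LoRA adapter weights.
model_id = "microsoft/phi-2"

# 4-bit NF4 quantization with FP16 compute and double quantization,
# matching the settings listed under Training Hyperparameters.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets Accelerate place the quantized weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)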
```
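With `model` and `tokenizer` loaded as above, text can be generated along the following lines. This is a minimal sketch: the helper name `generate_reply` and its default of 128 new tokens are illustrative choices, not part of the original notebook.

```python
def generate_reply(model, tokenizer, prompt, max_new_tokens=128):
    """Generate a continuation for `prompt` and return only the new text.

    Hypothetical helper for illustration; `model` and `tokenizer` are the
    objects created in the loading snippet.
    """
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[1]

    # Decoding uses the model's default generation settings.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so that only the generated continuation is decoded.
    new_tokens = output_ids[0][prompt_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call such as `generate_reply(model, tokenizer, "Describe a shopkeeper from Jaipur.")` would return the generated text as a string. The prompt here is only an example.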

## Training Details

### Training Data

The model is fine-tuned using:

- Dataset: `nvidia/Nemotron-Personas-India`
- Split: `en_IN`
- Sample Size: 5,000 records

Persona records are transformed into instruction-response training examples before fine-tuning.
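The exact transformation is not described here, so the following is only a sketch of what such a conversion might look like. The field names `occupation`, `state`, and `persona` are hypothetical placeholders and may not match the dataset's actual columns. The `Instruct:` / `Output:` layout follows the prompt style commonly used with Phi-2 and is assumed rather than confirmed for this project.

```python
def persona_to_example(record):
    """Turn one persona record into an instruction-response training example.

    The keys "occupation", "state", and "persona" are hypothetical and
    stand in for whatever columns the dataset actually provides.
    """
    instruction = (
        f"Describe a person who works as a {record['occupation']} "
        f"and lives in {record['state']}."
    )
    response = record["persona"].strip()
    return {
        "instruction": instruction,
        "response": response,
        # Single training string in an assumed Instruct/Output prompt format.
        "text": f"Instruct: {instruction}\nOutput: {response}",
    }


# Made-up record for illustration only; not taken from the dataset.
sample = {
    "occupation": "teacher",
    "state": "Kerala",
    "persona": " A patient teacher who enjoys classical music. ",
}
example = persona_to_example(sample)
print(example["text"])
```

Keeping the instruction and response as separate fields alongside the combined `text` makes it easy to change the prompt template later without reprocessing the raw records.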


#### Training Hyperparameters

- Fine-tuning Method: LoRA
- Quantization: 4-bit NF4
- Epochs: 1
- Compute Type: FP16
- Double Quantization: Enabled
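The LoRA rank and target modules are not listed above. As a worked illustration of why LoRA is parameter-efficient, the sketch below counts trainable parameters under assumed settings: rank 8, a hidden size of 2560, 32 transformer layers, and two square projection matrices adapted per layer. All four values are assumptions chosen for the example, not this project's configuration.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters added by one LoRA adapter.

    LoRA learns two low-rank matrices, A (rank x d_in) and B (d_out x rank),
    so the count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


# Assumed settings for illustration only.
HIDDEN_SIZE = 2560       # assumed model width
NUM_LAYERS = 32          # assumed number of transformer layers
ADAPTED_PER_LAYER = 2    # assumed number of adapted projections per layer
RANK = 8                 # assumed LoRA rank
BASE_PARAMS = 2.7e9      # approximate base-model parameter count

per_matrix = lora_trainable_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
total = per_matrix * ADAPTED_PER_LAYER * NUM_LAYERS

print(f"Trainable LoRA parameters: {total:,}")
print(f"Share of base model: {100 * total / BASE_PARAMS:.3f}%")
```

Under these assumptions the adapters amount to well under 1% of the base model's parameters, which is why the frozen weights can remain in 4-bit form while only the adapters are trained in higher precision.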


#### Summary

The project evaluates the trade-offs between model efficiency and generation capability when applying 4-bit quantization and LoRA fine-tuning to Phi-2.


### Model Architecture and Objective

- Architecture: Phi-2 Transformer
- Objective: Causal Language Modeling
- Adaptation Method: LoRA
- Quantization Method: BitsAndBytes NF4 4-bit Quantization

### Compute Infrastructure

2 × NVIDIA T4 GPUs


## Citation

```bibtex
@misc{phi2,
  title={Phi-2: The surprising power of small language models},
  author={Microsoft Research}
}
```

### Dataset

```bibtex
@misc{nemotron_personas_india,
  title={Nemotron Personas India Dataset},
  author={NVIDIA}
}
```

## Model Card Authors 

Sachin Singh

## Model in Notebook

[Kaggle notebook: 4-bit quantization with Phi-2, with more analytics](https://www.kaggle.com/code/shreyasraghav/4-bit-quantization-with-phi-2-with-more-analytics)