
	---
	license: apache-2.0
	---

	# Model Card for 4-bit-quantization-with-phi-2

	## Model Details

	This project fine-tunes Microsoft's Phi-2 language model on the Nemotron-Personas-India dataset using LoRA, a parameter-efficient fine-tuning method. The base model is loaded with 4-bit NF4 quantization through BitsAndBytes, which reduces memory consumption so that training and inference remain feasible on limited hardware.
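
As a rough sense of scale, the weight memory can be estimated by hand. The sketch below assumes Phi-2's roughly 2.7 billion parameters and a cost of about 4.127 bits per parameter for NF4 with double quantization (4 bits plus roughly 0.127 bits of quantization constants, the figure reported for QLoRA with a block size of 64). It treats every parameter as quantized, which overstates the saving, since some layers typically stay in higher precision. Activations, LoRA weights, and optimizer state are ignored.

```python
# Back-of-the-envelope weight memory for Phi-2. All constants are assumptions
# stated in the text above, not measurements from this project.
PHI2_PARAMS = 2.7e9                   # approximate parameter count of microsoft/phi-2
BITS_FP16 = 16.0
BITS_NF4_DOUBLE_QUANT = 4.0 + 0.127   # 4-bit weights + quantization constants


def weight_gigabytes(num_params: float, bits_per_param: float) -> float:
    """Return the storage needed for the weights, in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


fp16_gb = weight_gigabytes(PHI2_PARAMS, BITS_FP16)
nf4_gb = weight_gigabytes(PHI2_PARAMS, BITS_NF4_DOUBLE_QUANT)

print(f"FP16 weights: {fp16_gb:.2f} GB")
print(f"NF4 weights:  {nf4_gb:.2f} GB")
print(f"Reduction:    {fp16_gb / nf4_gb:.1f}x")
```

Under these assumptions the weights shrink from about 5.4 GB to about 1.4 GB, which is why the model fits comfortably on a single 16 GB GPU.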

	### Model Description


	- Developed by: Sachin Singh
	- Model type: Causal Language Model
	- Base model: Phi-2
	- Language(s): English
	- Quantization: 4-bit NF4 (BitsAndBytes)
	- Fine-tuning method: LoRA (PEFT)
	- Dataset: NVIDIA Nemotron-Personas-India (`en_IN` split)

	### Model Sources

	- Base Model: microsoft/phi-2
	- Dataset: nvidia/Nemotron-Personas-India


	## Uses

	### Direct Use

	This model is intended for:

	- Persona-conditioned text generation
	- Instruction-following experiments
	- Low-memory LLM deployment research
	- Quantization benchmarking
	- LoRA fine-tuning demonstrations
	- LLM performance analytics studies

	### Downstream Use

	The fine-tuned model can serve as a foundation for:

	- Persona-based conversational agents
	- Lightweight chatbot deployments
	- LLM optimization research
	- Quantization and efficiency studies

	### Out-of-Scope Use

	This model is not intended for:

	- Medical advice
	- Legal advice
	- Financial decision making
	- Safety-critical systems
	- High-risk automated decision systems

	## Bias, Risks, and Limitations

	The model inherits limitations from:

	- The Phi-2 base model
	- The Nemotron-Personas-India dataset
	- Quantization-induced approximation errors
	- Limited fine-tuning duration

	Generated responses may contain inaccuracies, hallucinations, biases, or incomplete information.


	## How to Get Started with the Model

	```python
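	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

	# Base model; this snippet does not load any fine-tuned LoRA adapter weights.
	model_id = "microsoft/phi-2"

	# 4-bit NF4 quantization with FP16 compute and double quantization.
	bnb_config = BitsAndBytesConfig(
	    load_in_4bit=True,
	    bnb_4bit_quant_type="nf4",
	    bnb_4bit_compute_dtype=torch.float16,
	    bnb_4bit_use_double_quant=True,
	)

	tokenizer = AutoTokenizer.from_pretrained(model_id)

	# device_map="auto" lets Accelerate place the quantized weights on the available GPU(s).
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    quantization_config=bnb_config,
	    device_map="auto",
	)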
	```
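
Once a model and tokenizer are loaded, generation could look like the sketch below. The `Instruct: ... Output:` prompt layout is the question-answering format documented for the base Phi-2 model; whether the fine-tuning notebook uses the same layout is an assumption. `build_prompt`, `generate_reply`, and the `max_new_tokens` default are illustrative names and values, not part of this project.

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the Phi-2 'Instruct/Output' layout (assumed here)."""
    return f"Instruct: {instruction.strip()}\nOutput:"


def generate_reply(model, tokenizer, instruction: str, max_new_tokens: int = 128) -> str:
    """Greedy-decode a reply from an already loaded model and tokenizer."""
    import torch  # imported lazily so build_prompt works without a GPU stack

    prompt = build_prompt(instruction)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

A call such as `generate_reply(model, tokenizer, "Introduce yourself in two sentences.")` would then return the decoded continuation as a string.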

	## Training Details

	### Training Data

	The model is fine-tuned using:

	- Dataset: `nvidia/Nemotron-Personas-India`
	- Split: `en_IN`
	- Sample Size: 5,000 records

	Persona records are transformed into instruction-response training examples before fine-tuning.
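
The card does not spell out this transformation, so the following is only a minimal sketch of one way it might work. The record keys (`occupation`, `state`, `persona`), the instruction wording, the `Instruct/Output` text template, and the sample record are all hypothetical and should be replaced with the dataset's real columns and the template used in the notebook.

```python
def persona_to_example(record: dict) -> dict:
    """Turn one persona record into an instruction-response pair.

    The keys 'occupation', 'state', and 'persona' are hypothetical; replace
    them with the real column names of the dataset.
    """
    instruction = (
        f"Describe a person who works as {record['occupation']} "
        f"and lives in {record['state']}."
    )
    return {"instruction": instruction, "response": record["persona"].strip()}


def to_training_text(example: dict) -> str:
    """Join the pair into a single string for causal language modeling."""
    return f"Instruct: {example['instruction']}\nOutput: {example['response']}"


# Made-up record for illustration only; it is not taken from the dataset.
sample = {
    "occupation": "a school teacher",
    "state": "Kerala",
    "persona": "A patient educator who enjoys local history. ",
}
example = persona_to_example(sample)
print(to_training_text(example))
```

Keeping the pair as a dictionary first and formatting it into text second makes it easy to swap the prompt template without touching the field mapping.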


	#### Training Hyperparameters

	- Fine-tuning Method: LoRA
	- Quantization: 4-bit NF4
	- Epochs: 1
	- Compute Type: FP16
	- Double Quantization: Enabled
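
The card does not report the LoRA rank, scaling, dropout, or target modules. The sketch below shows how adapters are commonly attached to a 4-bit model with the PEFT library; every value in `LORA_KWARGS` is an assumption chosen for illustration, and `attach_lora` is a hypothetical helper name.

```python
# Illustrative LoRA settings; none of these values are reported in this card.
LORA_KWARGS = {
    "r": 16,                 # adapter rank (assumed)
    "lora_alpha": 32,        # scaling factor (assumed)
    "lora_dropout": 0.05,    # assumed
    "bias": "none",
    "task_type": "CAUSAL_LM",
    # Attention projection names in the transformers-native Phi implementation
    # (assumed); check them with model.named_modules().
    "target_modules": ["q_proj", "k_proj", "v_proj", "dense"],
}


def attach_lora(model):
    """Prepare a 4-bit model for training and wrap it with LoRA adapters."""
    # Imported lazily so the settings above can be inspected without peft.
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model = prepare_model_for_kbit_training(model)
    peft_model = get_peft_model(model, LoraConfig(**LORA_KWARGS))
    peft_model.print_trainable_parameters()
    return peft_model
```

Calling `attach_lora(model)` on a model loaded with a 4-bit `BitsAndBytesConfig` returns a PEFT-wrapped model in which only the adapter weights are trainable.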


	#### Summary

	The project evaluates the trade-offs between model efficiency and generation capability when applying 4-bit quantization and LoRA fine-tuning to Phi-2.


	### Model Architecture and Objective

	- Architecture: Phi-2 Transformer
	- Objective: Causal Language Modeling
	- Adaptation Method: LoRA
	- Quantization Method: BitsAndBytes NF4 4-bit Quantization
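
To illustrate why LoRA keeps the trainable footprint small, the count of adapter parameters can be worked out by hand. For a linear layer with input size `d_in` and output size `d_out`, LoRA adds `rank * (d_in + d_out)` parameters. The sketch uses Phi-2's hidden size of 2560 and 32 transformer blocks; the rank of 16 and the choice of four adapted projections per block are assumptions, since the card does not report them.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two matrices, A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


HIDDEN_SIZE = 2560        # Phi-2 hidden dimension
NUM_LAYERS = 32           # Phi-2 transformer blocks
ADAPTED_PER_LAYER = 4     # assumed: query, key, value, and output projections
RANK = 16                 # assumed; not reported in this card

per_layer = ADAPTED_PER_LAYER * lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
total = NUM_LAYERS * per_layer
print(f"Trainable LoRA parameters: {total:,}")
print(f"Share of a 2.7B-parameter base model: {total / 2.7e9:.2%}")
```

With these assumed settings the adapters hold about 10.5 million parameters, well under one percent of the base model.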

	### Compute Infrastructure

	- GPU: 2x NVIDIA T4


	## Citation

	### Base Model

	```bibtex
	@misc{phi2,
	  title  = {Phi-2: The surprising power of small language models},
	  author = {{Microsoft Research}}
	}
	```

	### Dataset

	```bibtex
	@misc{nemotron_personas_india,
	  title  = {Nemotron Personas India Dataset},
	  author = {{NVIDIA}}
	}
	```

	## Model Card Authors

	Sachin Singh

	## Model in Notebook

	[Kaggle notebook: 4-bit quantization with Phi-2 with more analytics](https://www.kaggle.com/code/shreyasraghav/4-bit-quantization-with-phi-2-with-more-analytics)