---
license: apache-2.0
tags:
- qlora
- finetuned
- transformers
datasets:
- vishalgimhan/uber-report-2024-dataset
---
# Uber-assistant QLoRA Adapter
This is a LoRA adapter fine-tuned with QLoRA on the Uber Annual Report 2024 dataset.
## Base Model
meta-llama/Llama-3.1-8B-Instruct
## Dataset
Fine-tuned on the [Uber Annual Report 2024 Dataset](https://huggingface.co/datasets/vishalgimhan/uber-report-2024-dataset).
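The dataset can be loaded directly with the `datasets` library (a minimal sketch; the split name is left to the library default):

```python
from datasets import load_dataset

# Download the fine-tuning dataset from the Hugging Face Hub.
dataset = load_dataset("vishalgimhan/uber-report-2024-dataset")
print(dataset)
```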
## Quantization & Training Hyperparameters
- **Quantization**: 4-bit (NF4)
- **Compute dtype**: torch.bfloat16
- **Double quantization**: True
- **LoRA rank**: 16
- **LoRA alpha**: 32
- **Learning rate**: 2e-5
- **Max steps**: 100
- **Batch size (effective)**: 16
- **Max length**: 512 tokens
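A minimal QLoRA training setup matching these settings might look like the sketch below. The target modules, LoRA dropout, and the 4 x 4 split of the effective batch size are assumptions rather than details from the original run; the 512-token max length would be applied by the trainer or tokenizer.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit NF4 with bf16 compute and double quantization,
# as listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA rank 16 and alpha 32 as listed; target modules and dropout are assumed.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    lora_dropout=0.05,                                        # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(prepare_model_for_kbit_training(base_model), lora_config)

# Learning rate 2e-5, 100 steps, effective batch size 16 (4 x 4 split assumed).
training_args = TrainingArguments(
    output_dir="uber-assistant-qlora",
    learning_rate=2e-5,
    max_steps=100,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    bf16=True,
)
```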
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

model_id = "vishalgimhan/uber-assistant"

# 4-bit NF4 quantization with bf16 compute and double quantization,
# matching the settings the adapter was trained with.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Load the quantized base model, then attach the LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```
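With the model and tokenizer loaded as above, a minimal generation sketch follows (the prompt and decoding settings are illustrative, and it assumes the tokenizer carries the Llama 3.1 chat template):

```python
messages = [
    {"role": "user", "content": "Summarize Uber's 2024 revenue highlights."}
]

# Build a prompt with the chat template and move it to the model's device.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a short answer and decode only the newly generated tokens.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```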
## License & Attribution
This adapter inherits the license of the base model and dataset. Check those licenses before use or redistribution.