	---
	base_model: unsloth/meta-llama-3.1-8b-bnb-4bit
	library_name: peft
	pipeline_tag: text-generation
	tags:
	- base_model:adapter:unsloth/meta-llama-3.1-8b-bnb-4bit
	- lora
	- sft
	- transformers
	- trl
	- unsloth
	license: apache-2.0
	datasets:
	- sixfingerdev/turkish-qa-multi-dialog-dataset
	language:
	- tr
	- en
	- zh
	---
	# SixFinger-8B Adapter for LLaMA 3.1 8B

	This repository contains the SixFinger-8B LoRA adapter.
	Applied on top of the base model `unsloth/llama-3.1-8b-bnb-4bit`, the adapter produces fine-tuned responses without modifying the base weights.

	---

	## Overview

	- Base Model: unsloth/llama-3.1-8b-bnb-4bit
	- Adapter Type: LoRA
	- Quantization: 4-bit (via bitsandbytes)
	- Purpose: Improved response generation on mixed Turkish/English data.
	- Compatibility: Use with Hugging Face Transformers + PEFT library.
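
To make "Adapter Type: LoRA" concrete, here is a minimal pure-Python sketch of how a LoRA layer adds a low-rank correction to a frozen weight matrix. The matrix sizes, the rank `r`, and the scaling factor `alpha` are made-up toy values for illustration, not the configuration of this adapter.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Return W x + (alpha / r) * B (A x) without modifying W."""
    base_out = matvec(W, x)
    low_rank_out = matvec(B, matvec(A, x))  # project down to rank r, then back up
    scale = alpha / r
    return [b + scale * d for b, d in zip(base_out, low_rank_out)]


# Toy sizes: 2 outputs, 3 inputs, rank 1. All numbers are illustrative.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]   # frozen base weight (2 x 3)
A = [[1.0, 1.0, 1.0]]   # LoRA down-projection (r x 3)
B = [[0.5],
     [-1.0]]            # LoRA up-projection (2 x r)
x = [1.0, 2.0, 3.0]

base_only = matvec(W, x)                          # output of the base layer alone
adapted = lora_forward(W, A, B, x, alpha=2, r=1)  # base output plus LoRA correction
print(base_only)
print(adapted)
```

Only `A` and `B` are stored in an adapter like this one; `W` stays exactly as it is in the base checkpoint, which is why the adapter file is small compared with the base model.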

	---

	## Installation

	Install required dependencies:

	```bash
	pip install transformers accelerate bitsandbytes peft
	```

	In a notebook cell, prefix the command with `!`.

	Ensure you have a GPU with sufficient VRAM for 4-bit inference.
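
As a rough guide to what "sufficient VRAM" means, the sketch below estimates the memory taken by the weights alone at different precisions. The parameter count is a round assumption for an "8B" model, and the helper name `weight_memory_gib` is hypothetical. Real usage will be higher, because some layers are typically kept in higher precision and the KV cache, activations, and quantization metadata also need memory.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed for the weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / (1024 ** 3)


N_PARAMS = 8_000_000_000  # assumption: round figure for an "8B" model

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

Under these assumptions the 4-bit weights come to roughly 3.7 GiB, so treat that figure as a lower bound rather than a requirement.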

	---

	## Loading the Model

	1. Load the Base Model

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	base_model = AutoModelForCausalLM.from_pretrained(
	    "unsloth/llama-3.1-8b-bnb-4bit",
	    device_map="auto"
	)
	```

	2. Load the Adapter

	```python
	from peft import PeftModel

	model = PeftModel.from_pretrained(
	    base_model,
	    "sixfingerdev/SixFinger-8B"
	)
	```

	3. Load the Tokenizer

	```python
	tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3.1-8b-bnb-4bit")
	```

	---

	## Example Usage

	Generate text using the adapter:
	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	from peft import PeftModel
	import torch

	# Base model (pre-quantized 4-bit checkpoint)
	base_model = AutoModelForCausalLM.from_pretrained(
	    "unsloth/llama-3.1-8b-bnb-4bit",
	    device_map="auto"
	)

	# LoRA adapter
	model = PeftModel.from_pretrained(base_model, "sixfingerdev/SixFinger-8B")

	# Tokenizer
	tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3.1-8b-bnb-4bit")

	# Example text generation
	# The prompt is Turkish for "Question: What is artificial intelligence? Answer:"
	prompt = "Soru: Yapay zeka nedir?\nCevap:"
	# Move the input tensors to the same device as the model
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	with torch.no_grad():
	    outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)

	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```
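
The decoded output above includes the prompt itself, and the model may keep generating further question/answer turns. The helpers below are a small sketch for building prompts in the same `Soru: ... / Cevap:` style and trimming the result. The function names are hypothetical, and whether the adapter was trained on exactly this template is an assumption based only on the example prompt. The `decoded` string is a made-up stand-in for `tokenizer.decode(...)`, so the snippet runs without a GPU.

```python
def build_prompt(question):
    """Format a question in the 'Soru: ... / Cevap:' style used in the example."""
    return f"Soru: {question.strip()}\nCevap:"


def extract_answer(decoded_text, prompt):
    """Drop the echoed prompt and anything after a follow-up 'Soru:' marker."""
    if decoded_text.startswith(prompt):
        completion = decoded_text[len(prompt):]
    else:
        # Fall back to the full text if the prompt is not echoed back verbatim.
        completion = decoded_text
    # Stop at the next question marker if the model keeps generating turns.
    completion = completion.split("\nSoru:", 1)[0]
    return completion.strip()


prompt = build_prompt("Yapay zeka nedir?")
# Made-up stand-in for tokenizer.decode(outputs[0], skip_special_tokens=True)
decoded = prompt + " Yapay zeka, makinelerin öğrenmesini sağlayan bir alandır.\nSoru: Başka?"
print(extract_answer(decoded, prompt))
```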
	---

	## Notes

	- The adapter does not modify the base model; it only applies LoRA weights on top.
	- 4-bit quantization significantly reduces VRAM usage. Ensure your GPU supports bitsandbytes 4-bit operations.
	- You can merge the adapter into the base model for easier deployment if needed.
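
The merge mentioned in the last note can be sketched with PEFT's `merge_and_unload()`. The function name `merge_adapter` and its parameters are hypothetical, and the choice of base checkpoint is left to the caller. Merging into a 4-bit quantized base involves dequantizing and requantizing, which can introduce small rounding differences, so a non-quantized copy of the base model is a common alternative if you have access to one.

```python
def merge_adapter(base_id, adapter_id, output_dir):
    """Merge a LoRA adapter into a base checkpoint and save a standalone model."""
    # Imported lazily so that defining the function does not load the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(base, adapter_id)
    merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights
    merged.save_pretrained(output_dir)
    # Save the tokenizer alongside the weights so the directory is self-contained.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)
    return output_dir
```

A call such as `merge_adapter("unsloth/llama-3.1-8b-bnb-4bit", "sixfingerdev/SixFinger-8B", "./sixfinger-merged")` would then produce a directory that loads with `AutoModelForCausalLM.from_pretrained` alone, without PEFT; the output path here is only an example.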

	---

	## References

	- [PEFT (Parameter-Efficient Fine-Tuning)](https://huggingface.co/docs/peft/index)
	- [Transformers 4-bit Quantization](https://huggingface.co/docs/transformers/main/en/main_classes/quantization)

	---

	## License

	The adapter is released under the Apache 2.0 license, as declared in the repository metadata.
	Use of the base model remains subject to its own license (Meta’s Llama 3.1 license), so ensure compliance with both.