---
base_model: unsloth/meta-llama-3.1-8b-bnb-4bit
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:unsloth/meta-llama-3.1-8b-bnb-4bit
- lora
- sft
- transformers
- trl
- unsloth
license: apache-2.0
datasets:
- sixfingerdev/turkish-qa-multi-dialog-dataset
language:
- tr
- en
- zh
---
# SixFinger-8B Adapter for Llama 3.1 8B
This repository contains a **LoRA adapter** for the SixFinger-8B model.
The adapter applies fine-tuned behavior on top of the base model **`unsloth/meta-llama-3.1-8b-bnb-4bit`** without modifying the base weights.
---
## Overview
- **Base Model:** `unsloth/meta-llama-3.1-8b-bnb-4bit`
- **Adapter Type:** LoRA
- **Quantization:** 4-bit (via bitsandbytes)
- **Purpose:** Improved response generation on mixed Turkish/English data.
- **Compatibility:** Use with Hugging Face Transformers + PEFT library.
---
## Installation
Install the required dependencies (prefix the command with `!` when running it in a notebook):
```bash
pip install transformers accelerate bitsandbytes peft
```
Ensure you have a GPU with sufficient VRAM for 4-bit inference.
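
As a rough guide to what "sufficient VRAM" means, the weight footprint can be estimated from the parameter count and the number of bits per weight. The sketch below assumes roughly 8 billion parameters and counts weights only; activations, the KV cache, quantization constants, and the adapter itself are ignored, so actual usage will be higher. The helper name `weight_memory_gib` is illustrative and not part of any library.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for model weights only, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 8e9  # assumption: about 8 billion parameters

# Compare half-precision weights with 4-bit quantized weights.
for label, bits in [("fp16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Under these assumptions the 4-bit weights need about a quarter of the memory of fp16 weights (roughly 3.7 GiB instead of 14.9 GiB).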
---
## Loading the Model
1. **Load the Base Model**

   ```python
   from transformers import AutoTokenizer, AutoModelForCausalLM

   base_model = AutoModelForCausalLM.from_pretrained(
       "unsloth/meta-llama-3.1-8b-bnb-4bit",
       device_map="auto",
   )
   ```

2. **Load the Adapter**

   ```python
   from peft import PeftModel

   model = PeftModel.from_pretrained(
       base_model,
       "sixfingerdev/SixFinger-8B",
   )
   ```

3. **Load the Tokenizer**

   ```python
   tokenizer = AutoTokenizer.from_pretrained("unsloth/meta-llama-3.1-8b-bnb-4bit")
   ```
---
## Example Usage
Generate text using the adapter:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Base model (already quantized to 4-bit with bitsandbytes)
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/meta-llama-3.1-8b-bnb-4bit",
    device_map="auto",
)

# LoRA adapter
model = PeftModel.from_pretrained(base_model, "sixfingerdev/SixFinger-8B")

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained("unsloth/meta-llama-3.1-8b-bnb-4bit")

# Example text generation
# The prompt is Turkish for "Question: What is artificial intelligence?\nAnswer:"
prompt = "Soru: Yapay zeka nedir?\nCevap:"
# Move the input tensors to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
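
The decoded output of `generate` includes the prompt itself, so it can be convenient to wrap prompt construction and answer extraction in small helpers. The sketch below reuses the `Soru:` / `Cevap:` layout from the example above; whether the adapter was trained on exactly this template is an assumption, and both function names are hypothetical.

```python
def build_prompt(question: str) -> str:
    """Format a question in the 'Soru: ... / Cevap:' layout used in the example."""
    return f"Soru: {question.strip()}\nCevap:"


def extract_answer(decoded: str, prompt: str) -> str:
    """Strip the echoed prompt and cut off any follow-up question the model adds."""
    # Decoder-only models return the prompt followed by the continuation.
    text = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    # Assumption: a new "Soru:" marks the start of unrelated generated text.
    return text.split("Soru:")[0].strip()


prompt = build_prompt("Yapay zeka nedir?")
print(repr(prompt))
```

With the model loaded as above, `extract_answer(tokenizer.decode(outputs[0], skip_special_tokens=True), prompt)` would return only the generated answer text.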
---
## Notes
- The adapter does **not** modify the base model; it only applies LoRA weights on top.
- 4-bit quantization significantly reduces VRAM usage. Ensure your GPU supports **bitsandbytes 4-bit operations**.
- You can merge the adapter into the base model for easier deployment if needed.
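
Merging can be sketched with PEFT's `merge_and_unload`, which folds the LoRA weights into the base layers and returns a plain Transformers model. This is a minimal sketch, not a tested deployment recipe: the function name `merge_and_save` and its `out_dir` argument are hypothetical, merging into a 4-bit quantized base may change generations slightly because of rounding, and saving a 4-bit model is assumed to require recent versions of Transformers and bitsandbytes.

```python
def merge_and_save(base_id: str, adapter_id: str, out_dir: str):
    """Load base model and adapter, merge the LoRA weights, and save the result."""
    # Imported lazily so that defining this helper does not load the heavy libraries.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
    model = PeftModel.from_pretrained(base, adapter_id)

    # Fold the LoRA deltas into the base weights and drop the PEFT wrappers.
    merged = model.merge_and_unload()

    # Save the merged weights and the tokenizer side by side.
    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
    return merged
```

Call it with the base model ID and adapter ID used in the loading steps, plus an output directory of your choice. The saved directory can then be loaded with `AutoModelForCausalLM.from_pretrained` alone, without PEFT.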
---
## References
- [PEFT (Parameter-Efficient Fine-Tuning)](https://huggingface.co/docs/peft/index)
- [Transformers 4-bit Quantization](https://huggingface.co/docs/transformers/main/en/main_classes/quantization)
---
## License
The adapter is released under the Apache 2.0 license, as declared in this repository's metadata.
Use of the base model remains subject to the **base model license** (Meta's Llama 3.1 Community License).