---
base_model: unsloth/meta-llama-3.1-8b-bnb-4bit
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:unsloth/meta-llama-3.1-8b-bnb-4bit
- lora
- sft
- transformers
- trl
- unsloth
license: apache-2.0
datasets:
- sixfingerdev/turkish-qa-multi-dialog-dataset
language:
- tr
- en
- zh
---

# SixFinger-8B Adapter for LLaMA 3.1 8B

This repository contains a **LoRA adapter** for the SixFinger-8B model.
The adapter provides fine-tuned responses on top of the base model `unsloth/llama-3.1-8b-bnb-4bit` without modifying the base weights.

---

## Overview

- **Base Model:** `unsloth/llama-3.1-8b-bnb-4bit`
- **Adapter Type:** LoRA
- **Quantization:** 4-bit (via bitsandbytes; see the reference config below)
- **Purpose:** Enhanced response generation for Turkish/English mixed datasets.
- **Compatibility:** Use with the Hugging Face Transformers and PEFT libraries.
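
The base checkpoint ships pre-quantized, so you normally do not pass a quantization config yourself. For reference only, an explicit 4-bit setup with roughly equivalent settings would look like the sketch below (NF4 with double quantization is an assumption here, not read from the checkpoint):

```python
import torch
from transformers import BitsAndBytesConfig

# Assumed-typical 4-bit settings, for illustration only; the pre-quantized
# checkpoint already embeds its own quantization config.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
```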

---

## Installation

Install the required dependencies:

```bash
pip install transformers accelerate bitsandbytes peft
```

Ensure you have a GPU with sufficient VRAM for 4-bit inference (roughly 6 GB for an 8B model).
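
A quick way to confirm a suitable device is visible before loading any weights (an illustrative check using plain PyTorch):

```python
import torch

# bitsandbytes 4-bit kernels require a CUDA device; verify one is visible
# and report its total memory.
assert torch.cuda.is_available(), "bitsandbytes 4-bit inference requires a CUDA GPU"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB VRAM")
```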

---

## Loading the Model

1. **Load the Base Model**

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/llama-3.1-8b-bnb-4bit",
    device_map="auto"
)
```

2. **Load the Adapter**

```python
from peft import PeftModel

model = PeftModel.from_pretrained(
    base_model,
    "sixfingerdev/SixFinger-8B"
)
```

3. **Load the Tokenizer**

```python
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3.1-8b-bnb-4bit")
```
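
Optionally, verify that the adapter attached before running inference (a minimal check; `peft_config` is PEFT's registry of loaded adapters):

```python
# Optional sanity check: list the loaded LoRA adapter(s), then switch the
# model to inference mode.
print(model.peft_config)  # maps adapter name -> LoraConfig
model.eval()
```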

---

## Example Usage

Generate text using the adapter:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Base model
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/llama-3.1-8b-bnb-4bit",
    device_map="auto"
)

# LoRA adapter
model = PeftModel.from_pretrained(base_model, "sixfingerdev/SixFinger-8B")

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3.1-8b-bnb-4bit")

# Example text generation
prompt = "Soru: Yapay zeka nedir?\nCevap:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Notes

- The adapter does **not** modify the base model; it only applies LoRA weights on top.
- 4-bit quantization significantly reduces VRAM usage. Ensure your GPU supports **bitsandbytes 4-bit operations**.
- You can merge the adapter into the base model for easier deployment if needed; see the sketch below.
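
A minimal merge sketch, assuming you fold the LoRA weights into a full-precision copy of the base (merging directly into 4-bit weights can cost accuracy); the base repo id and output path here are illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Assumption: an unquantized fp16 copy of the Llama 3.1 8B base weights.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "sixfingerdev/SixFinger-8B")

# Fold the LoRA deltas into the base weights and drop the PEFT wrappers.
merged = model.merge_and_unload()
merged.save_pretrained("sixfinger-8b-merged")  # illustrative output path
```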

---

## References

- [PEFT (Parameter-Efficient Fine-Tuning)](https://huggingface.co/docs/peft/index)
- [Transformers 4-bit Quantization](https://huggingface.co/docs/transformers/main/en/main_classes/quantization)

---

## License

This adapter is released under the **Apache 2.0** license, as declared in the model card metadata.
Ensure compliance with the **base model license** (Meta’s Llama 3.1 license) as well.