---
base_model: vngrs-ai/Kumru-2B-Base
library_name: peft
license: apache-2.0
tags:
- lora
- causal-lm
- mistral
- turkish
- kumru
datasets:
- vngrs-ai/vngrs-web-corpus
language:
- tr
pipeline_tag: text-generation
---
# Kumru-2B LoRA Adapter
This repository provides a **LoRA** adapter extracted from the **VNGRS Kumru-2B** model
(`vngrs-ai/Kumru-2B`, the SFT/chat variant), meant to be applied on top of the base model
`vngrs-ai/Kumru-2B-Base`. The goal is to bring Kumru's chat/instruction behavior
to `Kumru-2B-Base` deployments with a lightweight file footprint.
## Model Summary
- **Base model:** `vngrs-ai/Kumru-2B-Base`
- **Source (target behavior) model:** `vngrs-ai/Kumru-2B` (SFT/chat)
- **Technique:** Low-Rank Adaptation (LoRA)
- **LoRA rank / alpha:** 768 / 1024 _(update these if you produce a different build)_
- **Layer coverage:** All self-attention and MLP projections
- **Output artifacts:** PEFT-compatible `adapter_config.json` + `adapter_model.safetensors`
- **License:** Apache 2.0 (aligned with VNGRS Kumru model licensing)
## Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = "vngrs-ai/Kumru-2B-Base"
adapter = "ceofast/kumru-2b-lora"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter, device_map="auto")
messages = [
{"role": "system", "content": "Adın Kumru, Türkçe konuşan yardımcı bir modelsin."},
{"role": "user", "content": "İstanbul'un fethi hakkında kısa bilgi verir misin?"}
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
> Note: This adapter must be used together with `vngrs-ai/Kumru-2B-Base`.
## Extraction Process
The adapter is obtained by computing the delta between the base and the SFT checkpoints and factorizing it with **SVD**
into low-rank components. In this release, the measured reconstruction error is approximately **0.409**. To better preserve
quality, you may increase rank/alpha and export a new version (e.g., rank **1024** / alpha **2048**). A lower-error build will
be added as soon as possible.
- Script: `export_kumru.py`
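The delta-plus-SVD step described above can be sketched on a single weight matrix. The following is a minimal NumPy illustration with synthetic weights, not the actual `export_kumru.py` script; matrix sizes and the rank are placeholder values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for one projection matrix of the base and SFT checkpoints.
W_base = rng.standard_normal((256, 256)).astype(np.float32)
W_sft = W_base + 0.05 * rng.standard_normal((256, 256)).astype(np.float32)

rank = 64

# Delta between the SFT and base weights.
delta = W_sft - W_base

# Truncated SVD of the delta yields the low-rank LoRA factors.
U, S, Vt = np.linalg.svd(delta, full_matrices=False)
# Fold the singular values into one factor so that delta ≈ B @ A
# (matching the PEFT convention W' = W + B @ A, up to scaling).
B = U[:, :rank] * S[:rank]   # (out_features, rank)
A = Vt[:rank, :]             # (rank, in_features)

# Relative reconstruction error, analogous to the ~0.409 figure reported above.
err = np.linalg.norm(delta - B @ A) / np.linalg.norm(delta)
print(f"rank={rank} relative error={err:.3f}")
```

Raising the rank keeps more singular values and therefore shrinks the reconstruction error, which is the trade-off behind the suggested rank 1024 / alpha 2048 build.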
## Known Limitations
- Kumru-2B is still a ~2B-parameter model; it may struggle with very long context, rare technical terms, and complex math.
- With low ranks, SVD-based LoRA can be less stable than the original SFT checkpoint.
- Training data is based on VNGRS’s public Turkish corpus cleaning pipeline; truthfulness/hallucination issues may still occur.
### Framework versions
- PEFT 0.11.1