---
base_model: vngrs-ai/Kumru-2B-Base
library_name: peft
license: apache-2.0
tags:
- lora
- causal-lm
- mistral
- turkish
- kumru
datasets:
- vngrs-ai/vngrs-web-corpus
language:
- tr
pipeline_tag: text-generation
---

# Kumru-2B LoRA Adapter

This repository provides a **LoRA** adapter extracted from the **VNGRS Kumru-2B** model (`vngrs-ai/Kumru-2B`, the SFT/chat variant), to be applied on top of the base model `vngrs-ai/Kumru-2B-Base`. The goal is to transfer Kumru's chat/instruction-following behavior to `Kumru-2B-Base` deployments with a lightweight file footprint.

## Model Summary

- **Base model:** `vngrs-ai/Kumru-2B-Base`
- **Source (target behavior) model:** `vngrs-ai/Kumru-2B` (SFT/chat)
- **Technique:** Low-Rank Adaptation (LoRA)
- **LoRA rank / alpha:** 768 / 1024 _(update these if you produce a different build)_
- **Layer coverage:** all self-attention and MLP projections
- **Output artifacts:** PEFT-compatible `adapter_config.json` + `adapter_model.safetensors`
- **License:** Apache 2.0 (aligned with the VNGRS Kumru model licensing)

## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "vngrs-ai/Kumru-2B-Base"
adapter = "ceofast/kumru-2b-lora"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter, device_map="auto")

messages = [
    # System prompt: "Your name is Kumru, you are a helpful Turkish-speaking model."
    {"role": "system", "content": "Adın Kumru, Türkçe konuşan yardımcı bir modelsin."},
    # User prompt: "Can you give me brief information about the conquest of Istanbul?"
    {"role": "user", "content": "İstanbul'un fethi hakkında kısa bilgi verir misin?"}
]

inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
# do_sample=True is required for temperature/top_p to take effect.
outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

> Note: This adapter must be used together with `vngrs-ai/Kumru-2B-Base`. If you prefer a single standalone checkpoint, see the optional merging section at the end of this card.

## Extraction Process

The adapter is obtained by computing the weight delta between the base and SFT checkpoints and factorizing it into low-rank components with a truncated **SVD**; a minimal sketch of this factorization is given in the appendix at the end of this card. In this release, the measured reconstruction error is approximately **0.409**. To preserve more of the SFT behavior, you can increase rank/alpha and export a new version (e.g., rank **1024** / alpha **2048**). A lower-error build will be added as soon as possible.

- Script: `export_kumru.py`

## Known Limitations

- Kumru-2B is still a ~2B-parameter model; it may struggle with very long contexts, rare technical terms, and complex math.
- At low ranks, SVD-extracted LoRA can be less faithful than the original SFT checkpoint.
- The training data is based on VNGRS's public Turkish corpus cleaning pipeline; truthfulness/hallucination issues may still occur.

### Framework versions

- PEFT 0.11.1
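
## Merging the Adapter (Optional)

If you would rather ship one standalone checkpoint instead of loading base + adapter at runtime, you can fold the LoRA weights into the base model with PEFT's `merge_and_unload`. This is a minimal sketch continuing from the Usage snippet above; the output directory name is illustrative.

```python
# `model` is the PeftModel built in the Usage snippet.
merged = model.merge_and_unload()             # folds the LoRA deltas into the base weights
merged.save_pretrained("kumru-2b-merged")     # illustrative output directory
tokenizer.save_pretrained("kumru-2b-merged")
```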
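
## Appendix: Extraction Sketch

The snippet below is a minimal sketch of the per-layer SVD factorization described in the Extraction Process section, not the actual `export_kumru.py` implementation. The function name, the even split of singular values between the two factors, the compensation for PEFT's `alpha / rank` scaling, and the reading of the reported error as a relative Frobenius norm are all illustrative assumptions.

```python
import torch

def extract_lora(w_base: torch.Tensor, w_sft: torch.Tensor,
                 rank: int = 768, alpha: int = 1024):
    """Factorize the SFT-minus-base weight delta into LoRA factors via truncated SVD."""
    delta = (w_sft - w_base).float()                # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    U, S, Vh = U[:, :rank], S[:rank], Vh[:rank, :]  # keep the top-`rank` components
    # Split the singular values evenly so that B @ A reproduces the truncated delta.
    B = U * S.sqrt()                                # lora_B: (out_features, rank)
    A = S.sqrt().unsqueeze(1) * Vh                  # lora_A: (rank, in_features)
    # PEFT scales the product by alpha / rank at load time, so compensate here.
    B = B * (rank / alpha)
    # Relative Frobenius reconstruction error (assumed metric for the ~0.409 figure).
    rel_err = (torch.linalg.norm(delta - (alpha / rank) * (B @ A))
               / torch.linalg.norm(delta)).item()
    return A, B, rel_err
```

With `rank >= min(out_features, in_features)` the factorization is exact; lower ranks trade file size against fidelity, which is why a rank 1024 / alpha 2048 build is suggested above as a lower-error alternative.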