Upload README.md with huggingface_hub
README.md (changed)
tags:
- low-resource
- nlp
datasets:
- ogulcanaydogan/Turkish-LLM-v10-Training
pipeline_tag: text-generation
model-index:
- name: Turkish-LLM-14B-Instruct
  results: []
---
An open-source 14.7-billion-parameter language model fine-tuned for native Turkish instruction following, built on Qwen2.5-14B-Instruct with supervised fine-tuning (SFT) on a curated corpus of Turkish-language examples spanning science, history, geography, and general knowledge.

<p align="center">
<a href="https://huggingface.co/spaces/ogulcanaydogan/Turkish-LLM-14B-Chat"><img src="https://img.shields.io/badge/Demo-Live_Chat-blue?style=for-the-badge&logo=huggingface" alt="Demo"></a>
<a href="https://github.com/ogulcanaydogan/Turkish-LLM"><img src="https://img.shields.io/badge/GitHub-Repository-black?style=for-the-badge&logo=github" alt="GitHub"></a>
<a href="https://huggingface.co/datasets/ogulcanaydogan/Turkish-LLM-v10-Training"><img src="https://img.shields.io/badge/Dataset-144K_samples-green?style=for-the-badge&logo=huggingface" alt="Dataset"></a>
<a href="https://huggingface.co/ogulcanaydogan/Turkish-LLM-7B-Instruct"><img src="https://img.shields.io/badge/Also_Available-7B_Model-yellow?style=for-the-badge&logo=huggingface" alt="7B"></a>
</p>

---
This model is part of the **Turkish-LLM** family:

| Model | Parameters | Base | Method | Use Case |
|-------|------------|------|--------|----------|
| **Turkish-LLM-14B-Instruct** (this model) | 14.7B | Qwen2.5-14B-Instruct | SFT | Higher quality, complex reasoning |
| [Turkish-LLM-7B-Instruct](https://huggingface.co/ogulcanaydogan/Turkish-LLM-7B-Instruct) | 7B | Turkcell-LLM-7b-v1 | LoRA | Lightweight, faster inference |

## Training

### Dataset

Training data was sourced from the [Turkish-LLM-v10-Training](https://huggingface.co/datasets/ogulcanaydogan/Turkish-LLM-v10-Training) dataset, a curated collection of **144,000 Turkish instruction-response pairs**, with a focused SFT subset of approximately 2,600 high-quality examples selected for alignment.

| Domain | Examples | Purpose |
|--------|----------|---------|
Raw Turkish Data ──▶ Preprocessing ──▶ SFT Training ──▶ Evaluation

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ogulcanaydogan/Turkish-LLM-14B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # ...
)
# ...
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
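The final `print` in the usage snippet decodes only the newly generated text by slicing the prompt tokens off the output sequence. The indexing idiom can be sanity-checked with toy tensors, with no model download; the token values below are made up for illustration:

```python
import torch

# Toy stand-ins for generate() inputs/outputs: a batch of one sequence,
# five prompt tokens followed by three newly generated tokens.
input_ids = torch.tensor([[101, 102, 103, 104, 105]])         # shape (1, 5)
outputs = torch.tensor([[101, 102, 103, 104, 105, 7, 8, 9]])  # prompt + generation

# outputs[0] is the full token sequence; dropping the first
# input_ids.shape[1] positions leaves only the generated tokens,
# which is what the snippet passes to tokenizer.decode.
generated = outputs[0][input_ids.shape[1]:]
print(generated.tolist())  # [7, 8, 9]
```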
```bash
pip install vllm
vllm serve ogulcanaydogan/Turkish-LLM-14B-Instruct \
  --dtype float16 \
  --max-model-len 4096
```
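`vllm serve` exposes an OpenAI-compatible HTTP API, by default on port 8000. A minimal standard-library client sketch follows; the prompt and `max_tokens` value are illustrative assumptions, and the network call itself is commented out so the snippet runs without a live server:

```python
import json
from urllib.request import Request, urlopen

# Chat-completions payload for the OpenAI-compatible endpoint that
# `vllm serve` exposes at /v1/chat/completions by default.
payload = {
    "model": "ogulcanaydogan/Turkish-LLM-14B-Instruct",
    "messages": [{"role": "user", "content": "Türkiye'nin başkenti neresidir?"}],
    "max_tokens": 128,
}
req = Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Requires a running `vllm serve` instance:
# with urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
print(req.full_url)
```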
### Ollama (Local)

```bash
ollama run hf.co/ogulcanaydogan/Turkish-LLM-14B-Instruct
```

### Chat Template
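The authoritative chat format always comes from `tokenizer.apply_chat_template`. Since Qwen2.5-family models use a ChatML-style template, its structure can be illustrated with a hand-rolled sketch; this is an assumption for illustration only, not a substitute for the tokenizer's own template:

```python
# Hand-rolled ChatML-style prompt, for illustration only. Qwen2.5-family
# templates use <|im_start|> / <|im_end|> markers; in real use, call
# tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True).
def to_chatml(messages):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # generation prompt for the model's reply
    return "".join(parts)

messages = [
    {"role": "system", "content": "Sen yardımsever bir Türkçe asistansın."},
    {"role": "user", "content": "Karadeniz Bölgesi'nin en büyük şehri hangisidir?"},
]
print(to_chatml(messages))
```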
This model is released under Apache 2.0 to support open research and development.

| Resource | Link |
|----------|------|
| 7B Model | [Turkish-LLM-7B-Instruct](https://huggingface.co/ogulcanaydogan/Turkish-LLM-7B-Instruct) |
| Training Dataset (144K) | [Turkish-LLM-v10-Training](https://huggingface.co/datasets/ogulcanaydogan/Turkish-LLM-v10-Training) |
| Live Demo (14B) | [Turkish-LLM-14B-Chat](https://huggingface.co/spaces/ogulcanaydogan/Turkish-LLM-14B-Chat) |
| Live Demo (7B) | [Turkish-LLM-7B-Chat](https://huggingface.co/spaces/ogulcanaydogan/Turkish-LLM-7B-Chat) |
| Training Pipeline | [LowResource-LLM-Forge](https://github.com/ogulcanaydogan/LowResource-LLM-Forge) |
| Project Repository | [Turkish-LLM on GitHub](https://github.com/ogulcanaydogan/Turkish-LLM) |
```bibtex
  author = {Aydogan, Ogulcan},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/ogulcanaydogan/Turkish-LLM-14B-Instruct}
}
```