Upload README.md with huggingface_hub
README.md CHANGED
@@ -12,8 +12,6 @@ tags:
 - instruction-tuned
 - low-resource
 - nlp
-datasets:
-- ogulcanaydogan/Turkish-LLM-v10-Training
 pipeline_tag: text-generation
 model-index:
 - name: Turkish-LLM-14B-Instruct
@@ -26,8 +24,8 @@ An open-source 14.7 billion parameter language model fine-tuned for native Turki
 
 <p align="center">
 <a href="https://huggingface.co/spaces/ogulcanaydogan/Turkish-LLM-14B-Chat"><img src="https://img.shields.io/badge/Demo-Live_Chat-blue?style=for-the-badge&logo=huggingface" alt="Demo"></a>
 <a href="https://github.com/ogulcanaydogan/Turkish-LLM"><img src="https://img.shields.io/badge/GitHub-Repository-black?style=for-the-badge&logo=github" alt="GitHub"></a>
-<a href="https://huggingface.co/datasets/ogulcanaydogan/Turkish-LLM-v10-Training"><img src="https://img.shields.io/badge/Dataset-144K_samples-green?style=for-the-badge&logo=huggingface" alt="Dataset"></a>
 <a href="https://huggingface.co/ogulcanaydogan/Turkish-LLM-7B-Instruct"><img src="https://img.shields.io/badge/Also_Available-7B_Model-yellow?style=for-the-badge&logo=huggingface" alt="7B"></a>
 </p>
 
@@ -61,18 +59,17 @@ This model was developed to provide a **high-quality, open-source Turkish langua
 
 ### Model Family
 
-This model is part of the **Turkish-LLM** family:
-
 | Model | Parameters | Base | Method | Use Case |
 |-------|-----------|------|--------|----------|
 | **Turkish-LLM-14B-Instruct** (this) | 14.7B | Qwen2.5-14B-Instruct | SFT | Higher quality, complex reasoning |
 | [Turkish-LLM-7B-Instruct](https://huggingface.co/ogulcanaydogan/Turkish-LLM-7B-Instruct) | 7B | Turkcell-LLM-7b-v1 | LoRA | Lightweight, faster inference |
 
 ## Training
 
 ### Dataset
 
-Training data
 
 | Domain | Examples | Purpose |
 |--------|----------|---------|
@@ -95,13 +92,13 @@ Training data was sourced from the [Turkish-LLM-v10-Training](https://huggingfac
 
 ### Training Pipeline
 
-Training was orchestrated using [LowResource-LLM-Forge](https://github.com/ogulcanaydogan/LowResource-LLM-Forge), a custom pipeline built for efficient fine-tuning of LLMs for low-resource languages.
 
 ```
-Raw Turkish Data
-(144K pairs)
-
-
 ```
 
 ### Design Decisions
@@ -158,9 +155,13 @@ vllm serve ogulcanaydogan/Turkish-LLM-14B-Instruct \
 ### Ollama (Local)
 
 ```bash
-ollama run hf.co/ogulcanaydogan/Turkish-LLM-14B-Instruct
 ```
 
 ### Chat Template
 
 This model uses the ChatML format:
@@ -182,7 +183,7 @@ Sen yardimci bir Turkce yapay zeka asistanisin.<|im_end|>
 | INT8 | ~15 GB | RTX 4090, A10G |
 | INT4 (GPTQ/AWQ) | ~8 GB | RTX 3090, RTX 4080, Apple M-series (24GB) |
 
-For consumer hardware,
 
 ## Intended Use
 
@@ -218,8 +219,8 @@ This model is released under Apache 2.0 to support open research and development
 
 | Resource | Link |
 |----------|------|
 | 7B Model | [Turkish-LLM-7B-Instruct](https://huggingface.co/ogulcanaydogan/Turkish-LLM-7B-Instruct) |
-| Training Dataset (144K) | [Turkish-LLM-v10-Training](https://huggingface.co/datasets/ogulcanaydogan/Turkish-LLM-v10-Training) |
 | Live Demo (14B) | [Turkish-LLM-14B-Chat](https://huggingface.co/spaces/ogulcanaydogan/Turkish-LLM-14B-Chat) |
 | Live Demo (7B) | [Turkish-LLM-7B-Chat](https://huggingface.co/spaces/ogulcanaydogan/Turkish-LLM-7B-Chat) |
 | Training Pipeline | [LowResource-LLM-Forge](https://github.com/ogulcanaydogan/LowResource-LLM-Forge) |
 - instruction-tuned
 - low-resource
 - nlp
 pipeline_tag: text-generation
 model-index:
 - name: Turkish-LLM-14B-Instruct
 
 <p align="center">
 <a href="https://huggingface.co/spaces/ogulcanaydogan/Turkish-LLM-14B-Chat"><img src="https://img.shields.io/badge/Demo-Live_Chat-blue?style=for-the-badge&logo=huggingface" alt="Demo"></a>
+<a href="https://huggingface.co/ogulcanaydogan/Turkish-LLM-14B-Instruct-GGUF"><img src="https://img.shields.io/badge/GGUF-Quantized_Versions-orange?style=for-the-badge&logo=huggingface" alt="GGUF"></a>
 <a href="https://github.com/ogulcanaydogan/Turkish-LLM"><img src="https://img.shields.io/badge/GitHub-Repository-black?style=for-the-badge&logo=github" alt="GitHub"></a>
 <a href="https://huggingface.co/ogulcanaydogan/Turkish-LLM-7B-Instruct"><img src="https://img.shields.io/badge/Also_Available-7B_Model-yellow?style=for-the-badge&logo=huggingface" alt="7B"></a>
 </p>
 
 
 ### Model Family
 
 | Model | Parameters | Base | Method | Use Case |
 |-------|-----------|------|--------|----------|
 | **Turkish-LLM-14B-Instruct** (this) | 14.7B | Qwen2.5-14B-Instruct | SFT | Higher quality, complex reasoning |
+| [Turkish-LLM-14B-Instruct-GGUF](https://huggingface.co/ogulcanaydogan/Turkish-LLM-14B-Instruct-GGUF) | 14.7B | This model | GGUF quantized | Local/edge deployment |
 | [Turkish-LLM-7B-Instruct](https://huggingface.co/ogulcanaydogan/Turkish-LLM-7B-Instruct) | 7B | Turkcell-LLM-7b-v1 | LoRA | Lightweight, faster inference |
 
 ## Training
 
 ### Dataset
 
+Training data consists of a curated collection of **144,000 Turkish instruction-response pairs**, with a focused SFT subset of approximately 2,600 high-quality examples selected for alignment.
 
 | Domain | Examples | Purpose |
 |--------|----------|---------|
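The "approximately 2,600 high-quality examples selected for alignment" added above implies a scoring-and-selection pass over the full 144K pairs. A minimal sketch of such a pass follows; the record field names and the length/punctuation heuristic are illustrative assumptions, not the actual selection criteria used for this model.

```python
# Hypothetical quality filter: rank instruction-response pairs by a simple
# heuristic and keep the top-k. The real criteria behind the ~2.6K SFT
# subset are not documented here; this only illustrates the shape of the step.

def select_sft_subset(pairs, k):
    def score(pair):
        resp = pair["response"]
        # Longer, properly terminated responses score higher (illustrative).
        return len(resp) + 50 * resp.strip().endswith((".", "!", "?"))
    ranked = sorted(pairs, key=score, reverse=True)
    return ranked[:k]

corpus = [
    {"instruction": "Turkiye'nin baskenti neresidir?", "response": "Ankara."},
    {"instruction": "Merhaba", "response": "selam"},
    {"instruction": "Ozetle", "response": "Bu metin kisa bir ozettir."},
]
subset = select_sft_subset(corpus, k=2)
```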
 
 ### Training Pipeline
 
+Training was orchestrated using [LowResource-LLM-Forge](https://github.com/ogulcanaydogan/LowResource-LLM-Forge), a custom pipeline built for efficient fine-tuning of LLMs for low-resource languages.
 
 ```
+Raw Turkish Data --> Preprocessing --> SFT Training --> Evaluation --> Deployment
+  (144K pairs)       (filtering,      (A100 80GB,      (manual +      (HF Hub,
+                      dedup,           bf16 mixed       qualitative)   Spaces,
+                      formatting)      precision)                      vLLM)
 ```
 
 ### Design Decisions
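The Preprocessing stage named in the diagram (filtering, dedup, formatting) can be sketched roughly as below. The record field names and the minimum-length threshold are assumptions for illustration, not the pipeline's actual code.

```python
# Illustrative sketch of the "Preprocessing" stage: drop empty or very short
# pairs, deduplicate exact repeats, and format each pair into the ChatML
# layout used for SFT. Field names and threshold are assumed, not actual.

def preprocess(pairs, min_len=2):
    seen = set()
    out = []
    for p in pairs:
        inst, resp = p["instruction"].strip(), p["response"].strip()
        if len(inst) < min_len or len(resp) < min_len:
            continue  # filtering: skip degenerate pairs
        key = (inst, resp)
        if key in seen:
            continue  # deduplication: skip exact repeats
        seen.add(key)
        out.append(  # formatting: wrap in ChatML turn markers
            "<|im_start|>user\n" + inst + "<|im_end|>\n"
            "<|im_start|>assistant\n" + resp + "<|im_end|>"
        )
    return out

samples = preprocess([
    {"instruction": "Merhaba de", "response": "Merhaba!"},
    {"instruction": "Merhaba de", "response": "Merhaba!"},  # duplicate
    {"instruction": "", "response": "bos"},                 # filtered out
])
```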
 ### Ollama (Local)
 
 ```bash
+ollama run hf.co/ogulcanaydogan/Turkish-LLM-14B-Instruct-GGUF:Q4_K_M
 ```
 
+### GGUF (llama.cpp / LM Studio)
+
+Quantized GGUF versions (Q4_K_M, Q5_K_M, Q8_0, F16) are available at [Turkish-LLM-14B-Instruct-GGUF](https://huggingface.co/ogulcanaydogan/Turkish-LLM-14B-Instruct-GGUF).
+
 ### Chat Template
 
 This model uses the ChatML format:
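Since the model uses ChatML, a prompt is just the conversation wrapped in `<|im_start|>`/`<|im_end|>` markers. In practice `tokenizer.apply_chat_template()` builds this string; the manual sketch below only makes the wire format explicit, reusing the Turkish system prompt quoted in the template example.

```python
# Minimal sketch of the ChatML layout this model expects. Normally you would
# call tokenizer.apply_chat_template(messages, add_generation_prompt=True)
# instead of building the string by hand.

def to_chatml(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # generation prompt for the model
    return "".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "Sen yardimci bir Turkce yapay zeka asistanisin."},
    {"role": "user", "content": "Merhaba!"},
])
```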
 | INT8 | ~15 GB | RTX 4090, A10G |
 | INT4 (GPTQ/AWQ) | ~8 GB | RTX 3090, RTX 4080, Apple M-series (24GB) |
 
+For consumer hardware, use the [GGUF versions](https://huggingface.co/ogulcanaydogan/Turkish-LLM-14B-Instruct-GGUF) for the best balance of quality and accessibility.
 
 ## Intended Use
 
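The VRAM figures in the table above are consistent with a weights-only estimate of bytes-per-parameter times the 14.7B parameter count; actual usage runs higher once the KV cache, activations, and framework overhead are added.

```python
# Back-of-envelope weights-only VRAM estimate for a 14.7B-parameter model.
# Real memory usage is larger: KV cache, activations, and runtime overhead
# come on top of raw weight storage.

PARAMS = 14.7e9  # parameter count stated in the model card

def weights_gb(bytes_per_param):
    return PARAMS * bytes_per_param / 1e9  # decimal GB

int8_gb = weights_gb(1.0)  # ~14.7 GB, matching the "~15 GB" INT8 row
int4_gb = weights_gb(0.5)  # ~7.4 GB, matching the "~8 GB" INT4 row
```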
 
 | Resource | Link |
 |----------|------|
+| GGUF Versions | [Turkish-LLM-14B-Instruct-GGUF](https://huggingface.co/ogulcanaydogan/Turkish-LLM-14B-Instruct-GGUF) |
 | 7B Model | [Turkish-LLM-7B-Instruct](https://huggingface.co/ogulcanaydogan/Turkish-LLM-7B-Instruct) |
 | Live Demo (14B) | [Turkish-LLM-14B-Chat](https://huggingface.co/spaces/ogulcanaydogan/Turkish-LLM-14B-Chat) |
 | Live Demo (7B) | [Turkish-LLM-7B-Chat](https://huggingface.co/spaces/ogulcanaydogan/Turkish-LLM-7B-Chat) |
 | Training Pipeline | [LowResource-LLM-Forge](https://github.com/ogulcanaydogan/LowResource-LLM-Forge) |