---
license: mit
language:
- tr
- en
library_name: transformers
tags:
- kubernetes
- devops
- quantized
- 4bit
- gemma3
- bitsandbytes
base_model: aciklab/kubernetes-ai
model_type: gemma3
quantized_by: aciklab
---

# Kubernetes AI - 4bit Safetensors

Fine-tuned Gemma 3 12B model specialized for answering Kubernetes questions in Turkish, quantized to 4bit format for efficient inference with a reduced memory footprint.

## Model Description

This repository contains a 4bit quantized version of the Kubernetes AI model, optimized for consumer hardware with reduced VRAM/RAM requirements. The model uses BitsAndBytes quantization and the safetensors format for fast loading and efficient inference.

**Primary Purpose:** Answer Kubernetes-related questions in Turkish with minimal hardware requirements.

## Model Specifications

| Specification | Details |
|---------------|---------|
| **Format** | Safetensors (4bit quantized) |
| **Base Model** | unsloth/gemma-3-12b-it-qat-bnb-4bit |
| **Quantization** | 4bit (BitsAndBytes NF4) |
| **Model Size** | ~7.2 GB on disk |
| **Memory Usage** | ~8-10 GB VRAM/RAM |
| **Precision** | 4bit weights, FP16 compute |

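You can verify these settings before downloading the full weights by inspecting the checkpoint's embedded quantization config; a quick sketch, assuming the config is serialized with the model, as is standard for BitsAndBytes checkpoints:

```python
from transformers import AutoConfig

# Fetches only config.json, not the ~7.2 GB of weights
config = AutoConfig.from_pretrained("aciklab/kubernetes-ai-4bit")
print(config.quantization_config)  # expect load_in_4bit with nf4 and float16 compute
```
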
## Quick Start

### Installation

```bash
# Install required packages
pip install torch transformers accelerate bitsandbytes safetensors
```

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer (the 4bit quantization config ships with the checkpoint)
model_name = "aciklab/kubernetes-ai-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    trust_remote_code=True,
)

# Prepare input ("How do I create a deployment with 3 replicas in Kubernetes?")
prompt = "Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?"

# Format with the chat template; the system prompt says: "You are an AI assistant
# specialized in Kubernetes. You answer Kubernetes-related questions in Turkish."
messages = [
    {"role": "system", "content": "Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın. Kubernetes ile ilgili soruları Türkçe olarak yanıtlıyorsun."},
    {"role": "user", "content": prompt}
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    repetition_penalty=1.05,
    do_sample=True,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

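For interactive use, you can stream tokens as they are generated instead of waiting for the full completion. A minimal sketch reusing `model`, `tokenizer`, and `inputs` from the snippet above:

```python
from transformers import TextStreamer

# Prints decoded tokens to stdout as they are produced
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, streamer=streamer, max_new_tokens=512, do_sample=True,
               temperature=1.0, top_p=0.95, top_k=64, repetition_penalty=1.05)
```
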
## Training Details

This model is based on the [aciklab/kubernetes-ai](https://huggingface.co/aciklab/kubernetes-ai) LoRA adapters, with the key hyperparameters listed below and a reference configuration sketch after the list:

- **Base Model:** unsloth/gemma-3-12b-it-qat-bnb-4bit
- **Training Method:** LoRA (Low-Rank Adaptation)
- **LoRA Rank:** 8
- **Target Modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Training Dataset:** ~157,210 examples from Kubernetes docs, Stack Overflow, and DevOps datasets
- **Training Time:** 28 hours on NVIDIA RTX 5070 12GB
- **Max Sequence Length:** 1024 tokens

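For reference, a hypothetical `peft` configuration mirroring the hyperparameters above; `lora_alpha` and `lora_dropout` are not stated in this card, so the values below are illustrative assumptions:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,               # LoRA rank, as listed above
    lora_alpha=16,     # assumption: not published in this card
    lora_dropout=0.0,  # assumption: not published in this card
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```
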
### Training Dataset Summary

| Dataset Category | Count | Description |
|-----------------|-------|-------------|
| **Kubernetes Official Docs** | 8,910 | Concepts, kubectl, setup, tasks, tutorials |
| **Stack Overflow** | 52,000 | Kubernetes Q&A from the community |
| **DevOps Datasets** | 62,500 | General DevOps and Kubernetes content |
| **Configurations & CLI** | 36,800 | Kubernetes configs, kubectl examples, operators |
| **Total** | **~157,210** | Comprehensive Kubernetes knowledge base |

## Quantization Details

This model uses BitsAndBytes 4bit quantization for memory efficiency; an equivalent configuration sketch follows the list:

- **Source:** LoRA adapters merged into the base model
- **Quantization Method:** BitsAndBytes 4bit (NF4)
- **Compute Precision:** FP16
- **Format:** Safetensors (fast loading)
- **Memory Footprint:** ~7.2 GB on disk, ~8-10 GB in memory

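A minimal sketch of how a checkpoint with these settings could be produced from the merged FP16 model; the local path `./kubernetes-ai-merged` is a hypothetical placeholder, and this is not necessarily the exact script used:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4bit weights with FP16 compute, matching the settings above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "./kubernetes-ai-merged",  # hypothetical path to the merged FP16 model
    quantization_config=bnb_config,
    device_map="auto",
)

# Serializes the quantized weights (plus quantization config) as safetensors
model.save_pretrained("kubernetes-ai-4bit")
```
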
### Advantages of 4bit Format

- **Efficient Memory Usage:** Runs on GPUs with 8GB+ VRAM
- **Fast Loading:** Safetensors format loads quickly
- **Good Quality:** Minimal accuracy loss compared to full precision
- **Framework Support:** Compatible with Transformers, vLLM, and Text Generation Inference (see the serving sketch below)
- **Flexible Deployment:** Can also run on CPU, at reduced speed

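As an example of framework support, serving with vLLM might look like the sketch below. This assumes a vLLM release whose bitsandbytes loader handles pre-quantized Gemma 3 checkpoints; support varies by version, so treat it as a starting point rather than a verified recipe:

```python
from vllm import LLM, SamplingParams

# Assumes bitsandbytes support in your vLLM version
llm = LLM(model="aciklab/kubernetes-ai-4bit", quantization="bitsandbytes")

params = SamplingParams(temperature=1.0, top_p=0.95, top_k=64, max_tokens=512)
# Hypothetical prompt: "What is a pod in Kubernetes?"
outputs = llm.generate(["Kubernetes'te pod nedir?"], params)
print(outputs[0].outputs[0].text)
```
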
## Hardware Requirements

### Minimum (GPU)

- **GPU:** 8GB VRAM
- **RAM:** 8GB system memory
- **Storage:** 10GB free space

### Recommended

- **GPU:** 12GB+ VRAM
- **RAM:** 16GB system memory
- **Storage:** 15GB free space

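A quick way to check whether your GPU meets the minimum before downloading ~7 GB of weights; a convenience sketch, not part of the official setup:

```python
import torch

if torch.cuda.is_available():
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    status = "meets" if vram_gb >= 8 else "is below"
    print(f"Detected {vram_gb:.1f} GB VRAM, which {status} the 8 GB minimum")
else:
    print("No CUDA GPU detected; expect much slower CPU-only inference")
```
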
## Limitations

- **Language:** Optimized primarily for Turkish and English
- **Domain:** Specialized for Kubernetes; may not perform well on general topics
- **Quantization:** 4bit quantization may occasionally affect response quality on complex queries

## License

This model is released under the **MIT License** and is free to use in commercial and open-source projects.

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{kubernetes-ai-4bit,
  author = {HAVELSAN/Açıklab},
  title = {Kubernetes AI - 4bit Safetensors},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/aciklab/kubernetes-ai-4bit}}
}
```

## Contact

**Produced by:** HAVELSAN/Açıklab

For questions, feedback, or issues, please open an issue on the model repository or contact us through HuggingFace.

## Related Models

- [aciklab/kubernetes-ai](https://huggingface.co/aciklab/kubernetes-ai) - Original LoRA adapters
- [aciklab/kubernetes-ai-GGUF](https://huggingface.co/aciklab/kubernetes-ai-GGUF) - GGUF quantized versions for llama.cpp

---

**Note:** This is a 4bit quantized model ready for immediate use with the Transformers library. No additional model merging or quantization is required.
|