---
license: mit
language:
- tr
- en
library_name: transformers
tags:
- kubernetes
- devops
- quantized
- 4bit
- gemma3
- bitsandbytes
base_model: aciklab/kubernetes-ai
model_type: gemma3
quantized_by: aciklab
---
# Kubernetes AI - 4bit Safetensors
Fine-tuned Gemma 3 12B model specialized for answering Kubernetes questions in Turkish, quantized to 4bit format for efficient inference with reduced memory footprint.
## Model Description
This repository contains a 4bit quantized version of the Kubernetes AI model, optimized for running on consumer hardware with reduced VRAM/RAM requirements. The model uses BitsAndBytes quantization with safetensors format for fast loading and efficient inference.
**Primary Purpose:** Answer Kubernetes-related questions in Turkish with minimal hardware requirements.
## Model Specifications
| Specification | Details |
|---------------|---------|
| **Format** | Safetensors (4bit quantized) |
| **Base Model** | unsloth/gemma-3-12b-it-qat-bnb-4bit |
| **Quantization** | 4bit (BitsAndBytes) |
| **Model Size** | ~7.2 GB |
| **Memory Usage** | ~8-10 GB VRAM/RAM |
| **Precision** | 4bit weights, FP16 compute |
## Quick Start
### Installation
```bash
# Install required packages
pip install torch transformers accelerate bitsandbytes safetensors
```
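Optionally, you can verify that PyTorch sees a CUDA GPU before loading the model; the BitsAndBytes 4bit kernels target CUDA, so this quick check catches environment problems early:

```python
import torch

# BitsAndBytes 4bit inference expects a CUDA-capable GPU
print(torch.__version__, torch.cuda.is_available())
```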
### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer; the quantization settings are stored with the
# checkpoint, so no extra BitsAndBytes configuration is needed here
model_name = "aciklab/kubernetes-ai-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    trust_remote_code=True
)

# Prepare input (Turkish: "How do I create a deployment with 3 replicas in Kubernetes?")
prompt = "Kubernetes'te 3 replikaya sahip bir deployment nasıl oluştururum?"

# Format with the chat template; the system message (in Turkish) instructs the
# model to act as a Kubernetes expert assistant that answers in Turkish
messages = [
    {"role": "system", "content": "Sen Kubernetes konusunda uzmanlaşmış bir yapay zeka asistanısın. Kubernetes ile ilgili soruları Türkçe olarak yanıtlıyorsun."},
    {"role": "user", "content": prompt}
]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    repetition_penalty=1.05,
    do_sample=True
)

# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```
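For interactive use, you can optionally stream the response as it is generated with Transformers' `TextStreamer`; a minimal sketch reusing `model`, `tokenizer`, and `inputs` from the snippet above:

```python
from transformers import TextStreamer

# Print decoded tokens to stdout as they are produced; skip_prompt avoids
# re-printing the input prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    do_sample=True,
    streamer=streamer
)
```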
## Training Details
This model was created by merging the [aciklab/kubernetes-ai](https://huggingface.co/aciklab/kubernetes-ai) LoRA adapters into the base model:
- **Base Model:** unsloth/gemma-3-12b-it-qat-bnb-4bit
- **Training Method:** LoRA (Low-Rank Adaptation)
- **LoRA Rank:** 8
- **Target Modules:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Training Dataset:** ~160,210 examples from Kubernetes docs, Stack Overflow, and DevOps datasets
- **Training Time:** 28 hours on NVIDIA RTX 5070 12GB
- **Max Sequence Length:** 1024 tokens
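For reference, the LoRA settings listed above correspond roughly to the following PEFT configuration. This is a sketch, not the original training script; `lora_alpha` and `lora_dropout` are illustrative assumptions, as the card does not state them:

```python
from peft import LoraConfig

# LoRA settings matching the card above
lora_config = LoraConfig(
    r=8,  # LoRA rank from the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,     # assumption: not stated in the card
    lora_dropout=0.0,  # assumption: not stated in the card
    task_type="CAUSAL_LM",
)
```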
### Training Dataset Summary
| Dataset Category | Count | Description |
|-----------------|-------|-------------|
| **Kubernetes Official Docs** | 8,910 | Concepts, kubectl, setup, tasks, tutorials |
| **Stack Overflow** | 52,000 | Kubernetes Q&A from community |
| **DevOps Datasets** | 62,500 | General DevOps and Kubernetes content |
| **Configurations & CLI** | 36,800 | Kubernetes configs, kubectl examples, operators |
| **Total** | **~160,210** | Comprehensive Kubernetes knowledge base |
## Quantization Details
This model uses 4bit quantization with BitsAndBytes for optimal memory efficiency:
- **Source:** Merged LoRA adapters with base model
- **Quantization Method:** BitsAndBytes 4bit (NF4)
- **Compute Precision:** FP16
- **Format:** Safetensors (fast loading)
- **Memory Footprint:** ~7.2 GB on disk, ~8-10 GB in memory
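The equivalent BitsAndBytes setup looks roughly like this. It is a sketch of how a merged FP16 checkpoint could be quantized and serialized, not the exact export script; `merged-model-path` is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4bit weights with FP16 compute, as described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "merged-model-path",  # placeholder: merged base + LoRA checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
# Serializing 4bit weights to safetensors requires recent
# transformers/bitsandbytes versions
model.save_pretrained("kubernetes-ai-4bit")
```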
### Advantages of 4bit Format
- **Efficient Memory Usage:** Runs on GPUs with 8GB+ VRAM
- **Fast Loading:** Safetensors format loads quickly
- **Good Quality:** Minimal accuracy loss compared to full precision
- **Framework Support:** Compatible with Transformers, vLLM, Text Generation Inference
- **Flexible Deployment:** Can run on CPU with acceptable speed
## Hardware Requirements
### Minimum (GPU)
- **GPU:** 8GB VRAM
- **RAM:** 8GB system memory
- **Storage:** 10GB free space
### Recommended
- **GPU:** 12GB+ VRAM
- **RAM:** 16GB system memory
- **Storage:** 15GB free space
## Limitations
- **Language:** Optimized primarily for Turkish and English
- **Domain:** Specialized for Kubernetes; may not perform well on general topics
- **Quantization:** 4bit quantization may occasionally affect response quality on complex queries
## License
This model is released under the **MIT License**. Free to use in commercial and open-source projects.
## Citation
If you use this model in your research or applications, please cite:
```bibtex
@misc{kubernetes-ai-4bit,
  author       = {HAVELSAN/Açıklab},
  title        = {Kubernetes AI - 4bit Safetensors},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/aciklab/kubernetes-ai-4bit}}
}
```
## Contact
**Produced by:** HAVELSAN/Açıklab
For questions, feedback, or issues, please open an issue on the model repository or contact us through Hugging Face.
## Related Models
- [aciklab/kubernetes-ai](https://huggingface.co/aciklab/kubernetes-ai) - Original LoRA adapters
- [aciklab/kubernetes-ai-GGUF](https://huggingface.co/aciklab/kubernetes-ai-GGUF) - GGUF quantized versions for llama.cpp
---
**Note:** This is a 4bit quantized model ready for immediate use with the Transformers library. No additional model merging or quantization required.