Instructions to use klusai/tf2-4b-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use klusai/tf2-4b-gguf with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="klusai/tf2-4b-gguf", filename="fp16/tf2-4b-f16.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use klusai/tf2-4b-gguf with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf klusai/tf2-4b-gguf:F16 # Run inference directly in the terminal: llama-cli -hf klusai/tf2-4b-gguf:F16
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf klusai/tf2-4b-gguf:F16 # Run inference directly in the terminal: llama-cli -hf klusai/tf2-4b-gguf:F16
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf klusai/tf2-4b-gguf:F16 # Run inference directly in the terminal: ./llama-cli -hf klusai/tf2-4b-gguf:F16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf klusai/tf2-4b-gguf:F16 # Run inference directly in the terminal: ./build/bin/llama-cli -hf klusai/tf2-4b-gguf:F16
Use Docker
docker model run hf.co/klusai/tf2-4b-gguf:F16
- LM Studio
- Jan
- Ollama
How to use klusai/tf2-4b-gguf with Ollama:
ollama run hf.co/klusai/tf2-4b-gguf:F16
- Unsloth Studio new
How to use klusai/tf2-4b-gguf with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for klusai/tf2-4b-gguf to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for klusai/tf2-4b-gguf to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for klusai/tf2-4b-gguf to start chatting
- Docker Model Runner
How to use klusai/tf2-4b-gguf with Docker Model Runner:
docker model run hf.co/klusai/tf2-4b-gguf:F16
- Lemonade
How to use klusai/tf2-4b-gguf with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull klusai/tf2-4b-gguf:F16
Run and chat with the model
lemonade run user.tf2-4b-gguf-F16
List all available models
lemonade list
YAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
π± TinyFabulist-TF2-4B Β· Gemma 3 4-B ENβRO Fable Translator
tf2-4b is a parameter-efficiently fine-tuned checkpoint of Google Gemma 3 4 B that specialises in translating moral fables from English into Romanian.
π° Model Summary
| Field | Value |
|---|---|
| Base model | google/gemma-3-4b-it |
| Architecture | Decoder-only Transformer Β· 3.88 B params |
| Fine-tuning method | Supervised SFT β instruction tuning β LoRA (r = 16) Β· adapters merged |
| Training data | 12 000 ENβRO fable pairs (train) + 1 500 val / 1 500 test (TinyFabulist-TF2) |
| Objective | Next-token cross-entropy on Romanian targets |
| Hardware / budget | TODO (e.g. 2 Γ A100 80 GB Β· ~ h Β· β $) |
| Intended use | Offline literary translation of short stories / fables |
| Out-of-scope | News, legal, medical, or very long documents; languages other than EN β RO |
| Context window | 8 192 tokens |
β¨ How It Works
Give the model an English fable (β€ 2 000 tokens) and it returns a fluent Romanian version that preserves both narrative style and explicit moralβwithout relying on costly GPT-class APIs.
π Quick Start
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
model_id = "klusai/tf2-4b"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
translator = pipeline("text-generation", model=model, tokenizer=tok)
en_fable = (
"Once upon a time, a small sparrow boasted to the mighty eagle that speed alone "
"was enough to conquer the sky. β¦ Moral: Pride often blinds us to our limits."
)
ro_fable = translator(
f"Translate the following fable into Romanian:\n\n{en_fable}",
max_new_tokens=512,
temperature=0.2
)[0]["generated_text"]
print(ro_fable)
π¦ Quantised Variants
| File | Precision | Size | Typical RAM |
|---|---|---|---|
tf2-4b-f16.safetensors |
FP16 | 7.77 GB | β₯ 16 GB GPU / 20 GB CPU |
tf2-4b-q5_k_m.gguf |
5-bit Q5_K_M | 2.83 GB | β₯ 6 GB RAM |
# Run the 5-bit build with llama-cpp-python
pip install llama-cpp-python
python -m llama_cpp.server \
--model tf2-4b-q5_k_m.gguf \
--n_ctx 8192
π§ Limitations & Biases
- Trained entirely on synthetic TinyFabulist narratives β may echo that phrasing.
- Domain-specific: excels at short moral stories; under-performs on highly technical or colloquial text.
- No integrated safety filtering β downstream applications should moderate outputs.
- Inputs longer than 8 192 tokens are truncated.
β Licence
Model: Apache 2.0 (commercial + research friendly)
Dataset: CC-BY-4.0 (TinyFabulist-TF2 ENβRO 15 k)
- Downloads last month
- 6
5-bit
16-bit