A fine-tune of unsloth/gemma-3-1b-it on the kth8/multi-turn-conversation-50000x dataset.
## Usage example

**System prompt**

You are a helpful assistant.

**User prompt**

Hey there! How's it going?

**Assistant response**

Hey! I'm doing great, thanks for asking! I'm here and ready to help with whatever you need. What's on your mind today?
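A minimal loading-and-generation sketch matching the example above. The repo id is a placeholder (this card doesn't state the fine-tune's model id), and the generation settings are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id: substitute the actual id of this fine-tune.
model_id = "your-username/gemma-3-1b-it-multi-turn"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The same system/user turns as the example above.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hey there! How's it going?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```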
## Model Details
- Base Model: unsloth/gemma-3-1b-it
- Parameter Count: 999,885,952
- Precision: torch.bfloat16
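The figures above can be reproduced from the base checkpoint; a quick sketch (assumes enough memory for the bf16 weights):

```python
import torch
from transformers import AutoModelForCausalLM

# Load the base checkpoint in bfloat16 and count parameters.
model = AutoModelForCausalLM.from_pretrained(
    "unsloth/gemma-3-1b-it", torch_dtype=torch.bfloat16
)
print(sum(p.numel() for p in model.parameters()))  # expected: 999885952
print(next(model.parameters()).dtype)              # torch.bfloat16
```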
## Training Settings

### Hardware
- GPU: NVIDIA RTX PRO 6000 Blackwell Server Edition
### PEFT
- Rank: 32
- LoRA alpha: 64
- Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Gradient checkpointing: unsloth
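For reference, the adapter settings above map onto PEFT's `LoraConfig` roughly as follows. This is a sketch, not the exact training call: Unsloth wraps PEFT through its own `FastLanguageModel.get_peft_model` API, which is also where the `unsloth` gradient-checkpointing mode is selected.

```python
from peft import LoraConfig

# The LoRA hyperparameters listed above, as a plain PEFT config.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```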
### SFT
- Epochs: 2
- Batch size: 48
- Gradient accumulation steps: 1
- Warmup ratio: 0.1
- Learning rate: 0.0002
- Optimizer: adamw_torch_fused
- Learning rate scheduler: cosine
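Expressed as a TRL `SFTConfig`, the hyperparameters above look roughly like this; `output_dir` and anything else not listed in this card is an assumption:

```python
from trl import SFTConfig

sft_config = SFTConfig(
    output_dir="outputs",              # assumption, not stated in the card
    num_train_epochs=2,
    per_device_train_batch_size=48,
    gradient_accumulation_steps=1,
    warmup_ratio=0.1,
    learning_rate=2e-4,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
)
```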
## Training stats
- Global step: 1996
- Training runtime (seconds): 6834.1445
- Average training loss: 1.1743444665400442
- Final validation loss: 1.1191450357437134
## Framework versions
- Unsloth: 2026.3.8
- TRL: 0.22.2
- Transformers: 4.56.2
- PyTorch: 2.10.0+cu128
- Datasets: 4.8.3
- Tokenizers: 0.22.2
## License
This model is released under the Gemma license. See the Gemma Terms of Use and Prohibited Use Policy regarding the use of Gemma-generated content.