How to use from the llama-cpp-python library
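# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub (requires huggingface-hub) and load it
llm = Llama.from_pretrained(
    repo_id="SLT-AI/SLT-0.5b-GoToSpeak",
    filename="slt_q4km.gguf",
)

# Run a single chat turn; the result is an OpenAI-style completion dictionary
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
print(response["choices"][0]["message"]["content"])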
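
For conversations longer than one turn, the message list has to carry the earlier user and assistant messages. The following is a minimal sketch of that bookkeeping, not part of the model's own tooling: `chat_turn` and `fake_complete` are hypothetical names, and `fake_complete` is only a stand-in that mimics the OpenAI-style dictionary returned by `create_chat_completion`, so the example runs without downloading the model.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    history: List[Message],
    user_text: str,
    complete: Callable[[List[Message]], dict],
) -> str:
    """Append the user message, call the backend, and record the assistant reply."""
    history.append({"role": "user", "content": user_text})
    response = complete(history)
    # OpenAI-style layout: choices -> first choice -> message -> content
    reply = response["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply


def fake_complete(messages: List[Message]) -> dict:
    """Hypothetical stand-in for llm.create_chat_completion(messages=messages)."""
    last = messages[-1]["content"]
    return {"choices": [{"message": {"role": "assistant", "content": f"echo: {last}"}}]}


history: List[Message] = []
first = chat_turn(history, "Hello!", fake_complete)
second = chat_turn(history, "How are you?", fake_complete)
print(first)
print(second)
```

With the real model, the stand-in could be replaced by `lambda msgs: llm.create_chat_completion(messages=msgs)`, so that each call sees the full history.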

SLT-0.5b-GoToSpeak

A small 0.5B-parameter conversational model based on Qwen2.5-0.5B.

Training Dataset

The model was fine-tuned on 7500 examples.

The dataset includes:

  • Conversational dialogues in Russian and English
  • Up-to-date general knowledge (as of 2025-2026)
  • Simple Python coding tasks
  • Basic mathematics with step-by-step explanations

Training method: Supervised Fine-Tuning (SFT).

How to Use

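from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "SLT-AI/SLT-0.5b-GoToSpeak"

# Load the tokenizer and the BF16 weights; device_map="auto" requires the accelerate package
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Render the conversation with the model's chat template, then tokenize it
messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True is set explicitly so that temperature and top_p take effect
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))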
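
To make the `apply_chat_template` step less opaque, here is a rough sketch of the ChatML-style prompt string used by Qwen2.5 models. It assumes this fine-tune keeps the base model's `<|im_start|>` / `<|im_end|>` markers, which the card does not state. `render_chatml` is a hypothetical helper, and the real template may also inject a default system message, so `tokenizer.apply_chat_template` remains the thing to use for actual inference.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Approximate a ChatML-style prompt (assumption: Qwen2.5 markers are unchanged)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Hello!"}])
print(prompt)
```

Comparing this string with the `text` variable from the snippet above is a quick way to check whether the model's actual template differs, for example by adding a system prompt.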