Instructions for using TitleOS/Vircava-3B-FP32 with libraries, inference servers, notebooks, and local apps.

Libraries

How to use TitleOS/Vircava-3B-FP32 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="TitleOS/Vircava-3B-FP32")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly (Granite-based text model, so the causal-LM auto class applies)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TitleOS/Vircava-3B-FP32")
model = AutoModelForCausalLM.from_pretrained("TitleOS/Vircava-3B-FP32")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```


vLLM

How to use TitleOS/Vircava-3B-FP32 with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TitleOS/Vircava-3B-FP32"
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "TitleOS/Vircava-3B-FP32",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
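You can also call the same OpenAI-compatible endpoint from Python. The sketch below uses only the standard library. It assumes the vLLM server started above is listening on `localhost:8000`; for the SGLang server, pass `base_url="http://localhost:30000"`. The helper names `build_chat_request` and `chat` are hypothetical and are not part of either project.

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8000",
                       model: str = "TitleOS/Vircava-3B-FP32",
                       max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible chat route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the request to a running server and return the reply text."""
    req = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=300) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

For example, `print(chat("Kāda ir Francijas galvaspilsēta?"))` asks "What is the capital of France?" in Latvian. Because building the request is separate from sending it, you can inspect or test the payload without a running server.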


SGLang

How to use TitleOS/Vircava-3B-FP32 with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "TitleOS/Vircava-3B-FP32" \
    --host 0.0.0.0 \
    --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "TitleOS/Vircava-3B-FP32",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "TitleOS/Vircava-3B-FP32" \
        --host 0.0.0.0 \
        --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "TitleOS/Vircava-3B-FP32",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use TitleOS/Vircava-3B-FP32 with Docker Model Runner:
```
docker model run hf.co/TitleOS/Vircava-3B-FP32
```

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Vircava-3B-FP32

Vircava-3B-FP32 is a Latvian-language fine-tune of ibm-granite/granite-4.1-3b, trained on TitleOS/latvian_glaiveai_reasoning-v1_5k_subset — a Latvian-translated subset of the GlaiveAI reasoning-v1 dataset. It's designed to bring chain-of-thought reasoning and conversational fluency in Latvian to hardware that most people actually own: CPUs, integrated GPUs, and low-end discrete cards. If you can run a 3B model at all, you can run this one.

Vircava is the first model in a planned family targeting Latvian as a first-class language for both general reasoning and creative writing.

What it can do

Converse naturally in Latvian, including multi-turn dialogue
Produce structured chain-of-thought reasoning in Latvian before arriving at an answer
Use Granite's native tool-calling format, inherited from the base model and preserved through fine-tuning
Handle mixed Latvian/English prompts gracefully
Run entirely on CPU; no GPU is required

Granite 4.1's tool-calling capabilities are part of the base model's instruction format and carry forward here. If you're building an agentic pipeline and want it to operate in Latvian, this is a reasonable starting point.
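Below is a minimal sketch of passing a tool definition through the chat template. It assumes two things:

- The tokenizer in this repository carries Granite's chat template.
- That template honors the `tools` argument of transformers' `apply_chat_template`.

The card describes tool calling as inherited but not extensively validated, so treat both as assumptions. The `get_weather` tool, its schema, and the `build_tool_prompt` helper are hypothetical.

```python
from typing import Any

# Hypothetical tool, described in the JSON-schema "function" format that
# transformers' apply_chat_template(tools=...) accepts.
WEATHER_TOOL: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Rīga"},
            },
            "required": ["city"],
        },
    },
}


def build_tool_prompt(tokenizer, user_text: str, tools: list[dict[str, Any]]) -> str:
    """Render a (Latvian) user turn plus tool definitions into prompt text.

    `tokenizer` is expected to be the AutoTokenizer loaded for
    TitleOS/Vircava-3B-FP32. Whether its template renders tools in Granite's
    format is an assumption inherited from the base model.
    """
    messages = [{"role": "user", "content": user_text}]
    return tokenizer.apply_chat_template(
        messages,
        tools=tools,
        add_generation_prompt=True,
        tokenize=False,  # return the rendered string so it can be inspected
    )
```

Printing the string returned for a prompt such as "Kāds laiks ir Rīgā?" ("What is the weather in Riga?") is a quick way to check that the template actually includes the tool definitions. Do this before relying on it in an agent loop.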

Intended hardware

This model is specifically sized and trained for accessibility. Target environments include:

CPU inference via llama.cpp or Ollama (recommended for most users)
Low-end consumer GPUs (4–8 GB VRAM) with appropriate quantization (Q4_K_M or Q5_K_M recommended)
Integrated graphics with shared memory setups

For CPU and low-VRAM deployments, use a quantized GGUF version. The FP32 weights in this repository are the canonical release. They are intended as the source for deriving your own quantized artifacts, or for full-precision inference where memory allows.
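As a rough guide to why quantization matters here, weight memory is approximately parameter count times bits per weight, divided by 8. The sketch below makes these assumptions:

- A nominal 3.0 billion parameters.
- Approximate average bits-per-weight figures for llama.cpp's Q5_K_M and Q4_K_M formats, about 5.7 and 4.9 respectively.

It ignores the KV cache and runtime overhead, so real memory use will be higher.

```python
# Approximate bits per weight. The Q5_K_M / Q4_K_M values are rough averages
# for llama.cpp k-quants and vary somewhat by model architecture.
BITS_PER_WEIGHT = {
    "F32": 32.0,
    "F16": 16.0,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.9,
}


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Weights-only footprint in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 3.0e9  # nominal "3B"; the exact parameter count may differ

for fmt, bpw in BITS_PER_WEIGHT.items():
    print(f"{fmt:7s} ~{weight_memory_gb(N_PARAMS, bpw):5.1f} GB")
```

This gives about 12 GB for F32, 6 GB for F16, 2.1 GB for Q5_K_M, and 1.8 GB for Q4_K_M. The FP32 weights therefore do not fit on a 4–8 GB card. The Q4_K_M and Q5_K_M files do fit, with headroom left for the KV cache.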

Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "TitleOS/Vircava-3B-FP32"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,
    device_map="cpu",  # or "auto" if you have a GPU
)

messages = [
    {
        "role": "user",
        # "Explain why the sky is blue. Think step by step."
        "content": "Izskaidro, kāpēc debesis ir zilas. Domā soli pa solim."
    }
]

# return_dict=True returns input_ids together with the attention_mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True))
```

Training details

| Parameter | Value |
| --- | --- |
| Base model | ibm-granite/granite-4.1-3b |
| Training dataset | TitleOS/latvian_glaiveai_reasoning-v1_5k_subset |
| Fine-tuning method | LoRA (rsLoRA) |
| LoRA rank | 32 |
| LoRA alpha | 64 |
| rsLoRA scale | ~11.3 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 1 |
| Effective batch size | 16 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Max sequence length | 2048 |
| Precision | FP32 (full, no quantization during training) |
| Hardware | Tesla P40 (24 GB) |
| Loss masking | Completion-only (assistant turns only) |
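The rsLoRA scale in the training table follows from rank-stabilized LoRA. rsLoRA multiplies the adapter update by alpha/√r instead of the alpha/r used by standard LoRA. With rank 32 and alpha 64, that gives 64/√32 ≈ 11.31, versus 2.0 under standard LoRA scaling. A small check:

```python
import math


def lora_scale(alpha: float, rank: int) -> float:
    """Standard LoRA: the low-rank update B @ A is multiplied by alpha / r."""
    return alpha / rank


def rslora_scale(alpha: float, rank: int) -> float:
    """Rank-stabilized LoRA: the update is multiplied by alpha / sqrt(r)."""
    return alpha / math.sqrt(rank)


RANK, ALPHA = 32, 64  # values from the training table

print(f"standard LoRA scale: {lora_scale(ALPHA, RANK):.2f}")   # 2.00
print(f"rsLoRA scale:        {rslora_scale(ALPHA, RANK):.2f}")  # 11.31
```

In Hugging Face PEFT, this setting corresponds to `LoraConfig(r=32, lora_alpha=64, use_rslora=True, ...)`. The card does not name the training framework, so treat that mapping as an assumption.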

The dataset is a 5k-row subset of GlaiveAI's reasoning-v1 dataset, machine-translated into Latvian with Facebook's NLLB-200-3.3B translation model. The training mix also includes natural Latvian text from the RaivisDejus/latvian-text corpus, to support general language fluency alongside structured reasoning.
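The completion-only loss masking listed in the training table means that only assistant tokens contribute to the loss. Below is a minimal sketch of the usual approach. It assumes the token spans of assistant turns are already known, and sets every other label position to -100, the index that PyTorch's cross-entropy loss ignores by default. The function name and the `[start, end)` span format are hypothetical; the card does not describe the exact implementation used.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_non_assistant(input_ids: list[int],
                       assistant_spans: list[tuple[int, int]]) -> list[int]:
    """Return labels that keep only tokens inside [start, end) assistant spans."""
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, end in assistant_spans:
        # Clamp each span to the sequence so out-of-range spans are safe
        for i in range(max(start, 0), min(end, len(input_ids))):
            labels[i] = input_ids[i]
    return labels


# Toy example: tokens 0-3 are the system/user prompt, 4-6 are the assistant reply.
ids = [101, 7, 8, 9, 42, 43, 44]
print(mask_non_assistant(ids, [(4, 7)]))
# [-100, -100, -100, -100, 42, 43, 44]
```

transformers can produce such a mask directly via `apply_chat_template(..., return_dict=True, return_assistant_tokens_mask=True)`. This only works for chat templates that wrap assistant turns in `{% generation %}` blocks, and the card does not say whether this model's template does.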

Limitations

Vircava-3B-FP32 is an early-stage model. A few things to be realistic about:

3B parameters is small. Reasoning depth and instruction-following are more limited than in larger models, and complex multi-step problems may produce partially correct chains.
5k training rows is a modest dataset. Latvian fluency is functional but not flawless. Unusual phrasings or domain-specific vocabulary may produce less natural output.
Tool calling is inherited, not extensively validated. The base model's tool-calling format carries through, but testing has been limited to standard conversational use.
This is not a safety-tuned model. It inherits Granite 4.1's base behavior. Do not deploy it in contexts requiring robust content filtering without additional alignment work.
English bleed is possible. On prompts that mix Latvian and English, the model may respond partially or fully in English, particularly for topics that appeared rarely in Latvian in the training data.

The Vircava family (planned)

Vircava-3B-FP32 is the first release. Two 27B models are in development:

Riga-27B

A larger version of this model, fine-tuned for Latvian reasoning and conversation at scale. It is intended for GPU-equipped deployments at universities, research institutions, and other organizations with proper inference infrastructure. Because it is based on a 27B foundation model, it is expected to offer substantially deeper reasoning chains and more robust Latvian fluency than the 3B variant.

Vircava-Rakstnieks-27B (Rakstnieks means "Writer"; placeholder title)

A Latvian creative writing model fine-tuned on LatSenRom, the Corpus of Latvian Early Novels (1879–1940), available through the Latvian National Corpus Collection at korpuss.lv. The base model is google/gemma-3-27b-it. The goal is a model that writes in the style and register of early Latvian literary prose — a register that no general-purpose model currently handles well, and one with significant cultural and research value.

Both models will be released under the same license as this one when training is complete.

License

Vircava-3B-FP32 is released under a modified MPL-2.0 license that adds the Commons Clause. This means you are free to use, study, modify, and redistribute the model for non-commercial purposes. You may not sell the model, or a product whose primary commercial value is the model itself, without explicit written permission.

See LICENSE.md for the full license text and terms.

Citation

If you use Vircava-3B-FP32 in research or a project, a citation or mention is appreciated:

```
@misc{vircava3b2025,
  author = {TitleOS},
  title = {Vircava-3B-FP32: A Latvian Reasoning Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/TitleOS/Vircava-3B-FP32}
}
```

Acknowledgements

IBM Research for the Granite 4.1 base model
GlaiveAI for the original reasoning dataset
Raivis Dejus for the aggregated Latvian text corpus
Tilde and the University of Latvia for foundational Latvian NLP resources
The Latvian National Corpus Collection for making Latvian language data accessible to researchers


Model files

Format: Safetensors
Model size: 3B params
Tensor type: F32
Base model: ibm-granite/granite-4.1-3b