Vircava-3B-FP32
Vircava-3B-FP32 is a Latvian-language fine-tune of ibm-granite/granite-4.1-3b, trained on TitleOS/latvian_glaiveai_reasoning-v1_5k_subset — a Latvian-translated subset of the GlaiveAI reasoning-v1 dataset. It's designed to bring chain-of-thought reasoning and conversational fluency in Latvian to hardware that most people actually own: CPUs, integrated GPUs, and low-end discrete cards. If you can run a 3B model at all, you can run this one.
Vircava is the first model in a planned family targeting Latvian as a first-class language for both general reasoning and creative writing.
What it can do
- Converse naturally in Latvian, including multi-turn dialogue
- Produce structured chain-of-thought reasoning in Latvian before arriving at an answer
- Use Granite's native tool-calling format, inherited from the base model and preserved through fine-tuning
- Handle mixed Latvian/English prompts gracefully
- Run entirely on CPU, making it usable without any GPU at all
Granite 4.1's tool-calling capabilities are part of the base model's instruction format and carry forward here. If you're building an agentic pipeline and want it to operate in Latvian, this is a reasonable starting point.
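As a rough illustration of what a tool-enabled prompt could look like, the sketch below defines one tool in JSON-schema form and passes it to the tokenizer's chat template. The `get_weather` tool, its parameters, and the helper name `build_tool_prompt` are hypothetical, and the sketch assumes the model's chat template accepts the `tools` argument of `apply_chat_template`.

```python
# Hypothetical tool definition in JSON-schema form; `get_weather` and its
# parameters are made up for illustration and are not part of this model card.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name, e.g. Rīga"},
                },
                "required": ["city"],
            },
        },
    }
]

MESSAGES = [
    # "What is the weather in Rīga?"
    {"role": "user", "content": "Kādi ir laikapstākļi Rīgā?"},
]


def build_tool_prompt(tokenizer):
    """Render the prompt text with the tool list included.

    Assumes the tokenizer's chat template supports a `tools` argument.
    """
    return tokenizer.apply_chat_template(
        MESSAGES,
        tools=TOOLS,
        add_generation_prompt=True,
        tokenize=False,
    )
```

Calling `build_tool_prompt(tokenizer)` with a tokenizer loaded from `TitleOS/Vircava-3B-FP32` returns the rendered prompt as a string. Inspecting that string before wiring up real tools is a cheap way to check that the tool list is rendered as expected.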
Intended hardware
This model is specifically sized and trained for accessibility. Target environments include:
- CPU inference via llama.cpp or Ollama (recommended for most users)
- Low-end consumer GPUs (4–8GB VRAM) with appropriate quantization (Q4_K_M or Q5_K_M recommended)
- Integrated graphics with shared memory setups
For CPU and low-VRAM deployments, use a quantized GGUF version. The FP32 weights in this repository are the canonical release, intended as the source from which quantized artifacts are derived.
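To make the case for quantization concrete, here is a back-of-the-envelope estimate of weight size per format. It is a sketch under stated assumptions: the parameter count is taken as a nominal 3 billion, and the bits-per-weight figures for Q4_K_M and Q5_K_M are approximate values that vary by architecture. It counts weights only and ignores the KV cache, activations, and runtime overhead.

```python
# Assumed average bits per weight. The quantized figures are approximations
# and differ between architectures; treat them as illustrative only.
BITS_PER_WEIGHT = {
    "FP32": 32.0,
    "FP16": 16.0,
    "Q5_K_M": 5.7,   # assumption
    "Q4_K_M": 4.85,  # assumption
}

N_PARAMS = 3e9  # nominal "3B"; the exact parameter count is an assumption


def weight_memory_gb(n_params: float, fmt: str) -> float:
    """Approximate size of the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


for fmt in BITS_PER_WEIGHT:
    print(f"{fmt:>7}: ~{weight_memory_gb(N_PARAMS, fmt):.1f} GB")
```

Under these assumptions the FP32 weights come to roughly 12 GB, while a Q4_K_M build stays under 2 GB, which is why the quantized versions are the practical choice for 4–8GB cards and CPU-only machines.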
Quickstart
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TitleOS/Vircava-3B-FP32"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,
    device_map="cpu",  # or "auto" if you have a GPU
)

# "Explain why the sky is blue. Think step by step."
messages = [
    {
        "role": "user",
        "content": "Izskaidro, kāpēc debesis ir zilas. Domā soli pa solim.",
    }
]

# return_dict=True yields input_ids plus attention_mask, which generate() expects
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True))
Training details
| Parameter | Value |
|---|---|
| Base model | ibm-granite/granite-4.1-3b |
| Training dataset | TitleOS/latvian_glaiveai_reasoning-v1_5k_subset |
| Fine-tuning method | LoRA (rsLoRA) |
| LoRA rank | 32 |
| LoRA alpha | 64 |
| rsLoRA scale | ~11.3 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 1 |
| Effective batch size | 16 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Max sequence length | 2048 |
| Precision | FP32 (full, no quantization during training) |
| Hardware | Tesla P40 (24GB) |
| Loss masking | Completion-only (assistant turns only) |
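The rsLoRA scale in the table follows from the rank and alpha values. Standard LoRA scales the adapter update by alpha / r, whereas rank-stabilized LoRA uses alpha / sqrt(r). A minimal check of the arithmetic, where `lora_scale` is a hypothetical helper and not part of any library:

```python
import math


def lora_scale(alpha: float, rank: int, rank_stabilized: bool = False) -> float:
    """Scaling factor applied to the low-rank adapter update."""
    if rank_stabilized:
        return alpha / math.sqrt(rank)  # rsLoRA: alpha / sqrt(r)
    return alpha / rank                 # standard LoRA: alpha / r


print(lora_scale(64, 32))                                   # 2.0
print(round(lora_scale(64, 32, rank_stabilized=True), 2))   # 11.31
```

With rank 32 and alpha 64, the rank-stabilized scale is about 11.3, matching the table, compared with 2.0 under the standard formula.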
The dataset is a 5k-row Latvian translation of GlaiveAI's reasoning-v1 dataset, produced using Facebook's NLLB-200-3.3B translation model. The training mix also includes natural Latvian text from the RaivisDejus/latvian-text corpus to support general language fluency alongside structured reasoning.
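The completion-only loss masking listed in the training table means that only assistant tokens contribute to the loss. The training code itself is not shown here, so the following is only a sketch of the general idea: prompt positions get the label -100, the default `ignore_index` of PyTorch's cross-entropy loss. The token IDs and the helper name `mask_prompt_tokens` are hypothetical.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_prompt_tokens(input_ids, assistant_mask):
    """Return labels in which only assistant tokens (mask == 1) are kept."""
    return [
        token if keep else IGNORE_INDEX
        for token, keep in zip(input_ids, assistant_mask)
    ]


# Hypothetical token IDs: three prompt tokens followed by three assistant tokens.
input_ids = [101, 7, 8, 42, 43, 2]
assistant_mask = [0, 0, 0, 1, 1, 1]

labels = mask_prompt_tokens(input_ids, assistant_mask)
print(labels)  # [-100, -100, -100, 42, 43, 2]
```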
Limitations
Vircava-3B-FP32 is an early-stage model. A few things to be realistic about:
- 3B parameters is small. Reasoning depth and instruction-following are more limited than larger models. Complex multi-step problems may produce partially correct chains.
- 5k training rows is a modest dataset. Latvian fluency is functional but not flawless. Unusual phrasings or domain-specific vocabulary may produce less natural output.
- Tool calling is inherited, not extensively validated. The base model's tool-calling format carries through, but testing has been limited to standard conversational use.
- This is not a safety-tuned model. It inherits Granite 4.1's base behavior. Do not deploy it in contexts requiring robust content filtering without additional alignment work.
- English bleed is possible. On prompts that mix Latvian and English, the model may respond partially or fully in English, particularly for topics that appeared rarely in Latvian in the training data.
The Vircava family (planned)
Vircava-3B-FP32 is the first release. Two 27B models are in development:
Riga-27B
A larger version of this model, fine-tuned for Latvian reasoning and conversation at scale. Intended for GPU-equipped deployments at universities, research institutions, and other organizations with proper inference infrastructure. Based on a 27B foundation model, it will offer substantially deeper reasoning chains and more robust Latvian fluency than the 3B variant.
Vircava-Rakstnieks-27B ("Writer", placeholder title)
A Latvian creative writing model fine-tuned on LatSenRom, the Corpus of Latvian Early Novels (1879–1940), available through the Latvian National Corpus Collection at korpuss.lv. The base model is google/gemma-3-27b-it. The goal is a model that writes in the style and register of early Latvian literary prose — a register that no general-purpose model currently handles well, and one with significant cultural and research value.
Both models will be released under the same license as this one when training is complete.
License
Vircava-3B-FP32 is released under a modified MPL-2.0 license that adds a Commons Clause restriction. You are free to use, study, modify, and redistribute the model for non-commercial purposes, but you may not sell the model, or a product whose primary commercial value is the model itself, without explicit written permission.
See LICENSE.md for the full license text and terms.
Citation
If you use Vircava-3B-FP32 in research or a project, a citation or mention is appreciated:
@misc{vircava3b2025,
author = {TitleOS},
title = {Vircava-3B-FP32: A Latvian Reasoning Model},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/TitleOS/Vircava-3B-FP32}
}
Acknowledgements
- IBM Research for the Granite 4.1 base model
- GlaiveAI for the original reasoning dataset
- Raivis Dejus for the aggregated Latvian text corpus
- Tilde and the University of Latvia for foundational Latvian NLP resources
- The Latvian National Corpus Collection for making Latvian language data accessible to researchers