
Libraries

How to use RthItalia/PINDARO-HF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="RthItalia/PINDARO-HF")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RthItalia/PINDARO-HF")
model = AutoModelForCausalLM.from_pretrained("RthItalia/PINDARO-HF")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use RthItalia/PINDARO-HF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RthItalia/PINDARO-HF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RthItalia/PINDARO-HF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
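
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and it assumes the server is reachable at the address used in the curl call above and returns the usual OpenAI-style chat completion response.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_message: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first returned choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# request = build_chat_request(
#     "http://localhost:8000", "RthItalia/PINDARO-HF", "What is the capital of France?"
# )
# print(send_chat_request(request))
```

For a server listening on another port, such as the SGLang examples on port 30000, only the base URL should need to change.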

Use Docker

docker model run hf.co/RthItalia/PINDARO-HF

SGLang

How to use RthItalia/PINDARO-HF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RthItalia/PINDARO-HF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RthItalia/PINDARO-HF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RthItalia/PINDARO-HF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RthItalia/PINDARO-HF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use RthItalia/PINDARO-HF with Docker Model Runner:
```
docker model run hf.co/RthItalia/PINDARO-HF
```

PINDARO HF (General)

PINDARO HF is the Hugging Face-format release of the general-purpose Pindaro model.

Model At A Glance

Architecture: LlamaForCausalLM
Model type: llama
Parameters: ~1.1B
Precision: float16
Context length: 2048
Vocabulary size: 32002
Languages: Italian, English
Primary use: general assistant text generation
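
Two of these figures translate into quick planning arithmetic: how much memory the float16 weights need, and how many prompt tokens fit in the 2048-token context once room is reserved for generation. The sketch below is illustrative only; the function names are hypothetical, and the memory estimate covers the weights alone, not activations or the KV cache.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone (float16 uses 2 bytes per parameter)."""
    return num_params * bytes_per_param / (1024 ** 3)


def max_prompt_tokens(context_length: int = 2048, max_new_tokens: int = 120) -> int:
    """Tokens left for the prompt after reserving room for the generated tokens."""
    if max_new_tokens >= context_length:
        raise ValueError("max_new_tokens must be smaller than the context length")
    return context_length - max_new_tokens


print(f"{weight_memory_gib(1.1e9):.2f} GiB")  # 2.05 GiB for ~1.1B float16 parameters
print(max_prompt_tokens())                    # 1928
```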

Included Files (HF)

model.safetensors
config.json
generation_config.json
tokenizer.json
tokenizer.model
tokenizer_config.json
special_tokens_map.json
added_tokens.json

This repository is HF-only. GGUF artifacts are intentionally not included here.

Prompt Format

The tokenizer uses Noesis-style control tokens:

<|noesis|> (id 32000)
<|end|> (id 32001)

The configured chat template is based on:

{% for message in messages %}<|noesis|>
### Domanda
{{ message['content'] }}

### Risposta
{% endfor %}

A stable manual prompt pattern is:

<|noesis|>
### Domanda
Spiega cos'e una funzione in Python.

### Risposta
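
The manual pattern above can be wrapped in a small helper so that every prompt gets the same layout. This is a sketch for single-turn prompts only; `build_prompt` is a hypothetical name, not something shipped with the model.

```python
NOESIS_TOKEN = "<|noesis|>"


def build_prompt(question: str) -> str:
    """Format one question in the manual Noesis-style prompt pattern."""
    return (
        f"{NOESIS_TOKEN}\n"
        "### Domanda\n"
        f"{question.strip()}\n"
        "\n"
        "### Risposta\n"
    )


print(build_prompt("Spiega cos'e una funzione in Python."))
```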

Quickstart (Transformers)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "RthItalia/PINDARO-HF"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
)

# A plain double-quoted string cannot span lines, so spell out the newlines.
prompt = (
    "<|noesis|>\n"
    "### Domanda\n"
    "Spiega cos'e una funzione in Python.\n"
    "\n"
    "### Risposta\n"
)

inputs = tokenizer(prompt, return_tensors="pt")

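# pad_token_id == eos_token_id for this model, so pass attention_mask explicitly.
# Do not also unpack **inputs here: it already contains attention_mask, and
# passing it twice raises a duplicate-keyword TypeError.
outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=120,
    do_sample=False,
)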

print(tokenizer.decode(outputs[0], skip_special_tokens=False))
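
Because the Quickstart decodes with `skip_special_tokens=False`, the printed text includes the prompt and any control tokens. The sketch below shows one way to keep only the answer. It assumes that the answer follows the first `### Risposta` header and that the model closes its answer with `<|end|>`; `extract_answer` is a hypothetical helper, and the sample string is made up for illustration.

```python
END_TOKEN = "<|end|>"
ANSWER_HEADER = "### Risposta"


def extract_answer(decoded: str) -> str:
    """Return the text after the first answer header, cut at the end token if present."""
    _, header_found, tail = decoded.partition(ANSWER_HEADER)
    # Fall back to the whole string when the header is missing.
    answer = tail if header_found else decoded
    # Drop the end token and anything that follows it.
    answer, _, _ = answer.partition(END_TOKEN)
    return answer.strip()


sample = "<|noesis|>\n### Domanda\nQ\n\n### Risposta\nUna funzione e un blocco di codice.<|end|>"
print(extract_answer(sample))  # Una funzione e un blocco di codice.
```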

Validation Snapshot

Last internal validation snapshot: 2026-03-02

HF load/config/tokenizer/model smoke tests: PASS
Internal mini-eval (5 prompts, general quality gate): 1.00

Notes:

This is an internal sanity check, not a public benchmark suite.
Separate GGUF quality gating is tracked outside this HF-only repo.

Known Limitations

Outputs can become repetitive on some long generations.
As with other LLMs, factual and reasoning errors are possible.
Use additional validation for high-stakes or production workflows.
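
For the repetition issue, one option is to pass the standard Transformers generation parameters `repetition_penalty` and `no_repeat_ngram_size` to `generate`. The values below are untuned assumptions rather than settings validated for this model, and `anti_repetition_kwargs` is a hypothetical helper name.

```python
def anti_repetition_kwargs(max_new_tokens: int = 120) -> dict:
    """Generation settings that discourage repeated text (illustrative values)."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": False,
        # Values above 1.0 penalize tokens that have already appeared.
        "repetition_penalty": 1.1,
        # Forbid repeating any 4-token sequence verbatim.
        "no_repeat_ngram_size": 4,
    }


# Usage with an already loaded model and tokenized inputs:
# outputs = model.generate(**inputs, **anti_repetition_kwargs())
```

Both settings can hurt quality if set too aggressively, so they are worth checking against your own prompts.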

Safety

Do not use the model as the sole source for legal, medical, or financial decisions.
Add moderation, logging, and domain-specific safeguards in downstream apps.

Artifact Checksums (SHA256)

model.safetensors: 778e5547c238d0e19738479562cdc310a38f5ee4c5354294a23dfccc92626e87
config.json: ae832c409e0d6ad9c8881ec2bd287a8d7e7e9012b712513532cd3ad352ca0655
generation_config.json: 6ff47e725c0ec6d0f1895670de7ee68e61a4f99703f6c8e89aea6ab14ea02dc3
tokenizer.json: 51433f06369ac3e597dfa23a811215e3511b8f86588a830ded72344b76a193ee
tokenizer.model: 9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
tokenizer_config.json: 02ca6d3ddfa1112eec7bd5f22a0e682338b5b2da8ddb6761e9d25e6d7b8188d0
special_tokens_map.json: d7805e093432afcde852968cdeba3de08a6fe66e77609f4701decb87fc492f33
added_tokens.json: ece349d292e246eac9a9072c1730f023e61567984a828fb0d25dccb14e3b7592
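
Downloaded files can be compared against the checksums above with a short script. This is a minimal sketch using only the standard library; the function names and the local directory path are hypothetical, and only one entry of the expected mapping is shown.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(directory, expected: dict) -> dict:
    """Map each file name to True if its SHA256 matches; missing files map to False."""
    results = {}
    for name, expected_digest in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and sha256_of_file(path) == expected_digest.lower()
    return results


# Usage (the directory is a placeholder for wherever the files were downloaded):
# expected = {
#     "config.json": "ae832c409e0d6ad9c8881ec2bd287a8d7e7e9012b712513532cd3ad352ca0655",
# }
# print(verify_checksums("path/to/PINDARO-HF", expected))
```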
