PINDARO HF (General)
PINDARO HF is the Hugging Face-format release of the general-purpose Pindaro model.
Model At A Glance
- Architecture: LlamaForCausalLM
- Model type: llama
- Approx. parameters: ~1.1B
- Precision: float16
- Context length: 2048
- Vocabulary size: 32002
- Languages: Italian, English
- Primary use: general assistant text generation
Included Files (HF)
- model.safetensors
- config.json
- generation_config.json
- tokenizer.json
- tokenizer.model
- tokenizer_config.json
- special_tokens_map.json
- added_tokens.json
This repository is HF-only. GGUF artifacts are intentionally not included here.
Prompt Format
The tokenizer uses Noesis-style control tokens:
- <|noesis|> (id 32000)
- <|end|> (id 32001)
The configured chat template is based on:
{% for message in messages %}<|noesis|>
### Domanda
{{ message['content'] }}
### Risposta
{% endfor %}
A stable manual prompt pattern is:
<|noesis|>
### Domanda
Spiega cos'e una funzione in Python.
### Risposta
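The manual pattern above can be produced programmatically. The following is a minimal sketch, not part of the repository: the helper name `build_prompt` is hypothetical, and the newline placement is an assumption taken from the pattern as shown here.

```python
# Hypothetical helper; the name build_prompt is illustrative and not shipped with the model.
NOESIS_TOKEN = "<|noesis|>"  # control token, id 32000 per this model card


def build_prompt(question: str) -> str:
    """Wrap a single user question in the Domanda/Risposta pattern."""
    # Assumption: one newline after each line, ending right after the answer header.
    return f"{NOESIS_TOKEN}\n### Domanda\n{question.strip()}\n### Risposta\n"


prompt = build_prompt("Spiega cos'e una funzione in Python.")
print(prompt)
```

Keeping the pattern in one function avoids small whitespace differences between call sites, which can change how the prompt is tokenized.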
Quickstart (Transformers)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "RthItalia/PINDARO-HF"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
)

# Manual prompt in the Domanda/Risposta pattern; each "\n" is one line break.
prompt = (
    "<|noesis|>\n"
    "### Domanda\n"
    "Spiega cos'e una funzione in Python.\n"
    "### Risposta\n"
)
inputs = tokenizer(prompt, return_tensors="pt")

# pad_token_id == eos_token_id for this model, so generate() needs an explicit
# attention_mask. The tokenizer output already contains it and **inputs forwards it;
# passing attention_mask a second time as a keyword would raise a TypeError.
outputs = model.generate(
    **inputs,
    max_new_tokens=120,
    do_sample=False,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
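Because the Quickstart decodes with `skip_special_tokens=False`, the printed text contains the prompt and any control tokens. The sketch below shows one way to isolate the answer from such a string. It assumes the model closes its answer with `<|end|>`, which this card does not state explicitly; the function name `extract_answer` is hypothetical.

```python
END_TOKEN = "<|end|>"          # id 32001 per this model card
ANSWER_HEADER = "### Risposta"


def extract_answer(decoded: str) -> str:
    """Return the text after the first answer header, cut at the first <|end|>."""
    _, sep, tail = decoded.partition(ANSWER_HEADER)
    if not sep:
        # No header found: fall back to the whole decoded string.
        tail = decoded
    # Drop the end token and anything generated after it.
    answer, _, _ = tail.partition(END_TOKEN)
    return answer.strip()


sample = "<|noesis|>\n### Domanda\nCiao?\n### Risposta\nCiao!<|end|>"
print(extract_answer(sample))  # Ciao!
```

If the model never emits `<|end|>`, the function simply returns everything after the header, so it degrades gracefully.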
Validation Snapshot
Last internal validation snapshot: 2026-03-02
- HF load/config/tokenizer/model smoke tests: PASS
- Internal mini-eval (5 prompts, general quality gate): 1.00
Notes:
- This is an internal sanity check, not a public benchmark suite.
- Separate GGUF quality gating is tracked outside this HF-only repo.
Known Limitations
- Outputs can become repetitive on some long generations.
- As with other LLMs, factual and reasoning errors are possible.
- Use additional validation for high-stakes or production workflows.
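For the repetition issue noted above, `generate()` in Transformers accepts decoding parameters such as `repetition_penalty` and `no_repeat_ngram_size`. The values below are illustrative assumptions only; they have not been tuned or validated for this model.

```python
# Illustrative values only; not tuned or validated for this model.
generation_kwargs = {
    "max_new_tokens": 120,
    "do_sample": False,
    "repetition_penalty": 1.1,   # values above 1.0 penalize tokens already in the sequence
    "no_repeat_ngram_size": 3,   # prevent any 3-gram from appearing twice
}

# Usage, assuming `model` and `inputs` from the Quickstart:
# outputs = model.generate(**inputs, **generation_kwargs)
```

Stronger penalties can hurt fluency, so any setting should be checked against representative prompts.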
Safety
- Do not use the model as the sole source for legal, medical, or financial decisions.
- Add moderation, logging, and domain-specific safeguards in downstream apps.
Artifact Checksums (SHA256)
- model.safetensors: 778e5547c238d0e19738479562cdc310a38f5ee4c5354294a23dfccc92626e87
- config.json: ae832c409e0d6ad9c8881ec2bd287a8d7e7e9012b712513532cd3ad352ca0655
- generation_config.json: 6ff47e725c0ec6d0f1895670de7ee68e61a4f99703f6c8e89aea6ab14ea02dc3
- tokenizer.json: 51433f06369ac3e597dfa23a811215e3511b8f86588a830ded72344b76a193ee
- tokenizer.model: 9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
- tokenizer_config.json: 02ca6d3ddfa1112eec7bd5f22a0e682338b5b2da8ddb6761e9d25e6d7b8188d0
- special_tokens_map.json: d7805e093432afcde852968cdeba3de08a6fe66e77609f4701decb87fc492f33
- added_tokens.json: ece349d292e246eac9a9072c1730f023e61567984a828fb0d25dccb14e3b7592
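The checksums in this section can be compared against locally downloaded files. The following is a minimal sketch using only the Python standard library; the function names and the local directory path are hypothetical, and only two of the eight entries are copied in for brevity.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large files such as model.safetensors do not fill memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Path, expected: dict[str, str]) -> dict[str, bool]:
    """Map each file name to True if its digest matches the expected value."""
    return {
        name: sha256_of(directory / name) == digest.lower()
        for name, digest in expected.items()
    }


# Two entries copied from the checksums in this section; add the others as needed.
EXPECTED = {
    "config.json": "ae832c409e0d6ad9c8881ec2bd287a8d7e7e9012b712513532cd3ad352ca0655",
    "generation_config.json": "6ff47e725c0ec6d0f1895670de7ee68e61a4f99703f6c8e89aea6ab14ea02dc3",
}

# Hypothetical local path; point it at the directory holding the downloaded files.
# print(verify(Path("./PINDARO-HF"), EXPECTED))
```

A missing file raises `FileNotFoundError` in this sketch, which makes incomplete downloads visible rather than silently reporting a mismatch.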