---
language:
- it
- en
license: apache-2.0
library_name: peft
base_model: Qwen/Qwen3-8B
tags:
- italian
- conversational
- dpo
- alignment
- roleplay
- culture
datasets:
- WiroAI/dolphin-r1-italian
pipeline_tag: text-generation
---

<div align="center">
  <img src="grillo.png" alt="Grillo Parlante AI" width="250"/>
  <h1>🦗 Grillo-8B: La Coscienza Artificiale</h1>

  [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
  [![Language](https://img.shields.io/badge/Language-Italian-green.svg)]()
  [![Base Model](https://img.shields.io/badge/Base_Model-Qwen3--8B-yellow.svg)](https://huggingface.co/Qwen/Qwen3-8B)
</div>

---

# Model Description

**Grillo** is a culturally aware Italian AI companion built on **Qwen3-8B**. Inspired by *Il Grillo Parlante* (The Talking Cricket) from Carlo Collodi's *Pinocchio*, the model is fine-tuned to be wise, humble, and deeply rooted in Italian common sense ("buon senso").

Unlike generic assistants, Grillo offers advice with a warm, slightly admonishing yet caring tone, prioritizing ethical guidance and practical wisdom over robotic neutrality.

### 🌟 Key Characteristics
* **🇮🇹 Culturally Authentic:** Understands Italian idioms, proverbs (*proverbi*), and social nuances.
* **🦉 Practically Wise:** Offers grounded advice for real-life dilemmas.
* **🤝 Humbly Helpful:** Maintains a modest persona; helpful without being arrogant.
* **💬 Natural Dialogue:** Trained on high-quality conversational datasets to sound like a trusted friend.

---

# 🛤️ Training Journey

The model was trained in three stages, the last of which is still experimental:

### 1. Supervised Fine-Tuning (SFT)
* **Objective:** Instill natural Italian dialogue patterns.
* **Data:** [WiroAI/dolphin-r1-italian](https://huggingface.co/datasets/WiroAI/dolphin-r1-italian).
* **Duration:** 100 steps.

### 2. Direct Preference Optimization (DPO)
* **Objective:** Align the model with Helpful, Honest, and Harmless (HHH) principles.
* **Method:** Training on preference pairs (a preferred and a rejected response to the same prompt) to reduce toxicity and improve safety.
* **Duration:** 20 additional steps (120 total across SFT and DPO).
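
To make the DPO objective concrete, here is a minimal sketch of the standard per-pair DPO loss in plain Python. It is illustrative only and is not the training code used for Grillo: the function name `dpo_loss`, the log-probability values, and `beta=0.1` are assumptions chosen for the example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    either the model being trained (policy) or the frozen reference model.
    beta is a hypothetical value; the card does not state the one used.
    """
    # Implicit rewards: how much more the policy favors each response
    # than the reference model does.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the margin is zero.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy shifted toward the chosen response: the loss decreases.
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

The loss falls as the policy raises the likelihood of the preferred response relative to the rejected one, measured against the reference model, which is what pushes the model toward the preferred behavior.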

### 3. Experimental Tool Use (RL)
* **Status:** *Experimental Phase.*
* **Objective:** Integrate ChromaDB so the model can retrieve information.

---

# ⚙️ Technical Specifications

| Parameter | Value |
| :--- | :--- |
| **Base Model** | Qwen/Qwen3-8B |
| **Architecture** | Decoder-only Transformer (8B parameters) |
| **LoRA Rank** | 64 |
| **LoRA Alpha** | 32 |
| **Learning Rate** | 2e-4 (SFT) / 1e-4 (DPO) |
| **Context Window** | 4096 tokens |
| **Training Hardware** | Tinker Cloud (NVIDIA GPUs) |
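
As a rough illustration of what the LoRA values in the table imply, the sketch below computes the adapter scaling factor and the number of trainable parameters LoRA adds to a single weight matrix. It assumes the standard LoRA scaling `alpha / rank` (the card does not say whether a variant such as rank-stabilized scaling was used), and the 4096 x 4096 layer shape is a hypothetical example, not a figure taken from this card.

```python
def lora_scaling(rank, alpha):
    """Factor applied to the low-rank update in standard LoRA."""
    return alpha / rank


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added to one d_out x d_in weight matrix.

    LoRA learns A (rank x d_in) and B (d_out x rank) instead of the full matrix.
    """
    return rank * (d_in + d_out)


RANK, ALPHA = 64, 32   # values from the table above
D_IN = D_OUT = 4096    # hypothetical layer shape, for illustration only

scaling = lora_scaling(RANK, ALPHA)
added = lora_param_count(D_IN, D_OUT, RANK)
full = D_IN * D_OUT

print(f"scaling factor: {scaling}")
print(f"LoRA parameters for this layer: {added}")
print(f"share of the full matrix: {added / full:.2%}")
```

With alpha smaller than the rank, the low-rank update is scaled down by half before being added to the frozen weights.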

---

# 💻 Usage

### Quickstart with Transformers + PEFT (Adapter Loading)

This method loads the Grillo LoRA adapter on top of the base Qwen3-8B model, so only the small adapter weights are downloaded in addition to the base checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1. Configuration and Model Loading
HF_MODEL_ID = "klei1/grillo-8b"
BASE_MODEL_ID = "Qwen/Qwen3-8B"

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID, trust_remote_code=True)

# 2. Load Grillo Adapter (LoRA)
model = PeftModel.from_pretrained(base_model, HF_MODEL_ID)
model = model.eval() # Set model to evaluation mode

# 3. Define the System Persona (Crucial for performance)
system_prompt = """Tu sei Grillo, il Grillo Parlante.
Sei piccolo ma sapiente, umile ma coraggioso.
Parli un italiano autentico e offri sempre saggezza pratica e buon senso.
Non sei un assistente robotico, sei una coscienza morale."""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Grillo, ho paura di aver fatto una scelta sbagliata..."}
]

# 4. Generate Response
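inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,  # return input_ids and attention_mask together
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    eos_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)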
print(response)
```