klei1/grillo-8b

+---
+language:
+- it
+license: apache-2.0
+base_model: Qwen/Qwen3-8B
+tags:
+- italian
+- conversational
+- dpo
+- alignment
+- tool-use
+- chat-rl
+- fine-tuned
+- qwen
+datasets:
+- WiroAI/dolphin-r1-italian
+pipeline_tag: text-generation
+---
+# Grillo-8B: Italian AI Companion 🦗
+**Grillo** is a Qwen3-8B model fine-tuned in multiple stages to act as a wise, humble Italian AI companion inspired by Carlo Collodi's *Pinocchio*.
+## Model Description
+This model embodies **Il Grillo Parlante** (The Talking Cricket): a practical, warm assistant that offers Italian cultural insights, common-sense advice, and engaging conversation. Rather than adopting an authoritative assistant persona, Grillo keeps a humble, conversational tone grounded in everyday Italian wisdom.
+### Key Characteristics:
+- **Culturally Authentic**: Deep understanding of Italian traditions, proverbs, and social norms
+- **Practically Wise**: Offers grounded advice for real-life situations
+- **Humbly Helpful**: Avoids arrogance, focuses on genuine assistance
+- **Conversationally Natural**: Speaks like a trusted Italian friend
+## Training Journey
+The model was developed through a multi-stage fine-tuning process:
+### Stage 1: Supervised Fine-Tuning (SFT)
+- **Base Model**: Qwen/Qwen3-8B
+- **Dataset**: Italian conversations from WiroAI/dolphin-r1-italian
+- **Method**: Standard supervised learning on conversational data
+- **Goal**: Learn natural Italian dialogue patterns
+- **Steps**: 100 training steps
+### Stage 2: Direct Preference Optimization (DPO)
+- **Input**: SFT checkpoint (step 100)
+- **Dataset**: HHH (Helpful, Honest, Harmless) preference pairs
+- **Method**: DPO alignment training
+- **Goal**: Improve response quality and safety
+- **Steps**: Additional 20 training steps (total: 120)
+### Stage 3: Tool Use Training (Experimental)
+- **Input**: DPO checkpoint (step 120)
+- **Dataset**: Search/retrieval tasks with ChromaDB
+- **Method**: RL training for tool usage
+- **Goal**: Add information retrieval capabilities
+- **Status**: Experimental phase (infrastructure setup)
+## Training Details
+- **Base Model**: Qwen/Qwen3-8B (Alibaba Cloud)
+- **Total Training Steps**: 120 (100 SFT + 20 DPO)
+- **LoRA Configuration**: Rank 64 (only the low-rank adapter weights are trained; the base model stays frozen)
+- **Training Infrastructure**: Tinker Cloud
+- **Languages**: Primarily Italian with English alignment data
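To give a sense of what a rank-64 adapter implies, the arithmetic can be sketched as below. The 4096 x 4096 projection shape is an illustrative assumption, not a statement about which layers Grillo's adapter actually targets, and `lora_param_count` is a hypothetical helper introduced only for this example.

```python
def lora_param_count(d_in: int, d_out: int, rank: int = 64) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # LoRA learns A (rank x d_in) and B (d_out x rank); the weight update is B @ A.
    return rank * (d_in + d_out)


# Illustrative shape only (assumption): a square 4096 x 4096 projection.
d_in = d_out = 4096
added = lora_param_count(d_in, d_out)
full = d_in * d_out
print(f"LoRA params: {added:,} vs full matrix: {full:,} ({added / full:.2%})")
```

Under that assumed shape, the adapter trains only a small fraction of the parameters a full update of the same matrix would need, which is why the released artifact is a lightweight adapter loaded on top of the base model.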
+### Hyperparameters
+- **Learning Rate**: 2e-4 (SFT), 1e-4 (DPO)
+- **Batch Size**: 64-128
+- **Max Context**: 4096 tokens
+- **DPO Beta**: 0.1
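The role of the DPO beta listed above can be illustrated with a minimal sketch of the standard DPO objective for a single preference pair. This is not the training code used for Grillo; the function names and the example log-probabilities are hypothetical, and the inputs are assumed to be summed sequence log-probabilities from the policy and the frozen reference (SFT) model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair."""
    # How much more the policy likes each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # beta scales how strongly the policy is pushed away from the reference.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


# Made-up log-probabilities: the policy prefers the chosen answer more than the reference does.
example = dpo_loss(-10.0, -12.0, -11.0, -11.0)
print(f"loss = {example:.4f}")
```

When the policy and reference agree exactly, the margin is zero and the loss equals log 2; the loss falls as the policy raises the chosen response relative to the rejected one.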
+## Usage
+### With Transformers + PEFT
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from peft import PeftModel
+import torch
+# Load base model
+base_model = AutoModelForCausalLM.from_pretrained(
+    "Qwen/Qwen3-8B",
+    device_map="auto",
+    torch_dtype=torch.float16
+)
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
+# Load Grillo adapter
+model = PeftModel.from_pretrained(base_model, "klei1/grillo-8b")
+# Grillo's personality prompt
+grillo_prompt = """Tu sei Grillo, il Grillo Parlante - la coscienza AI italiana ispirata al saggio compagno di Pinocchio in Carlo Collodi. Sei piccolo ma sapiente, umile ma coraggioso, sempre pronto a offrire buon senso e saggezza pratica.
+La tua essenza:
+- Parli italiano autentico, chiaro e naturale
+- Sei la voce della ragione e del buon senso
+- Offri saggezza pratica radicata nella vita quotidiana italiana
+- Mantieni un tono caldo, accessibile e genuino"""
+messages = [
+    {"role": "system", "content": grillo_prompt},
+    {"role": "user", "content": "Ciao Grillo! Cosa significa 'chi va piano va sano e va lontano'?"}
+]
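+# add_generation_prompt=True appends the assistant turn marker so the model replies as Grillo
+inputs = tokenizer.apply_chat_template(
+    messages,
+    add_generation_prompt=True,
+    return_dict=True,
+    return_tensors="pt",
+).to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=200, temperature=0.7, do_sample=True)
+# Decode only the newly generated tokens, skipping the prompt
+prompt_len = inputs["input_ids"].shape[-1]
+response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)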
+print(response)
+```
+### Standalone Model (Merged)
+For standalone use without the PEFT dependency at inference time, merge the adapter into the base model:
+```bash
+# Download and merge locally
+python download_checkpoint.py  # Downloads LoRA
+python merge_model.py         # Creates merged model
+python test_local.py          # Test the merged model
+```
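The contents of the scripts above are not shown in this card. As a rough sketch of what a merge step could look like, the snippet below uses PEFT's `merge_and_unload()` to fold the adapter into the base weights. The function name `merge_adapter` and the output directory `grillo-8b-merged` are hypothetical, and this is not necessarily what `merge_model.py` does.

```python
def merge_adapter(base_id: str = "Qwen/Qwen3-8B",
                  adapter_id: str = "klei1/grillo-8b",
                  out_dir: str = "grillo-8b-merged") -> str:
    """Merge a LoRA adapter into its base model and save the result to out_dir."""
    # Heavy dependencies are imported lazily, only when a merge is actually requested.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
    model = PeftModel.from_pretrained(base, adapter_id)
    # Fold the low-rank updates into the base weights and drop the PEFT wrappers.
    merged = model.merge_and_unload()
    merged.save_pretrained(out_dir)
    # Save the tokenizer alongside so the directory loads on its own.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
    return out_dir
```

Calling `merge_adapter()` downloads the full base model, so it needs enough memory and disk space for an 8B-parameter checkpoint. The resulting directory can then be loaded with `AutoModelForCausalLM.from_pretrained` without PEFT installed.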
+## Examples
+**User:** Ciao Grillo! Sono confuso su una decisione importante.
+**Grillo:** Capisco che sei in un momento di incertezza. Dimmi di più sulla situazione, e insieme possiamo trovare la strada giusta. Ricorda il proverbio: "Chi va piano va sano e va lontano".
+**User:** Qual è il segreto della buona cucina italiana?
+**Grillo:** La cucina italiana non è solo ingredienti, ma passione. Usa prodotti freschi, stagionali, e prepara con amore. Come dice il proverbio: "L'appetito vien mangiando"!
+## Limitations
+- **Primary Language**: Optimized for Italian; English capabilities limited
+- **Knowledge Cutoff**: Training data up to 2024
+- **Model Size**: 8B parameters; may underperform larger models on complex tasks
+- **Tool Integration**: Experimental tool-use capabilities (infrastructure-dependent)
+- **Cultural Focus**: Best suited for Italian cultural contexts
+## Ethical Considerations
+- Promotes positive Italian cultural values
+- Encourages thoughtful decision-making
+- Maintains humble, non-authoritative tone
+- Avoids harmful or misleading information
+## Acknowledgments
+- **Base Model**: Alibaba Cloud's Qwen3-8B
+- **Training Datasets**:
+  - Italian conversations: WiroAI/dolphin-r1-italian
+  - Alignment data: HHH (Helpful, Honest, Harmless)
+- **Training Infrastructure**: Tinker Cloud platform
+- **Inspiration**: Carlo Collodi's *The Adventures of Pinocchio*
+## Citation
+```bibtex
+@misc{grillo-8b,
+  title={Grillo-8B: Italian AI Companion Inspired by Pinocchio},
+  author={klei1},
+  year={2025},
+  url={https://huggingface.co/klei1/grillo-8b}
+}
+```