---
language:
- it
- en
license: apache-2.0
library_name: peft
base_model: Qwen/Qwen3-8B
tags:
- italian
- conversational
- dpo
- alignment
- roleplay
- culture
datasets:
- WiroAI/dolphin-r1-italian
pipeline_tag: text-generation
---

<div align="center">
<img src="grillo.png" alt="Grillo Parlante AI" width="250"/>
<h1>🦗 Grillo-8B: La Coscienza Artificiale</h1>

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
![Language](https://img.shields.io/badge/Language-Italian-green.svg)
[![Base Model](https://img.shields.io/badge/Base_Model-Qwen3--8B-yellow.svg)](https://huggingface.co/Qwen/Qwen3-8B)
</div>

---

# Model Description

Grillo is a culturally aware Italian AI companion, fine-tuned as a LoRA adapter on top of Qwen3-8B. Inspired by Il Grillo Parlante (the Talking Cricket) from Carlo Collodi's Pinocchio, the model is tuned to be wise, humble, and deeply rooted in Italian common sense ("buon senso").

Unlike generic assistants, Grillo gives advice in a warm, gently admonishing but caring tone, putting ethical guidance and practical wisdom ahead of robotic neutrality.

### 🌟 Key Characteristics
* 🇮🇹 Culturally Authentic: Understands Italian idioms, proverbs (proverbi), and social nuances.
* 🦉 Practically Wise: Offers grounded advice for real-life dilemmas.
* 🤝 Humbly Helpful: Keeps a modest persona; helpful without being arrogant.
* 💬 Natural Dialogue: Fine-tuned on Italian conversational data to sound like a trusted friend.

---

# 🛤️ Training Journey

The model was trained in successive stages:

### 1. Supervised Fine-Tuning (SFT)
* Objective: Instill natural Italian dialogue patterns.
* Data: [WiroAI/dolphin-r1-italian](https://huggingface.co/datasets/WiroAI/dolphin-r1-italian).
* Duration: 100 steps.

### 2. Direct Preference Optimization (DPO)
* Objective: Align the model with Helpful, Honest, and Harmless (HHH) principles.
* Method: Preference ranking to reduce toxicity and improve safety.
* Duration: 20 additional steps (120 steps in total, SFT included).
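
The DPO stage optimizes the standard DPO objective: for each prompt, the model is pushed to raise the likelihood of the preferred response, relative to a frozen reference model, and to lower that of the rejected one. The card does not state the β coefficient, the reference model, or the preference data, so the minimal pure-Python sketch below assumes an illustrative β = 0.1 and takes precomputed sequence log-probabilities as inputs; `dpo_loss` is a hypothetical helper, not part of any training library.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log(sigmoid(beta * margin)).

    Each *_logp is the summed log-probability of a whole response under
    the model being trained (policy) or the frozen reference model.
    beta = 0.1 is an illustrative default, not a value from this card.
    """
    # How much more the policy prefers the chosen response than the
    # reference does, minus the same quantity for the rejected response.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(x)) == softplus(-x); this form avoids overflow in exp().
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Hypothetical log-probabilities for a single preference pair.
print(round(dpo_loss(-12.0, -15.0, -12.0, -15.0), 3))  # policy == reference -> 0.693 (log 2)
print(round(dpo_loss(-10.0, -16.0, -12.0, -15.0), 3))  # policy favours the chosen reply -> 0.554
```

The loss starts at log 2 when the policy matches the reference and falls as the policy separates preferred from rejected responses more sharply than the reference does.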

### 3. Experimental Tool Use (RL)
* Status: Experimental phase.
* Objective: Integration with ChromaDB for information retrieval.

---

# ⚙️ Technical Specifications

| Parameter | Value |
| :--- | :--- |
| Base Model | Qwen/Qwen3-8B |
| Architecture | Transformer decoder (8B parameters) |
| LoRA Rank | 64 |
| LoRA Alpha | 32 |
| Learning Rate | 2e-4 (SFT) / 1e-4 (DPO) |
| Context Window | 4096 tokens |
| Training Hardware | Tinker Cloud (NVIDIA GPUs) |
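
For reference, PEFT's default LoRA scaling multiplies the low-rank update by `lora_alpha / r`, which for the values above is 32 / 64 = 0.5. The pure-Python sketch below illustrates that arithmetic: a frozen weight W is combined with trainable factors B (d_out × r) and A (r × d_in) as W + (alpha / r) · B · A. The helper names and toy matrices are hypothetical, and the card does not list which modules the adapter targets, so the parameter count is shown for an illustrative 4096 × 4096 projection only.

```python
def lora_scale(alpha: float, r: int) -> float:
    # Default LoRA scaling (rsLoRA would use alpha / sqrt(r) instead).
    return alpha / r


def matmul(x: list[list[float]], y: list[list[float]]) -> list[list[float]]:
    # Plain nested-list matrix product: rows of x times columns of y.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_merge(W, A, B, scale):
    """Return W + scale * (B @ A), the effective weight after merging the adapter."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


def lora_params(d_out: int, d_in: int, r: int) -> int:
    # Trainable parameters in B (d_out x r) plus A (r x d_in).
    return r * (d_out + d_in)


scale = lora_scale(32, 64)         # values from the table above -> 0.5
W = [[1.0, 0.0], [0.0, 1.0]]       # toy frozen weight
B = [[2.0], [4.0]]                 # toy 2 x 1 factor
A = [[1.0, 3.0]]                   # toy 1 x 2 factor
print(lora_merge(W, A, B, scale))  # [[2.0, 3.0], [2.0, 7.0]]

# Rank-64 adapter vs. full fine-tuning of one hypothetical 4096 x 4096 matrix:
print(lora_params(4096, 4096, 64), 4096 * 4096)  # 524288 16777216 (about 3.1%)
```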

---

# 💻 Usage

### Quickstart with Transformers + PEFT (Adapter Loading)

This method loads the Grillo LoRA adapter on top of the base Qwen3-8B model, so only the lightweight adapter weights need to be downloaded in addition to the base checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1. Configuration and model loading
HF_MODEL_ID = "klei1/grillo-8b"   # Grillo LoRA adapter
BASE_MODEL_ID = "Qwen/Qwen3-8B"   # base model the adapter sits on

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    device_map="auto",
    torch_dtype="auto",  # use the dtype stored in the checkpoint config
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID, trust_remote_code=True)

# 2. Load the Grillo adapter (LoRA) on top of the base model
model = PeftModel.from_pretrained(base_model, HF_MODEL_ID)
model.eval()  # evaluation mode: disables dropout

# 3. Define the system persona (crucial for performance)
# English: "You are Grillo, the Talking Cricket. You are small but wise, humble
# but brave. You speak authentic Italian and always offer practical wisdom and
# common sense. You are not a robotic assistant; you are a moral conscience."
system_prompt = """Tu sei Grillo, il Grillo Parlante.
Sei piccolo ma sapiente, umile ma coraggioso.
Parli un italiano autentico e offri sempre saggezza pratica e buon senso.
Non sei un assistente robotico, sei una coscienza morale."""

messages = [
    {"role": "system", "content": system_prompt},
    # English: "Grillo, I'm afraid I made the wrong choice..."
    {"role": "user", "content": "Grillo, ho paura di aver fatto una scelta sbagliata..."},
]

# 4. Generate a response
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn header
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    attention_mask=torch.ones_like(inputs),  # single unpadded sequence
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    eos_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens (skip the prompt)
response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(response)
```