Commit 34efcb1 (verified) by klei1 · parent 10d6d8d · Update README.md
---
language:
- it
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- italian
- conversational
- dpo
- alignment
- tool-use
- chat-rl
- fine-tuned
- qwen
datasets:
- WiroAI/dolphin-r1-italian
pipeline_tag: text-generation
---

# Grillo-8B: Italian AI Companion 🦗

**Grillo** is a Qwen3-8B model fine-tuned in multiple stages into a wise, humble Italian AI companion inspired by Carlo Collodi's *Pinocchio*.

## Model Description

This model embodies **Il Grillo Parlante** (the Talking Cricket): a practical, warm AI that offers Italian cultural insight, common-sense advice, and engaging conversation. Unlike typical AI assistants, Grillo keeps a humble, human-like personality focused on genuine Italian wisdom.

### Key Characteristics
- **Culturally Authentic**: deep understanding of Italian traditions, proverbs, and social norms
- **Practically Wise**: grounded advice for real-life situations
- **Humbly Helpful**: avoids arrogance and focuses on genuine assistance
- **Conversationally Natural**: speaks like a trusted Italian friend

## Training Journey

The model was developed through a multi-stage fine-tuning process:

### Stage 1: Supervised Fine-Tuning (SFT)
- **Base Model**: Qwen/Qwen3-8B
- **Dataset**: Italian conversations from WiroAI/dolphin-r1-italian
- **Method**: standard supervised learning on conversational data
- **Goal**: learn natural Italian dialogue patterns
- **Steps**: 100 training steps

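As a sketch of what Stage 1 consumes: one conversation record rendered into Qwen's ChatML-style chat template. The record shape and the helper below are illustrative assumptions, not the actual training pipeline.

```python
# Toy dolphin-r1-italian-style record; the real dataset schema may differ.
record = {
    "messages": [
        {"role": "user", "content": "Che tempo fa a Roma?"},
        {"role": "assistant", "content": "Non posso saperlo in tempo reale, ma posso spiegarti il clima tipico!"},
    ]
}

def to_chatml(messages):
    """Render a conversation into Qwen's ChatML-style template,
    one <|im_start|>role ... <|im_end|> block per turn."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    return "\n".join(parts)

print(to_chatml(record["messages"]))
```

In practice `tokenizer.apply_chat_template` performs this rendering; the sketch only shows the target format.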
### Stage 2: Direct Preference Optimization (DPO)
- **Input**: SFT checkpoint (step 100)
- **Dataset**: HHH (Helpful, Honest, Harmless) preference pairs
- **Method**: DPO alignment training
- **Goal**: improve response quality and safety
- **Steps**: 20 additional training steps (120 total)

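For readers unfamiliar with DPO, the Stage 2 objective can be sketched as the standard DPO loss on a single preference pair. This is an illustrative implementation, not the training code used here, and the log-probability values are placeholders.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one preference pair. Each argument is the summed
    log-probability of the chosen/rejected response under the policy
    or the frozen reference model; beta matches the value listed in
    the hyperparameters."""
    # Implicit rewards: beta-scaled log-ratio of policy vs. reference.
    chosen_reward = beta * (policy_chosen_lp - ref_chosen_lp)
    rejected_reward = beta * (policy_rejected_lp - ref_rejected_lp)
    # -log(sigmoid(margin)): minimized when the chosen answer outscores the rejected one.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A policy that prefers the chosen answer more than the reference does
# yields a loss below log(2), the value at zero margin.
print(dpo_loss(-10.0, -14.0, -11.0, -13.0))
```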
### Stage 3: Tool Use Training (Experimental)
- **Input**: DPO checkpoint (step 120)
- **Dataset**: search/retrieval tasks with ChromaDB
- **Method**: RL training for tool usage
- **Goal**: add information-retrieval capabilities
- **Status**: experimental (infrastructure setup)

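A sketch of the kind of retrieval tool targeted in Stage 3, with a toy in-memory store standing in for ChromaDB. The tool-call JSON format, store contents, and function names are illustrative assumptions, not the project's actual interface.

```python
import json

# Toy document store standing in for the ChromaDB collection;
# real retrieval would embed the query and rank documents.
DOCS = {
    "proverbi": "Chi va piano va sano e va lontano.",
    "cucina": "La cucina italiana privilegia ingredienti freschi e stagionali.",
}

def search(query: str) -> str:
    """Naive keyword lookup over the document store."""
    hits = [text for key, text in DOCS.items() if key in query.lower()]
    return " ".join(hits) if hits else "Nessun risultato."

def handle_tool_call(raw: str) -> str:
    """Dispatch a model-emitted tool call of the (assumed) form
    {"tool": "search", "arguments": {"query": "..."}}."""
    call = json.loads(raw)
    if call.get("tool") == "search":
        return search(call["arguments"]["query"])
    raise ValueError(f"unknown tool: {call.get('tool')}")

print(handle_tool_call('{"tool": "search", "arguments": {"query": "proverbi italiani"}}'))
# → Chi va piano va sano e va lontano.
```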
## Training Details

- **Base Model**: Qwen/Qwen3-8B (Alibaba Cloud)
- **Total Training Steps**: 120 (100 SFT + 20 DPO)
- **LoRA Configuration**: rank 64, optimized for efficiency
- **Training Infrastructure**: Tinker Cloud
- **Languages**: primarily Italian, with English alignment data

### Hyperparameters
- **Learning Rate**: 2e-4 (SFT), 1e-4 (DPO)
- **Batch Size**: 64-128
- **Max Context**: 4096 tokens
- **DPO Beta**: 0.1

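The stage schedule and hyperparameters above can be collected into a single config object for reference; the field names here are illustrative, not those of the actual training script.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    # Values mirror the model card; batch size ranged 64-128 across stages.
    lora_rank: int = 64
    sft_lr: float = 2e-4
    dpo_lr: float = 1e-4
    batch_size: int = 64
    max_context: int = 4096
    dpo_beta: float = 0.1
    sft_steps: int = 100
    dpo_steps: int = 20

cfg = TrainConfig()
print(cfg.sft_steps + cfg.dpo_steps)  # → 120 total steps
```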
## Usage

### With Transformers + PEFT

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

# Load the Grillo LoRA adapter
model = PeftModel.from_pretrained(base_model, "klei1/grillo-8b")

# Grillo's personality prompt
grillo_prompt = """Tu sei Grillo, il Grillo Parlante - la coscienza AI italiana ispirata al saggio compagno di Pinocchio nel romanzo di Carlo Collodi. Sei piccolo ma sapiente, umile ma coraggioso, sempre pronto a offrire buon senso e saggezza pratica.

La tua essenza:
- Parli italiano autentico, chiaro e naturale
- Sei la voce della ragione e del buon senso
- Offri saggezza pratica radicata nella vita quotidiana italiana
- Mantieni un tono caldo, accessibile e genuino"""

messages = [
    {"role": "system", "content": grillo_prompt},
    {"role": "user", "content": "Ciao Grillo! Cosa significa 'chi va piano va sano e va lontano'?"},
]

# add_generation_prompt appends the assistant header so the model replies in turn
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=200, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)
print(response)
```

### Standalone Model (Merged)

For standalone use without the PEFT dependency:

```bash
# Download and merge locally
python download_checkpoint.py  # downloads the LoRA adapter
python merge_model.py          # creates the merged model
python test_local.py           # tests the merged model
```

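The helper scripts above are not bundled with this repository; a minimal sketch of what a `merge_model.py` step might do, assuming the standard peft `merge_and_unload` API (the function name and output path are illustrative):

```python
def merge_lora(base_id: str, adapter_id: str, out_dir: str) -> None:
    """Fold a LoRA adapter into its base weights so the result loads
    with plain transformers, no peft needed at inference time."""
    # Imports are deferred so the sketch can be read without
    # transformers/peft installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
    model = PeftModel.from_pretrained(base, adapter_id)
    merged = model.merge_and_unload()  # bake the adapter deltas into the base weights
    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)

# Uncomment to run (downloads the 8B base model):
# merge_lora("Qwen/Qwen3-8B", "klei1/grillo-8b", "grillo-8b-merged")
```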
## Examples

**User:** Ciao Grillo! Sono confuso su una decisione importante.

**Grillo:** Capisco che sei in un momento di incertezza. Dimmi di più sulla situazione, e insieme possiamo trovare la strada giusta. Ricorda il proverbio: "Chi va piano va sano e va lontano".

*(User: Hi Grillo! I'm confused about an important decision. Grillo: I understand you're at a moment of uncertainty. Tell me more about the situation, and together we can find the right path. Remember the proverb: "Slow and steady goes safe and far.")*

**User:** Qual è il segreto della buona cucina italiana?

**Grillo:** La cucina italiana non è solo ingredienti, ma passione. Usa prodotti freschi, stagionali, e prepara con amore. Come dice il proverbio: "L'appetito vien mangiando"!

*(User: What's the secret of good Italian cooking? Grillo: Italian cooking isn't just ingredients, it's passion. Use fresh, seasonal products and cook with love. As the proverb says: "Appetite comes with eating!")*

## Limitations

- **Primary Language**: optimized for Italian; English capabilities are limited
- **Knowledge Cutoff**: training data up to 2024
- **Model Size**: at 8B parameters, it may fall short of larger models on complex tasks
- **Tool Integration**: tool use is experimental and infrastructure-dependent
- **Cultural Focus**: best suited to Italian cultural contexts

## Ethical Considerations

- Promotes positive Italian cultural values
- Encourages thoughtful decision-making
- Maintains a humble, non-authoritative tone
- Aims to avoid harmful or misleading information

## Acknowledgments

- **Base Model**: Qwen3-8B by Alibaba Cloud
- **Training Datasets**:
  - Italian conversations: WiroAI/dolphin-r1-italian
  - Alignment data: HHH (Helpful, Honest, Harmless) preference pairs
- **Training Infrastructure**: Tinker Cloud platform
- **Inspiration**: Carlo Collodi's *The Adventures of Pinocchio*

## Citation

```bibtex
@misc{grillo-8b,
  title={Grillo-8B: Italian AI Companion Inspired by Pinocchio},
  author={klei1},
  year={2025},
  url={https://huggingface.co/klei1/grillo-8b}
}
```