---
license: mit
language:
- id
base_model:
- cahya/gpt2-small-indonesian-522M
tags:
- instruct-tuned
---

# GPT2-Small Indonesian Chat Instruct-Tuned Model

An Indonesian conversational AI model fine-tuned from `GPT2-Small` (124M parameters) using instruction-following techniques to enable chat-style interactions.

## 📋 Model Overview

This model transforms a base Indonesian GPT-2 text generator into a conversational chatbot capable of following instructions and engaging in question-answering dialogues in Bahasa Indonesia.

- **Base Model**: `cahya/gpt2-small-indonesian-522M` (GPT-2 Small)
- **Fine-tuning Method**: SFT with LoRA (adapter merged back into the base weights)
- **Datasets**: `indonesian-nlp/wikipedia-id`, `FreedomIntelligence/evol-instruct-indonesian`, `FreedomIntelligence/sharegpt-indonesian`
- **Language**: Indonesian (Bahasa Indonesia)
- **Task**: Conversational AI / Chat Completion

## 🧪 Project Background

This model was fine-tuned as part of my personal learning journey in AI and LLMs. Training was done entirely on Google Colab (free tier, T4 GPU) as an exercise in building Indonesian conversational AI with limited resources.

## 🚀 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Set up the device: use a GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Load model and tokenizer
model_path = "IzzulGod/GPT2-Small-Indonesian"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).to(device)

# Build a prompt in the expected "User: ...\nAI:" format
prompt = "User: Siapa presiden pertama Indonesia?\nAI:"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate a response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        repetition_penalty=1.2,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
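
The decoded text includes the prompt, and generation can run past the answer into a new `User:` turn, since the model has no built-in stop token for turns. A simple, illustrative post-processing step (not part of the model itself) is to cut at the first follow-up turn:

```python
# Keep only the model's first reply: take the text after "AI:" and
# drop anything from a generated follow-up "User:" turn onward.
answer = response.split("AI:", 1)[-1].split("User:", 1)[0].strip()
print(answer)
```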

### Example Output

```
User: Siapa presiden pertama Indonesia?
AI: Presiden pertama Indonesia adalah Soekarno. Sukarno dikenal sebagai seorang pemimpin yang sangat dihormati dan dicintai oleh rakyatnya, terutama di kalangan rakyat Indonesia karena perananya dalam membentuk persatuan bangsa Indonesia. Dia juga dianggap sebagai sosok kunci bagi seluruh masyarakat Indonesia untuk mempertahankan kemerdekaan negara tersebut dari penjajahan Belanda.
```

## 🎯 Model Capabilities

- **Question Answering**: Responds to factual questions in Indonesian
- **Instruction Following**: Capable of following various instructions and tasks
- **Conversational Context**: Maintains context in chat-like interactions
- **Code Generation**: Can generate simple code snippets (R, Python, etc.) with Indonesian explanations (see the example prompt below)
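
For example, a code-generation request uses the same prompt format as any other query (an illustrative prompt, not from the original card):

```
User: Buatkan fungsi Python untuk menghitung faktorial
AI:
```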

## 📊 Training Details

### Dataset

The model was fine-tuned on conversation data in the following ShareGPT-style format:

```json
[
  {
    "from": "human",
    "value": "Question or instruction in Indonesian"
  },
  {
    "from": "gpt",
    "value": "Detailed response in Indonesian"
  }
]
```
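
During fine-tuning, records in this format have to be flattened into the `User: ... / AI: ...` text the model sees at inference time. The exact training script is not shown here; the sketch below only illustrates that mapping, with a hypothetical `to_training_text` helper:

```python
# Illustrative only: flatten a ShareGPT-style record into the
# "User: ...\nAI: ..." plain-text format used in the prompts above.
conversation = [
    {"from": "human", "value": "Siapa presiden pertama Indonesia?"},
    {"from": "gpt", "value": "Presiden pertama Indonesia adalah Soekarno."},
]

def to_training_text(turns):
    role_map = {"human": "User", "gpt": "AI"}  # map ShareGPT roles to chat labels
    return "\n".join(f"{role_map[t['from']]}: {t['value']}" for t in turns)

print(to_training_text(conversation))
# User: Siapa presiden pertama Indonesia?
# AI: Presiden pertama Indonesia adalah Soekarno.
```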

### Training Configuration

The model was fine-tuned using LoRA (Low-Rank Adaptation), with adapters injected across the key GPT-2 layers (attention and MLP projections); a code sketch of this setup appears after the lists below.

**LoRA Configuration:**
- `r`: 64 (rank)
- `lora_alpha`: 128
- `target_modules`: ["c_attn", "c_proj", "mlp.c_fc", "mlp.c_proj"]
- `lora_dropout`: 0.05
- `bias`: "none"

**Training Arguments:**
- `epochs`: 3
- `batch_size`: 16 per device
- `gradient_accumulation_steps`: 2
- `learning_rate`: 2e-4
- `scheduler`: cosine
- `weight_decay`: 0.01
- `fp16`: enabled

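As a rough sketch, these hyperparameters map onto `peft` and `transformers` as below. This is an assumed reconstruction, not the original training script; `base_model` is a placeholder for the loaded GPT-2:

```python
from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

# LoRA setup matching the configuration listed above
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["c_attn", "c_proj", "mlp.c_fc", "mlp.c_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)  # base_model: the loaded GPT-2

# Trainer arguments matching the values listed above
training_args = TrainingArguments(
    output_dir="gpt2-id-chat",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    fp16=True,
)

# After SFT, the adapter can be merged back into the base weights,
# which is how this checkpoint was published ("merged adapter"):
# merged_model = model.merge_and_unload()
```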

### Training Results

```
Final Training Loss: 2.692
Total Steps: 2,766
Training Time: ~1h 45m
```

The model showed consistent improvement, with the logged loss decreasing from 3.44 to 2.51 over the course of training.

## 🔧 Advanced Usage

### Custom Generation Parameters

```python
# For more creative responses
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    repetition_penalty=1.3
)

# For more focused responses
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.4,
    top_p=0.95,
    repetition_penalty=1.1
)
```
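
For quick experiments, the same generation can also be run through the `transformers` pipeline API (a convenience sketch, not from the original card; `return_full_text=False` strips the prompt from the output):

```python
from transformers import pipeline

# Build a text-generation pipeline around the published checkpoint
generator = pipeline("text-generation", model="IzzulGod/GPT2-Small-Indonesian")

result = generator(
    "User: Apa ibu kota Indonesia?\nAI:",
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.2,
    return_full_text=False,
)
print(result[0]["generated_text"])
```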

### Prompt Format

The model expects prompts in the following format:

```
User: [Your question or instruction in Indonesian]
AI:
```
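
For multi-turn prompts, a natural extension of this format (plausible given the ShareGPT-style training data, though not explicitly documented) is to append earlier turns before the final `AI:`:

```
User: Siapa presiden pertama Indonesia?
AI: Presiden pertama Indonesia adalah Soekarno.
User: Kapan ia menjabat?
AI:
```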

## ⚠️ Limitations

- **Knowledge Base**: The base model was trained primarily on Wikipedia data (`indonesian-nlp/wikipedia-id` by [Cahya](https://huggingface.co/cahya)), which provides general factual knowledge but few real-world conversational patterns
- **Training Data Scope**: The current fine-tuning focuses on general instruction-following and Q&A rather than natural daily conversation
- **Conversational Style**: Responses may feel formal or academic due to the Wikipedia-based foundation and instruction-tuned nature
- **Model Size**: Relatively small (124M parameters), which may limit complex reasoning capabilities
- **Factual Accuracy**: Responses are generated from training data and may not always be factually accurate or up to date
- **Language Optimization**: Best performance is achieved with Indonesian-language inputs
- **Response Consistency**: May occasionally generate repetitive or inconsistent responses

## 🚀 Future Improvements

For more natural conversation, future work could include:
- **Conversational Dataset Training**: Fine-tuning on Indonesian daily-conversation datasets
- **Lighter LoRA Configuration**: Using more efficient LoRA parameters for conversation-specific training
- **Multi-turn Dialogue**: Training on multi-turn conversation data for better context handling
- **Informal Language Patterns**: Incorporating colloquial Indonesian expressions and casual speech patterns

## 📝 License

This model is released under the MIT License. See the LICENSE file for details.

## 📚 Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{izzulgod2025gpt2indochat,
  title        = {GPT2-Small Indonesian Chat Instruct-Tuned Model},
  author       = {IzzulGod},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/IzzulGod/GPT2-Small-Indonesian}},
}
```

---

*Disclaimer: This model was developed as an experimental project for learning purposes. While it performs well on basic tasks, it may have limitations in reasoning and real-world usage.*