Create README.md
#1
by wildanaziz - opened
README.md
ADDED

---
base_model:
- aitfindonesia/Bakti-8B-Base
- aitfindonesia/KominfoUB-8B-Base
library_name: peft
license: apache-2.0
pipeline_tag: text-generation
tags:
- base_model:adapter:aitfindonesia/Bakti-8B-Base
- lora
- sft
- transformers
- unsloth
- multi-turn
- chatbot
- indonesian
datasets:
- dtp-fine-tuning/dtp-multiturn-interview-valid-15k
language:
- id
---

# Model Card for SFT-Bakti-8B-Base-MultiTurn-Chatbot

## Model Details

### Model Description

This model is a fine-tuned version of [aitfindonesia/Bakti-8B-Base](https://huggingface.co/aitfindonesia/Bakti-8B-Base), designed specifically for **multi-turn conversational capabilities** in Indonesian. It was trained with the **Unsloth** library for faster, more memory-efficient fine-tuning, using LoRA (Low-Rank Adaptation).

The model is optimized to retain context across multiple turns of conversation, making it suitable for interview simulations, customer support, and general-purpose Indonesian assistants.

- **Developed by:** DTP Fine Tuning Team
- **Model type:** Causal language model (fine-tuned Qwen2/3 architecture)
- **Language(s) (NLP):** Indonesian
- **License:** Apache 2.0
- **Finetuned from model:** aitfindonesia/Bakti-8B-Base

## Uses

### Direct Use

The model is designed for:
- Multi-turn chat interactions in Indonesian.
- Question answering (QA) that requires context from previous turns.
- Roleplay interactions (e.g., interview scenarios).

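A minimal inference sketch is shown below. It assumes the adapter is published under a repo id like `dtp-fine-tuning/SFT-Bakti-8B-Base-MultiTurn-Chatbot` (hypothetical, not confirmed by this card) and that the tokenizer carries a chat template from fine-tuning:

```python
# Multi-turn inference sketch. The adapter repo id is HYPOTHETICAL;
# substitute the actual repo this card describes.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "aitfindonesia/Bakti-8B-Base"
adapter_id = "dtp-fine-tuning/SFT-Bakti-8B-Base-MultiTurn-Chatbot"  # hypothetical

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

# Keep the running history in `messages` so earlier turns stay in context.
messages = [
    {"role": "user", "content": "Halo! Bisa simulasikan wawancara kerja untuk posisi data analyst?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
reply = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
messages.append({"role": "assistant", "content": reply})  # extend history for the next turn
print(reply)
```
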
### Out-of-Scope Use

- The model should not be relied on for factually accurate output without RAG (Retrieval-Augmented Generation), as it can hallucinate.
- Not intended for code generation tasks.

## Training Details

### Training Data

**Dataset:** `dtp-fine-tuning/dtp-multiturn-interview-valid-15k`
- **Split:** Train (90%) / Test (10%)
- **Format:** Multi-turn conversations
- **Max Length:** 2048 tokens

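The split can be reproduced with the `datasets` library roughly as follows (the split seed and the source split name are assumptions, not stated in this card):

```python
# Load the dataset and recreate a 90/10 train/test split.
# NOTE: seed=42 is an ASSUMPTION; the card does not state one.
from datasets import load_dataset

ds = load_dataset("dtp-fine-tuning/dtp-multiturn-interview-valid-15k", split="train")
split = ds.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = split["train"], split["test"]
```
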
### Training Procedure

The model was fine-tuned with **Unsloth** on a single NVIDIA A100 (80 GB) GPU using QLoRA: the base weights are loaded in 4-bit NF4 quantization to reduce memory usage, while the LoRA adapters are trained at higher precision to maintain performance. A configuration sketch follows the hyperparameter list below.

#### Training Hyperparameters

- **Training regime:** QLoRA (4-bit quantized base weights, FP16 compute)
- **Optimizer:** AdamW 8-bit
- **Learning Rate:** $$2 \times 10^{-5}$$
- **Scheduler:** Linear with 5% warmup
- **Batch Size:** 8 per device (gradient accumulation: 4)
- **Epochs:** 2
- **LoRA Config:**
  - Rank ($$r$$): 16
  - Alpha ($$\alpha$$): 32
  - Dropout: 0.05
  - Target Modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`

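The sketch below mirrors these values with Unsloth and TRL; anything not listed above (e.g. `output_dir`, the reuse of `train_ds`/`eval_ds` from the data snippet) is an assumption:

```python
# QLoRA fine-tuning sketch matching the hyperparameters above.
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    "aitfindonesia/Bakti-8B-Base",
    max_seq_length=2048,
    load_in_4bit=True,  # NF4 base weights (QLoRA)
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

args = SFTConfig(
    output_dir="outputs",              # assumption
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=2,
    learning_rate=2e-5,
    optim="adamw_8bit",
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    fp16=True,
)
trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    processing_class=tokenizer,
)
trainer.train()
```
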
#### Hardware

- **GPU:** NVIDIA A100 80GB PCIe
- **VRAM Usage:** Peak allocation of approx. 19 GB (about 23% of the 80 GB card), thanks to 4-bit loading

## Evaluation

### Results

The model demonstrates strong convergence on the multi-turn dataset.
- **Final Train Loss:** $$\approx 0.42$$
- **Final Eval Loss:** $$\approx 0.41$$

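For intuition, assuming the eval loss is mean per-token cross-entropy in nats (the usual convention), it corresponds to a token-level perplexity of roughly $$e^{0.41} \approx 1.51$$.
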
*Note: On this specific Indonesian dataset, the model converges to a lower loss in fewer steps than the standard Qwen3-8B baseline.*

## Environmental Impact

- **Hardware Type:** NVIDIA A100 80GB
- **Compute Region:** asia-east1
- **Carbon Emitted:** 0.31 kg CO₂eq

## Framework Versions

- Unsloth
- PEFT
- Transformers
- TRL