---
base_model: ismaprasetiyadi/Biawak-8B-Base
library_name: peft
pipeline_tag: text-generation
language:
- id
tags:
- base_model:adapter:ismaprasetiyadi/Biawak-8B-Base
- lora
- sft
- transformers
- trl
- unsloth
- biawak
- indonesian
- instruction-following
license: apache-2.0
datasets:
- dtp-fine-tuning/dtp-singleturn-AGQ-9k
---
# Model Card for SFT-Biawak-8B-AGQ-9k-Unsloth
## Model Details
### Model Description
This model is a fine-tuned version of **[ismaprasetiyadi/Biawak-8B-Base](https://huggingface.co/ismaprasetiyadi/Biawak-8B-Base)**. It was trained using **Unsloth** and **LoRA** (Low-Rank Adaptation) on the **[dtp-singleturn-AGQ-9k](https://huggingface.co/datasets/dtp-fine-tuning/dtp-singleturn-AGQ-9k)** dataset.
The model is specifically optimized for **Indonesian single-turn instruction following**, utilizing the Qwen3 chat template structure. It leverages 4-bit quantization for memory efficiency during training and inference.
- **Developed by:** DTP2 Team
- **Model type:** Causal Language Model (LoRA Adapter)
- **Language(s) (NLP):** Indonesian (id)
- **License:** Apache-2.0
- **Finetuned from model:** [ismaprasetiyadi/Biawak-8B-Base](https://huggingface.co/ismaprasetiyadi/Biawak-8B-Base)
### Model Sources
- **Repository:** [dtp-fine-tuning/DTP_AGQ_Question_Diploy_9K](https://huggingface.co/dtp-fine-tuning/DTP_AGQ_Question_Diploy_9K)
- **Dataset:** [dtp-fine-tuning/dtp-singleturn-AGQ-9k](https://huggingface.co/datasets/dtp-fine-tuning/dtp-singleturn-AGQ-9k)
- **Training Logs:** [View full W&B Report](https://api.wandb.ai/links/DTP2/zh541s0e)
## Uses
### Direct Use
The model is designed for Indonesian chat and instruction-following tasks. It performs best in single-turn question-answering scenarios involving general knowledge, reasoning, and cultural context provided by the AGQ dataset.
### Out-of-Scope Use
- **Multi-turn or long-context conversations:** The model was fine-tuned on single-turn data, so performance across multiple turns may be inconsistent.
- **High-stakes decision making:** As an 8B model, it may hallucinate facts and should not be used for medical or legal advice without verification.
## Bias, Risks, and Limitations
This model inherits the biases present in the base `Biawak-8B` model and the `AGQ-9k` dataset. While fine-tuning improves instruction adherence, users should be aware that the model can still generate plausible-sounding but incorrect information.
### Recommendations
Users should verify important information generated by the model. It is recommended to use the `qwen3` chat template for optimal performance.
## How to Get Started with the Model
Use the code below to load the model and run inference:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
# 1. Load Base Model
base_model_name = "ismaprasetiyadi/Biawak-8B-Base"
adapter_model_name = "YOUR_USERNAME/SFT-Biawak-8B-AGQ-9k-Unsloth"
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(adapter_model_name, trust_remote_code=True)
# 2. Load Adapter
model = PeftModel.from_pretrained(model, adapter_model_name)
# 3. Inference
messages = [
    # "Explain briefly the history of Indonesian independence."
    {"role": "user", "content": "Jelaskan sejarah singkat kemerdekaan Indonesia."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
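Because the adapter was trained on top of a 4-bit quantized base, the base model can also be loaded in 4-bit NF4 for lower-memory inference. The snippet below is a minimal sketch using `BitsAndBytesConfig` (requires the `bitsandbytes` package); the compute dtype and other settings are assumptions, not the authors' deployment recipe:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# 4-bit NF4 quantization, mirroring the training-time setup (assumed settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "ismaprasetiyadi/Biawak-8B-Base",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, "YOUR_USERNAME/SFT-Biawak-8B-AGQ-9k-Unsloth")

# Note: merging the adapter into the base weights (model.merge_and_unload())
# should be done on an un-quantized copy of the base model, not the 4-bit one.
```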
## Training Details
### Training Data
The model was trained on [dtp-fine-tuning/dtp-singleturn-AGQ-9k](https://huggingface.co/datasets/dtp-fine-tuning/dtp-singleturn-AGQ-9k).
- **Size:** ~9k examples
- **Content:** Indonesian general questions and instructions (single turn)
- **Split:** Train (90%) / Test (10%)
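For reference, a minimal sketch of loading the dataset and reproducing a 90/10 split with 🤗 Datasets (the split name and seed are assumptions):

```python
from datasets import load_dataset

# Load the AGQ-9k dataset and create a 90/10 train/test split (seed is arbitrary).
dataset = load_dataset("dtp-fine-tuning/dtp-singleturn-AGQ-9k", split="train")
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, test_ds = splits["train"], splits["test"]
```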
### Training Procedure
The model was fine-tuned with the Unsloth library, which reports roughly 2x faster training and ~60% lower memory usage than standard Hugging Face implementations.
#### Training Hyperparameters
- **Training regime**: bf16 mixed precision (via Unsloth/LoRA)
- **Quantization**: 4-bit (nf4)
- **LoRA Rank**: 16
- **LoRA Alpha**: 32
- **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Batch Size**: 8 per device (effective batch size 32 via 4 gradient accumulation steps)
- **Learning Rate**: 2e-5 (Linear Schedule with 0.05 warmup)
- **Epochs**: 2
- **Max Sequence Length**: 8192
- **Optimizer**: adamw_8bit
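A minimal sketch of how these hyperparameters map onto an Unsloth + TRL training setup. This is illustrative only, not the exact training script; the dataset text field and formatting step are assumptions, and newer TRL releases move some of these arguments into `SFTConfig`:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the 4-bit quantized base model with Unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="ismaprasetiyadi/Biawak-8B-Base",
    max_seq_length=8192,
    load_in_4bit=True,
)

# Attach LoRA adapters (rank 16, alpha 32) to the attention and MLP projections
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,
    bias="none",
)

# Each example is assumed to be rendered into a single "text" field
# with the Qwen3 chat template before training.
train_dataset = load_dataset("dtp-fine-tuning/dtp-singleturn-AGQ-9k", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",          # assumed field name
    max_seq_length=8192,
    args=TrainingArguments(
        per_device_train_batch_size=8,
        gradient_accumulation_steps=4,  # effective batch size 32
        num_train_epochs=2,
        learning_rate=2e-5,
        lr_scheduler_type="linear",
        warmup_ratio=0.05,
        optim="adamw_8bit",
        bf16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```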
#### Speeds, Sizes, Times
- **Hardware**: 1x NVIDIA A100 80GB
- **Training Duration**: ~10 hours
- **GPU Memory Usage**: Peaked at ~45GB (56% utilization)
## Evaluation
### Results
The model demonstrated stable convergence over 2 epochs.
* **Final Training Loss:** ~0.70
* **Final Validation Loss:** ~0.67
* **Observation:** Validation loss decreased consistently alongside training loss, with no signs of overfitting during training.
**[View the full training run plots and metrics on Weights & Biases](https://api.wandb.ai/links/DTP2/zh541s0e)**
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type**: NVIDIA A100 80GB
- **Hours used**: 10 hours
- **Cloud Provider**: University Server / Private Infrastructure
- **Compute Region**: Indonesia
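As a rough back-of-the-envelope figure only (assuming an average board power of about 0.4 kW for the A100, which is an assumption rather than a measurement):

$$
E \approx 0.4\ \text{kW} \times 10\ \text{h} = 4\ \text{kWh}
$$

The corresponding CO₂ estimate depends on the carbon intensity of the local grid, which is not specified here.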
### Framework versions
- Unsloth 2024.x
- Transformers 4.x
- Pytorch 2.x
- Datasets 2.x
- Tokenizers 0.x