
KomdigiITS-8B-DFK
Multimodal Classification

Ministral-3-8B-Base-2512 · LoRA · Vision-Language
01 Overview

A LoRA adapter fine-tuned on top of aitf-komdigi/KomdigiITS-8B-DFK-CPT (itself based on Ministral-3-8B-Base-2512) to act as a Vision-Language Model for multimodal content classification. The model analyzes social media screenshots and classifies them into four categories: netral (neutral), disinformasi (disinformation), fitnah (defamation), and ujaran kebencian (hate speech).

Trained with the SITA framework using Unsloth's SFT pipeline. Given an image, the model produces a structured analysis consisting of a classification label and detailed Indonesian-language reasoning about any violations found.

♦ Note: This is the final checkpoint from Workshop 3 (final-ministral-8b-cpt-ws3), trained on the DFK VLM Dataset V3 with augmented train/val splits. The base model (aitf-komdigi/KomdigiITS-8B-DFK-CPT) underwent continual pre-training on DFK domain text before this fine-tuning.
02 Model Details
Identity
Developed by: DFK Tim 3 ITS
Type: VLM (LoRA adapter)
Language: Indonesian
Architecture
Arch: Mistral3ForConditionalGeneration
Params: 8B (base)
Precision: float16
03 Uses

Direct Use

Image-based content moderation for Indonesian social media. Given a screenshot, the model produces a structured analysis: a classification label (netral, disinformasi, fitnah, or ujaran kebencian) followed by detailed reasoning in Indonesian.

Out-of-Scope Use

This model is not intended for general-purpose vision-language tasks. It is specialized for the DFK disinformation detection pipeline and should not be used for content moderation in other languages or domains without further fine-tuning.
04 Evaluation

Evaluated on the held-out validation split with greedy decoding (temperature=0.0). Generation quality is scored with BERTScore using bert-base-multilingual-cased.

Accuracy: 94.3
F1 Macro: 91.6
F1 Weighted: 94.3
BERTScore F1: 80.2
Per-Class Breakdown
Netral: P 0.937 · R 0.973 · F1 0.954 · n=970
Ujaran Kebencian: P 0.979 · R 0.960 · F1 0.969 · n=867
Disinformasi: P 0.946 · R 0.895 · F1 0.920 · n=392
Fitnah: P 0.822 · R 0.822 · F1 0.822 · n=213
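
The aggregate scores can be cross-checked against the per-class breakdown. The following is a minimal Python sketch, assuming the per-class F1 values and support counts listed above; the function and variable names are illustrative. Because the inputs are already rounded to three decimals, the results only approximate the reported 91.6 and 94.3.

```python
# Per-class (F1, support) pairs copied from the per-class breakdown.
per_class = {
    "netral": (0.954, 970),
    "ujaran kebencian": (0.969, 867),
    "disinformasi": (0.920, 392),
    "fitnah": (0.822, 213),
}


def macro_f1(scores):
    """Unweighted mean of the per-class F1 values."""
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(scores):
    """Mean of the per-class F1 values, weighted by class support."""
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total


print(f"macro F1    ~ {macro_f1(per_class):.3f}")
print(f"weighted F1 ~ {weighted_f1(per_class):.3f}")
```

Macro F1 sits below weighted F1 because fitnah, the class with the smallest support, also has the lowest per-class F1.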
Generation Quality Metrics
BERTScore · bert-base-multilingual-cased
Precision: 0.804
Recall: 0.801
F1: 0.802
ROUGE-L · longest common subsequence overlap
Precision: 0.400
Recall: 0.387
F1: 0.387
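
ROUGE-L scores a generated analysis by the longest common subsequence (LCS) it shares with the reference. The sketch below shows the computation for a single pair, assuming lowercased whitespace tokenization and a balanced F1; the actual evaluation may use a different tokenizer or library, and the example sentences are invented for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Return (precision, recall, f1) for one candidate/reference pair."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented sentences, not taken from the dataset.
p, r, f = rouge_l(
    "konten ini tidak mengandung pelanggaran",
    "konten tersebut tidak mengandung pelanggaran apa pun",
)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

If the reported figures are averages of per-sample scores, the reported F1 does not have to equal the harmonic mean of the reported precision and recall.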
05 Training Details

Training Data

Dataset: dfk_vlm_dataset_v3 (augmented on fitnah class)
Splits: Fixed (train_aug.csv / val_aug.csv)
Train: 14,293 samples
Val: 2,831 samples

Label Classes

Netral: Factual content or non-DFK material; no violation detected
Disinformasi: Claims that contradict established facts, not directed at a specific person
Fitnah: False claims directed at a specific individual (defamation)
Ujaran Kebencian: Hate speech targeting ethnicity, religion, race, or intergroup identity (SARA)
Dataset Distribution
Train (augmented) · 14,293 total
Netral: 3,883 (27.2%)
Fitnah: 3,846 (26.9%)
Ujaran Kebencian: 3,484 (24.4%)
Disinformasi: 3,080 (21.5%)
Val (augmented) · 2,831 total
Netral: 970 (34.3%)
Ujaran Kebencian: 867 (30.6%)
Disinformasi: 765 (27.0%)
Fitnah: 229 (8.1%)

Configuration

LoRA Configuration
r: 16
Alpha: 16
Dropout: 0.1
Targets: all-linear
Vision: ✓ finetuned
Language: ✓ finetuned
Attention: ✓ finetuned
MLP: ✓ finetuned
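
To make the rank and alpha values concrete, here is a small sketch of what they imply for a single linear layer. It assumes the standard LoRA scaling of alpha / r (the configuration does not mention a rank-stabilized variant), and the 4096 × 4096 layer shape is hypothetical, chosen only for illustration.

```python
def lora_scaling(r, alpha):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / r


def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    Matrix A has shape (r, d_in) and matrix B has shape (d_out, r),
    so the total is r * (d_in + d_out).
    """
    return r * (d_in + d_out)


R, ALPHA = 16, 16  # values from the LoRA configuration

# Hypothetical layer shape, not taken from the model.
D_IN, D_OUT = 4096, 4096

print("scaling:", lora_scaling(R, ALPHA))
print("adapter params:", lora_param_count(D_IN, D_OUT, R))
print("frozen params:", D_IN * D_OUT)
```

With r equal to alpha, the low-rank update is applied at a scale of 1.0, and the adapter for a layer of this hypothetical size trains under 1% of the layer's frozen weights.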
Hyperparameters
Epochs: 3
Batch: 16 effective (batch size 4 × 4 gradient accumulation steps)
LR: 5e-4
Optimizer: AdamW 8-bit
Max len: 4096
Grad norm: 1
Warmup ratio: 0.03
Grad ckpt: unsloth
Seed: 3407
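
The hyperparameters above determine the rough length of the training schedule. The sketch below is an estimate only: it assumes a single device, that a final partial batch still counts as one optimizer step, and that warmup steps are rounded up. The trainer's exact step counting may differ slightly.

```python
import math

TRAIN_SAMPLES = 14_293
BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 4
NUM_EPOCHS = 3
WARMUP_RATIO = 0.03
NUM_DEVICES = 1  # assumption: the device count is not stated in the card

effective_batch = BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES

# Assumption: the last, smaller batch of each epoch still triggers a step.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / effective_batch)
total_steps = steps_per_epoch * NUM_EPOCHS

# Assumption: the warmup step count is rounded up.
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

print("effective batch:", effective_batch)
print("steps per epoch:", steps_per_epoch)
print("total steps:", total_steps)
print("warmup steps:", warmup_steps)
```

Under these assumptions the run takes a few thousand optimizer steps, with warmup covering only the first few dozen.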

Trainer

Type: unsloth_vlm_sft (Unsloth VLM SFT trainer)
Train on: Responses only
Instr part: [INST]
Resp part: [/INST]
Best model: Selected by eval_loss (lower is better)
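
Training on responses only means that prompt tokens are excluded from the loss, so the model is optimized only on the label and analysis it has to generate. The sketch below illustrates the idea with string tokens for readability; it is not Unsloth's implementation, and the function name is hypothetical. It relies on -100 being the default ignore_index of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def responses_only_labels(tokens, instruction_part="[INST]", response_part="[/INST]"):
    """Build labels that keep response tokens and mask everything else.

    A real pipeline operates on token ids; strings are used here only
    to keep the example readable.
    """
    labels = []
    in_response = False
    for tok in tokens:
        if tok == instruction_part:
            # A new instruction begins: stop learning from tokens.
            in_response = False
        labels.append(tok if in_response else IGNORE_INDEX)
        if tok == response_part:
            # Tokens after the response marker contribute to the loss.
            in_response = True
    return labels


# Toy sequence with placeholder tokens.
toy = ["<s>", "[INST]", "prompt", "[IMG]", "[/INST]", "Label:", "netral", "</s>"]
print(responses_only_labels(toy))
```

Everything up to and including the response marker is masked, so only the tokens after `[/INST]` are scored.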
Prompt Template

Each sample is formatted as a multi-turn conversation using the ministral_3 chat template. The dataset builds structured content blocks which the Jinja template renders as:

<s>[SYSTEM_PROMPT]...default Ministral system prompt...[/SYSTEM_PROMPT][INST]Anda adalah seorang analis konten media sosial ahli. Diberikan tangkapan layar dari sebuah konten, tentukan label kategori pelanggaran dan berikan analisis detail mengenai pelanggaran yang ditemukan.Ringkasan: {ringkasan}
Klaim: {klaim}
Fakta: {fakta}[IMG][/INST]Label: {label}

Analisis: {analisis}</s>

Input Fields

Ringkasan: Content summary. In the RAG pipeline, this is the concatenation of the image caption (from a captioning model) and any user-provided text (e.g. post caption, tweet text). It effectively holds all available textual context about the content.
Klaim: The core claim extracted from the content, used as a web search query for fact-checking. It is generated by an LLM from the ringkasan. In simpler setups it can also be a direct caption or user-provided text.
Fakta: Verification context retrieved via web search. It contains numbered search results with titles, descriptions, and source URLs. If no relevant sources are found, it defaults to "Tidak ditemukan sumber yang valid."
[IMG]: Screenshot of the social media post being analyzed.
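
The text part of the user turn can be assembled from these fields as in the following sketch. The function name is hypothetical, the image is assumed to be attached separately through the chat template, and the lack of a separator between the instruction and `Ringkasan:` simply mirrors the rendered template shown above; the actual Jinja file may differ in whitespace.

```python
INSTRUCTION = (
    "Anda adalah seorang analis konten media sosial ahli. "
    "Diberikan tangkapan layar dari sebuah konten, tentukan label kategori "
    "pelanggaran dan berikan analisis detail mengenai pelanggaran yang ditemukan."
)
DEFAULT_FAKTA = "Tidak ditemukan sumber yang valid."


def build_user_text(ringkasan, klaim, fakta=None):
    """Assemble the text of the user turn from the three input fields.

    Falls back to the default Fakta string when no search results exist.
    """
    fakta = fakta.strip() if fakta and fakta.strip() else DEFAULT_FAKTA
    return (
        f"{INSTRUCTION}"
        f"Ringkasan: {ringkasan}\n"
        f"Klaim: {klaim}\n"
        f"Fakta: {fakta}"
    )


# Invented field values for illustration.
text = build_user_text(
    ringkasan="Tangkapan layar unggahan tentang harga beras.",
    klaim="Harga beras naik dua kali lipat.",
)
print(text)
```

Passing no Fakta value, or a blank one, yields the same default string the field description mentions.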

Output Fields

Label: One of netral, disinformasi, fitnah, or ujaran kebencian.
Analisis: Free-form Indonesian-language explanation of why the content was assigned its label, referencing the image, context, and any retrieved facts.
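
Downstream code usually needs the label and the analysis as separate values. The following sketch parses a decoded generation that follows the `Label: ... / Analisis: ...` layout; the function name and the fallback behavior for malformed output are assumptions, and the sample string is invented.

```python
import re

VALID_LABELS = ("netral", "disinformasi", "fitnah", "ujaran kebencian")

_PATTERN = re.compile(
    r"Label:\s*(?P<label>.+?)\s*\n+\s*Analisis:\s*(?P<analisis>.*)",
    re.DOTALL,
)


def parse_output(text):
    """Split a generation into (label, analisis).

    Returns (None, stripped_text) when the expected layout or a known
    label is missing, so callers can handle malformed generations.
    """
    match = _PATTERN.search(text)
    if not match:
        return None, text.strip()
    label = match.group("label").strip().lower()
    if label not in VALID_LABELS:
        return None, text.strip()
    return label, match.group("analisis").strip()


# Invented generation for illustration.
sample = "Label: fitnah\n\nAnalisis: Konten menuduh seseorang tanpa bukti."
print(parse_output(sample))
```

Validating the label against the four known classes keeps unexpected generations from silently entering downstream metrics.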
Full Training Config
experiment_name: final-ministral-8b-cpt-ws3
seed: 3407

reporting:
  wandb: true
  wandb_project: "DFK3"

model:
  name: unsloth_vlm
  pretrained: aitf-komdigi/KomdigiITS-8B-DFK-CPT
  kwargs:
    load_in_4bit: false
    chat_template: "sita/templates/ministral_3.jinja"

adapter:
  name: unsloth_vlm_lora
  kwargs:
    finetune_vision_layers: true
    finetune_language_layers: true
    finetune_attention_modules: true
    finetune_mlp_modules: true
    r: 16
    lora_alpha: 16
    lora_dropout: 0.1
    bias: "none"
    target_modules: "all-linear"
    use_gradient_checkpointing: "unsloth"
    random_state: 3407

dataset:
  name: dfk_vlm_dataset_v3
  kwargs:
    data_dir: /content/dataset/images/images

training:
  num_epochs: 3
  batch_size: 4
  learning_rate: 5e-4
  gradient_accumulation_steps: 4
  max_grad_norm: 1
  warmup_ratio: 0.03
  weight_decay: 0
  logging_steps: 1
  eval_steps: 250
  extra:
    seed: 3407
    max_length: 4096
    load_best_model_at_end: true
    metric_for_best_model: eval_loss
    greater_is_better: false

trainer:
  name: unsloth_vlm_sft
  kwargs:
    train_on_responses_only: true
    instruction_part: "[INST]"
    response_part: "[/INST]"
    optim: adamw_8bit

evaluation:
  name: vlm_gen
  kwargs:
    max_new_tokens: 512
    temperature: 0.0
    bert_model: bert-base-multilingual-cased
    batch_size: 16
    num_workers: 11

06 Model Sources
07 Framework Versions
TRL: 0.24.0
Transformers: 5.5.0
PyTorch: 2.11.0+cu128
Datasets: 4.3.0
PEFT: 0.19.0
Tokenizers: 0.22.2