KomdigiITS-8B-DFK
Multimodal Classification

Ministral-3-8B-Base-2512 · LoRA · Vision-Language

01 Overview

KomdigiITS-8B-DFK-MultimodalClassification is a LoRA adapter fine-tuned on top of aitf-komdigi/KomdigiITS-8B-DFK-CPT (itself based on Ministral-3-8B-Base-2512) to act as a vision-language model for multimodal content classification. It analyzes social media screenshots and assigns one of four labels: netral (neutral), disinformasi (disinformation), fitnah (defamation), or ujaran kebencian (hate speech).

The adapter was trained with the SITA framework using Unsloth's SFT pipeline. Given an image, the model produces a structured analysis: a classification label followed by detailed reasoning, in Indonesian, about any violations found.

Note: This is the final checkpoint from Workshop 3 (final-ministral-8b-cpt-ws3), trained on the DFK VLM Dataset V3 with augmented train/val splits. The base model (aitf-komdigi/KomdigiITS-8B-DFK-CPT) was continually pretrained on DFK domain text before this fine-tuning.

02 Model Details

Identity

Developed by: DFK Tim 3 ITS

Type: VLM (LoRA adapter)

Language: Indonesian

Architecture

Base: KomdigiITS-8B-DFK-CPT

Arch: Mistral3ForConditionalGeneration

Params: 8B (base)

Precision: float16

03 Uses

Direct Use

Image-based content moderation classification for Indonesian social media. Given a screenshot, the model produces a structured analysis with a classification label (netral, disinformasi, fitnah, or ujaran kebencian) and a detailed reasoning in Indonesian.

Out-of-Scope Use

This model is not intended for general-purpose vision-language tasks. It is specialized for the DFK disinformation detection pipeline and should not be used for content moderation in other languages or domains without further fine-tuning.

04 Evaluation

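Evaluated on the held-out validation split with greedy decoding (temperature=0.0). Classification is scored with accuracy and F1; the generated analysis is scored with BERTScore (bert-base-multilingual-cased) and ROUGE-L. Headline scores are reported on a 0-100 scale.

Accuracy: 94.3

F1 Macro: 91.6

F1 Weighted: 94.3

BERTScore F1: 80.2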

Per-Class Breakdown

Netral: P 0.937 · R 0.973 · F1 0.954 · n=970

Ujaran Kebencian: P 0.979 · R 0.960 · F1 0.969 · n=867

Disinformasi: P 0.946 · R 0.895 · F1 0.920 · n=392

Fitnah: P 0.822 · R 0.822 · F1 0.822 · n=213
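As a quick consistency check, the aggregate F1 scores can be approximated from the per-class rows above. This is a minimal sketch that uses only the rounded values printed in the breakdown, so the results can differ from the reported figures in the last digit; the function and variable names are illustrative.

```python
# Per-class (F1, support n) copied from the breakdown above.
per_class = {
    "netral": (0.954, 970),
    "ujaran kebencian": (0.969, 867),
    "disinformasi": (0.920, 392),
    "fitnah": (0.822, 213),
}

def macro_f1(scores):
    """Unweighted mean of the per-class F1 values."""
    return sum(f1 for f1, _ in scores.values()) / len(scores)

def weighted_f1(scores):
    """Mean of the per-class F1 values weighted by class support."""
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total

print(f"macro F1:    {macro_f1(per_class):.3f}")
print(f"weighted F1: {weighted_f1(per_class):.3f}")
```

With these rounded inputs the macro average comes out at about 0.916 and the weighted average at about 0.942, which agrees with the headline scores up to rounding.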

Generation Quality Metrics

BERTScore (bert-base-multilingual-cased)

Precision: 0.804

Recall: 0.801

F1: 0.802

ROUGE-L (longest common subsequence overlap)

Precision: 0.400

Recall: 0.387

F1: 0.387
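For readers unfamiliar with ROUGE-L, the sketch below shows how precision, recall, and F1 follow from the longest common subsequence (LCS) of two token sequences. It is a simplified illustration with whitespace tokenization and made-up example strings, and it is not the evaluation code used for the numbers above.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l(candidate, reference):
    """Return (precision, recall, f1) for whitespace-tokenized strings."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical strings, only to exercise the function.
p, r, f = rouge_l(
    "konten ini tidak melanggar aturan",
    "konten ini melanggar aturan platform",
)
print(p, r, f)
```

Here four of the five tokens on each side form the common subsequence, so precision and recall are both 4/5.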

05 Training Details

Training Data

Dataset: dfk_vlm_dataset_v3 (fitnah class augmented)

Splits: Fixed (train_aug.csv / val_aug.csv)

Train: 14,293 samples

Val: 2,831 samples

Label Classes

Netral: Factual content or non-DFK material; no violation detected

Disinformasi: Claims that contradict established facts, not directed at a specific person

Fitnah: False claims directed at a specific individual (defamation)

Ujaran Kebencian: Hate speech targeting ethnicity, religion, race, or intergroup identity (SARA)

Dataset Distribution

Train (augmented) · 14,293 total

Netral: 3,883 (27.2%)

Fitnah: 3,846 (26.9%)

Ujaran Kebencian: 3,484 (24.4%)

Disinformasi: 3,080 (21.5%)

Val (augmented) · 2,831 total

Netral: 970 (34.3%)

Ujaran Kebencian: 867 (30.6%)

Disinformasi: 765 (27.0%)

Fitnah: 229 (8.1%)

Configuration

LoRA Configuration

r: 16

Alpha: 16

Dropout: 0.1

Targets: all-linear

Vision layers: ✓ fine-tuned

Language layers: ✓ fine-tuned

Attention modules: ✓ fine-tuned

MLP modules: ✓ fine-tuned
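To make the r and alpha settings concrete: in standard LoRA, each adapted linear layer with a weight of shape d_out × d_in gains two low-rank matrices with r × (d_in + d_out) trainable parameters in total, and the low-rank update is scaled by alpha / r. The sketch below computes both for the values above. The 4096 × 4096 layer shape is a hypothetical example, not a dimension taken from this model.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / r

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer (A: r x d_in, B: d_out x r)."""
    return r * (d_in + d_out)

R, ALPHA = 16, 16          # from the configuration above
D_IN, D_OUT = 4096, 4096   # hypothetical layer shape, for illustration only

print("scaling:", lora_scaling(R, ALPHA))
print("added params:", lora_params(D_IN, D_OUT, R))
print("full layer params:", D_IN * D_OUT)
```

With r equal to alpha the scaling factor is 1, so the low-rank update is added to the frozen weight without rescaling.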

Hyperparameters

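Epochs: 3

Batch size: 16 effective (4 per step × 4 gradient accumulation steps)

Learning rate: 5e-4

Optimizer: AdamW 8-bit

Max length: 4096

Max grad norm: 1

Warmup ratio: 0.03

Gradient checkpointing: unsloth

Seed: 3407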
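The batch and warmup settings translate into approximate step counts as follows. This is a back-of-the-envelope sketch that assumes a single device, counts the final partial batch of each epoch as a step, and rounds warmup steps up; the exact numbers depend on how the trainer handles these cases.

```python
import math

train_samples = 14_293   # size of the augmented training split
batch_size = 4           # samples per forward/backward pass
grad_accum = 4           # gradient accumulation steps
epochs = 3
warmup_ratio = 0.03

effective_batch = batch_size * grad_accum
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print("effective batch:", effective_batch)
print("optimizer steps per epoch:", steps_per_epoch)
print("total optimizer steps:", total_steps)
print("warmup steps:", warmup_steps)
```

Under these assumptions the run takes on the order of 2,700 optimizer steps, with the learning rate warming up over roughly the first 80.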

Trainer

Type: unsloth_vlm_sft (Unsloth VLM SFT trainer)

Train on: Responses only

Instruction part: [INST]

Response part: [/INST]

Best model: Selected by eval_loss (lower is better)
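Training on responses only means the loss ignores every token up to and including the response marker ([/INST]), so only the generated label and analysis contribute gradients. The sketch below illustrates the idea on a toy token list. The string tokens and the mask_prompt helper are hypothetical, and -100 is the ignore index conventionally used by PyTorch's cross-entropy loss; Unsloth's actual implementation works on token IDs and may differ in detail.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss

def mask_prompt(tokens, response_marker="[/INST]"):
    """Return labels in which everything up to and including the last
    response marker is replaced by IGNORE_INDEX."""
    if response_marker not in tokens:
        # No response found: nothing in this sample is trained on.
        return [IGNORE_INDEX] * len(tokens)
    last = len(tokens) - 1 - tokens[::-1].index(response_marker)
    return [IGNORE_INDEX] * (last + 1) + tokens[last + 1:]

# Toy sequence using string "tokens" for readability.
tokens = ["<s>", "[INST]", "prompt", "[IMG]", "[/INST]", "Label:", "fitnah", "</s>"]
labels = mask_prompt(tokens)
print(labels)
```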

Prompt Template

Each sample is formatted as a multi-turn conversation using the ministral_3 chat template. The dataset builds structured content blocks which the Jinja template renders as:

<s>[SYSTEM_PROMPT]...default Ministral system prompt...[/SYSTEM_PROMPT][INST]Anda adalah seorang analis konten media sosial ahli. Diberikan tangkapan layar dari sebuah konten, tentukan label kategori pelanggaran dan berikan analisis detail mengenai pelanggaran yang ditemukan.Ringkasan: {ringkasan}
Klaim: {klaim}
Fakta: {fakta}[IMG][/INST]Label: {label}

Analisis: {analisis}</s>

Input Fields

Ringkasan (summary): Content summary. In the RAG pipeline this is the concatenation of the image caption (from a captioning model) and any user-provided text (e.g. post caption, tweet text). It effectively holds all available textual context about the content.

Klaim (claim): The core claim extracted from the content, used as a web search query for fact-checking. It is generated by an LLM from the ringkasan. In simpler setups it can also be a direct caption or user-provided text.

Fakta (facts): Verification context retrieved via web search. It contains numbered search results with titles, descriptions, and source URLs. If no relevant sources are found, it defaults to "Tidak ditemukan sumber yang valid." ("No valid sources found.")

[IMG]: Screenshot of the social media post being analyzed.
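A minimal sketch of assembling the user-turn text from these fields, following the layout shown in the prompt template. The build_user_text helper and the placeholder values are hypothetical. In practice the chat template and processor add the special tokens and insert the image, and the whitespace between the instruction and the first field (none, as rendered in the template) should be checked against the actual Jinja file.

```python
INSTRUCTION = (
    "Anda adalah seorang analis konten media sosial ahli. "
    "Diberikan tangkapan layar dari sebuah konten, tentukan label kategori "
    "pelanggaran dan berikan analisis detail mengenai pelanggaran yang ditemukan."
)
NO_SOURCES = "Tidak ditemukan sumber yang valid."

def build_user_text(ringkasan: str, klaim: str, fakta: str = "") -> str:
    """Join the instruction and the three text fields; an empty fakta
    falls back to the default 'no valid sources' string."""
    fakta = fakta.strip() or NO_SOURCES
    return (
        f"{INSTRUCTION}"
        f"Ringkasan: {ringkasan}\n"
        f"Klaim: {klaim}\n"
        f"Fakta: {fakta}"
    )

# Placeholder values, for illustration only.
text = build_user_text(
    ringkasan="(caption and post text)",
    klaim="(extracted claim)",
)
print(text)
```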

Output Fields

Label: One of netral, disinformasi, fitnah, or ujaran kebencian.

Analisis (analysis): Free-form Indonesian-language explanation of why the content was assigned its label, referencing the image, context, and any retrieved facts.
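Because the output follows a fixed "Label: ... Analisis: ..." layout, it can be parsed into structured fields with a small helper. The sketch below is one possible approach; parse_output and the sample string are hypothetical, and real generations may need more lenient handling (for example, stripping a trailing end-of-sequence token).

```python
import re

VALID_LABELS = {"netral", "disinformasi", "fitnah", "ujaran kebencian"}

# Label on the first line, then one or more newlines, then the analysis.
_PATTERN = re.compile(
    r"Label:\s*(?P<label>.+?)\s*\n+\s*Analisis:\s*(?P<analisis>.*)",
    re.DOTALL,
)

def parse_output(text: str):
    """Return (label, analisis), or None if the layout or label is unexpected."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    label = match.group("label").strip().lower()
    if label not in VALID_LABELS:
        return None
    return label, match.group("analisis").strip()

sample = "Label: fitnah\n\nAnalisis: (penjelasan model)"
print(parse_output(sample))
```

Returning None for unknown labels makes malformed generations easy to count separately instead of silently mapping them to a class.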

Full Training Config

experiment_name: final-ministral-8b-cpt-ws3
seed: 3407

reporting:
  wandb: true
  wandb_project: "DFK3"

model:
  name: unsloth_vlm
  pretrained: aitf-komdigi/KomdigiITS-8B-DFK-CPT
  kwargs:
    load_in_4bit: false
    chat_template: "sita/templates/ministral_3.jinja"

adapter:
  name: unsloth_vlm_lora
  kwargs:
    finetune_vision_layers: true
    finetune_language_layers: true
    finetune_attention_modules: true
    finetune_mlp_modules: true
    r: 16
    lora_alpha: 16
    lora_dropout: 0.1
    bias: "none"
    target_modules: "all-linear"
    use_gradient_checkpointing: "unsloth"
    random_state: 3407

dataset:
  name: dfk_vlm_dataset_v3
  kwargs:
    data_dir: /content/dataset/images/images

training:
  num_epochs: 3
  batch_size: 4
  learning_rate: 5e-4
  gradient_accumulation_steps: 4
  max_grad_norm: 1
  warmup_ratio: 0.03
  weight_decay: 0
  logging_steps: 1
  eval_steps: 250
  extra:
    seed: 3407
    max_length: 4096
    load_best_model_at_end: true
    metric_for_best_model: eval_loss
    greater_is_better: false

trainer:
  name: unsloth_vlm_sft
  kwargs:
    train_on_responses_only: true
    instruction_part: "[INST]"
    response_part: "[/INST]"
    optim: adamw_8bit

evaluation:
  name: vlm_gen
  kwargs:
    max_new_tokens: 512
    temperature: 0.0
    bert_model: bert-base-multilingual-cased
    batch_size: 16
    num_workers: 11

06 Model Sources

Framework: SITA

W&B Run: DFK3 / final-ministral-8b-cpt-ws3

07 Framework Versions

TRL: 0.24.0

Transformers: 5.5.0

PyTorch: 2.11.0+cu128

Datasets: 4.3.0

PEFT: 0.19.0

Tokenizers: 0.22.2

Model tree for aitf-komdigi/KomdigiITS-8B-DFK-MultimodalClassification

Base model: mistralai/Ministral-3-8B-Base-2512

Adapter: aitf-komdigi/KomdigiITS-8B-DFK-CPT