Llama 3.2 Vision 11B LoRA – Plant Disease Diagnosis (MI300X fine-tune)

Fine-tuned Llama 3.2 Vision 11B with LoRA on a plant disease QA dataset (cashew, cassava, maize, tomato – 22 disease classes) for visual diagnosis and treatment recommendations.

Trained on a single AMD Instinct MI300X using PyTorch + ROCm, as a submission to the lablab.ai AMD hackathon, Track 3: Building AI-Powered Applications on AMD GPUs.

🌱 Try the live demo

This adapter powers the conversational layer of an interactive Plant Disease Assistant – upload a leaf photo to get a diagnosis and treatment guide:

👉 huggingface.co/spaces/lablab-ai-amd-developer-hackathon/merolav-space

(The hosted Space runs the lighter DINOv2-L classifier on free CPU. Load this LoRA adapter on a GPU machine for the full conversational VLM experience – see the code under vision/ in the training repo.)

Results

Epoch  Train Loss  Val Loss  Throughput  Wall Time
1      0.9530      0.0244    1.3         3331.1 s
2      0.0203      0.0177    1.3         3318.3 s
3      0.0151      0.0151    1.3         3321.8 s
4      0.0125      0.0147    1.3         3318.8 s
5      0.0109      0.0146    1.3         3317.5 s

Best val_loss: 0.0146

Epoch 3 adapter

An epoch3_adapter/ checkpoint is included for A/B comparison. Epoch 3 had val_loss = 0.0151 vs epoch 5's 0.0146; the difference is marginal and epoch 3 may generalize equally well in practice.
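Both checkpoints load the same way, so for a side-by-side comparison you can attach them to one base model and switch between them. A minimal sketch, assuming both adapter folders from this repo are available locally (the base-model load mirrors the Usage section below):

import torch
from peft import PeftModel
from transformers import MllamaForConditionalGeneration

base = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach both checkpoints under distinct adapter names, then switch at will.
model = PeftModel.from_pretrained(base, "best_adapter", adapter_name="epoch5")
model.load_adapter("epoch3_adapter", adapter_name="epoch3")

model.set_adapter("epoch3")   # generate with the epoch-3 weights
model.set_adapter("epoch5")   # switch back to the best (epoch-5) weights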

Training Details

  • Base model: meta-llama/Llama-3.2-11B-Vision-Instruct (11B params, ~4B vision + ~7B language)
  • Method: LoRA (rank=16, alpha=32, dropout=0.05); a matching PEFT configuration is sketched after this list
  • Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
  • Precision: bf16 (native MI300X)
  • Epochs: 5
  • Effective batch size: 16
  • Learning rate: 2e-05 with cosine decay and a 0.1 warmup ratio
  • Optimizer: AdamW (weight_decay=0.01)
  • Max sequence length: 2048
  • Hardware: 1x AMD Instinct MI300X (192 GB HBM3)
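
In PEFT terms, the setup above corresponds roughly to the following LoraConfig. This is a sketch for orientation, not the training script itself (which lives under vision/ in the repo); bias and task_type are assumed defaults, the rest mirrors the bullets above:

from peft import LoraConfig

# Sketch only: rank, alpha, dropout, and target modules come from the
# Training Details above; bias and task_type are assumptions.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)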

Usage

import torch
from peft import PeftModel
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Load the base model in bf16 and attach the LoRA adapter from this repo.
base = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "best_adapter")
processor = AutoProcessor.from_pretrained("best_adapter")
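
From there, generation follows the standard Mllama processor flow. A minimal sketch; the image path and prompt text below are placeholders, not files or prompts from this repo:

from PIL import Image

# Placeholder inputs: any leaf photo and a diagnosis-style question.
image = Image.open("leaf.jpg")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What disease does this leaf show, and how should it be treated?"},
    ]},
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))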

Artifacts

  • best_adapter/ – LoRA weights from the best validation epoch
  • epoch3_adapter/ – LoRA weights from epoch 3 (for A/B comparison)
  • config.yaml – training hyperparameters
  • metrics.json – per-epoch training history

See config.yaml for the full hyperparameter set.
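
The history can also be inspected programmatically. The exact schema of metrics.json is not documented here, so the keys below (epoch, train_loss, val_loss) are assumptions based on the Results table; adjust to the actual file:

import json
import yaml  # requires PyYAML

# Assumed layout: a list of per-epoch records with "epoch", "train_loss",
# and "val_loss" keys. Verify against the actual metrics.json before relying on it.
with open("metrics.json") as f:
    history = json.load(f)

best = min(history, key=lambda e: e["val_loss"])
print(f"Best epoch: {best['epoch']} (val_loss={best['val_loss']:.4f})")

# config.yaml holds the hyperparameters listed under Training Details.
with open("config.yaml") as f:
    config = yaml.safe_load(f)
print(config)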

Source

Training code: https://github.com/genyarko/amd-merolav/tree/main/vision
