# Llama 3.2 Vision 11B LoRA: Plant Disease Diagnosis (MI300X fine-tune)
Fine-tuned Llama 3.2 Vision 11B with LoRA on a plant disease QA dataset (cashew, cassava, maize, and tomato; 22 disease classes) for visual diagnosis and treatment recommendations.

Trained on a single AMD Instinct MI300X using PyTorch + ROCm, as a submission to the lablab.ai AMD hackathon, Track 3: Building AI-Powered Applications on AMD GPUs.
## Try the live demo
This adapter powers the conversational layer of an interactive Plant Disease Assistant. Upload a leaf photo to get a diagnosis and a treatment guide:

https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/merolav-space

(The hosted Space runs the lighter DINOv2-L classifier on free CPU hardware. Load this LoRA adapter on a GPU machine for the full conversational VLM experience; see the code under `vision/` in the training repo.)
## Results
| Epoch | Train Loss | Val Loss | Throughput | Wall Time |
|---|---|---|---|---|
| 1 | 0.9530 | 0.0244 | 1.3 | 3331.1s |
| 2 | 0.0203 | 0.0177 | 1.3 | 3318.3s |
| 3 | 0.0151 | 0.0151 | 1.3 | 3321.8s |
| 4 | 0.0125 | 0.0147 | 1.3 | 3318.8s |
| 5 | 0.0109 | 0.0146 | 1.3 | 3317.5s |
**Best val_loss:** 0.0146
### Epoch 3 adapter

An `epoch3_adapter/` checkpoint is included for A/B comparison. Epoch 3 reached val_loss = 0.0151 vs. epoch 5's 0.0146; the difference is marginal, and epoch 3 may generalize equally well in practice.
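To A/B the two checkpoints without loading the 11B base model twice, both adapters can be attached to one base and hot-swapped. A minimal sketch, assuming the adapter directories are laid out as listed under Artifacts below:

```python
import torch
from peft import PeftModel
from transformers import MllamaForConditionalGeneration

# Load the frozen bf16 base once, then attach both LoRA checkpoints.
base = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "best_adapter", adapter_name="best")
model.load_adapter("epoch3_adapter", adapter_name="epoch3")

model.set_adapter("epoch3")  # run your eval prompts with epoch 3...
model.set_adapter("best")    # ...then switch back and compare
```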
## Training Details
- Base model: meta-llama/Llama-3.2-11B-Vision-Instruct (~10.6B params: an 8B language model plus a vision encoder/adapter)
- Method: LoRA (rank=16, alpha=32, dropout=0.05); see the config sketch after this list
- Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
- Precision: bf16 (native on MI300X)
- Epochs: 5
- Effective batch size: 16
- Learning rate: 2e-05 with cosine decay and a 0.1 warmup ratio
- Optimizer: AdamW (weight_decay=0.01)
- Max sequence length: 2048
- Hardware: 1x AMD Instinct MI300X (192 GB HBM3)
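For reference, the settings above map onto a `peft` `LoraConfig` roughly as shown below. This is a sketch, not the original training script; `bias="none"` is an assumption:

```python
from peft import LoraConfig

# LoRA setup implied by the hyperparameters above.
# bias="none" is an assumption; the card does not state it.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
)
```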
## Usage
```python
import torch
from peft import PeftModel
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Load the frozen base model in bf16 and attach the LoRA adapter.
base = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "best_adapter")
processor = AutoProcessor.from_pretrained("best_adapter")
```
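With the model and processor loaded, a minimal diagnosis call might look like the following. The image path and question are placeholders; the prompt format follows the standard Mllama chat template:

```python
from PIL import Image

image = Image.open("leaf.jpg")  # placeholder: any leaf photo
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "What disease does this leaf show, and how should it be treated?"},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```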
## Artifacts

- `best_adapter/` – LoRA weights from the best validation epoch
- `epoch3_adapter/` – LoRA weights from epoch 3 (for A/B comparison)
- `config.yaml` – training hyperparameters
- `metrics.json` – per-epoch training history
See config.yaml for the full hyperparameter set.
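The per-epoch history in `metrics.json` can be inspected directly. This sketch assumes the file holds a list of per-epoch records with `epoch` and `val_loss` fields matching the Results table; the exact schema is an assumption:

```python
import json

# Assumes metrics.json is a list of per-epoch records such as
# {"epoch": 1, "train_loss": 0.9530, "val_loss": 0.0244, ...};
# the exact field names are an assumption, not documented here.
with open("metrics.json") as f:
    history = json.load(f)

best = min(history, key=lambda e: e["val_loss"])
print(f"best epoch: {best['epoch']} (val_loss={best['val_loss']:.4f})")
```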
## Source
Training code: https://github.com/genyarko/amd-merolav/tree/main/vision