iamcode6's picture
Link to live Gradio demo Space
2ffc817
---
license: apache-2.0
library_name: peft
tags:
- llama-3.2-vision
- plant-disease
- lora
- rocm
- mi300x
- amd
- vlm
- image-text-to-text
base_model: meta-llama/Llama-3.2-11B-Vision-Instruct
---
# Llama 3.2 Vision 11B LoRA β€” Plant Disease Diagnosis (MI300X fine-tune)
Fine-tuned **Llama 3.2 Vision 11B** with **LoRA** on a plant disease QA dataset
(cashew, cassava, maize, tomato β€” 22 disease classes) for visual diagnosis and
treatment recommendations.
Trained on a single **AMD Instinct MI300X** using PyTorch + ROCm, as a submission to the
lablab.ai AMD hackathon **Track 3 β€” Building AI-Powered Applications on AMD GPUs**.
## 🌱 Try the live demo
This adapter powers the conversational layer of an interactive Plant Disease Assistant β€” upload a leaf photo to get a diagnosis and treatment guide:
**πŸ‘‰ [huggingface.co/spaces/lablab-ai-amd-developer-hackathon/merolav-space](https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/merolav-space)**
(The hosted Space runs the lighter DINOv2-L classifier on free CPU. Load this LoRA adapter on a GPU machine for the full conversational VLM experience β€” see the code under [`vision/`](https://github.com/genyarko/amd-merolav/tree/main/vision) in the training repo.)
## Results
| Epoch | Train Loss | Val Loss | Throughput | Wall Time |
|------:|-----------:|---------:|:----------:|:---------:|
| 1 | 0.9530 | 0.0244 | 1.3 | 3331.1s |
| 2 | 0.0203 | 0.0177 | 1.3 | 3318.3s |
| 3 | 0.0151 | 0.0151 | 1.3 | 3321.8s |
| 4 | 0.0125 | 0.0147 | 1.3 | 3318.8s |
| 5 | 0.0109 | 0.0146 | 1.3 | 3317.5s |
**Best val_loss: 0.0146**
### Epoch 3 adapter
An `epoch3_adapter/` checkpoint is included for A/B comparison. Epoch 3 had val_loss=0.0151 vs epoch 5's 0.0146 β€” the difference is marginal and epoch 3 may generalize equally well in practice.
## Training Details
- **Base model:** meta-llama/Llama-3.2-11B-Vision-Instruct (11B params, ~4B vision + ~7B language)
- **Method:** LoRA (rank=16, alpha=32, dropout=0.05)
- **Target modules:** q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
- **Precision:** bf16 (native MI300X)
- **Epochs:** 5
- **Effective batch size:** 16
- **Learning rate:** 2e-05 with cosine decay + 0.1 warmup
- **Optimizer:** AdamW (weight_decay=0.01)
- **Max sequence length:** 2048
- **Hardware:** 1x AMD Instinct MI300X (192 GB HBM3)
## Usage
```python
from peft import PeftModel
from transformers import AutoProcessor, MllamaForConditionalGeneration
base = MllamaForConditionalGeneration.from_pretrained(
"meta-llama/Llama-3.2-11B-Vision-Instruct",
torch_dtype=torch.bfloat16,
device_map="auto",
)
model = PeftModel.from_pretrained(base, "best_adapter")
processor = AutoProcessor.from_pretrained("best_adapter")
```
## Artifacts
- `best_adapter/` β€” LoRA weights from the best validation epoch
- `epoch3_adapter/` β€” LoRA weights from epoch 3 (for A/B comparison)
- `config.yaml` β€” training hyperparameters
- `metrics.json` β€” per-epoch training history
See `config.yaml` for the full hyperparameter set.
## Source
Training code: <https://github.com/genyarko/amd-merolav/tree/main/vision>