Instructions to use iamcode6/llama32-vision-ccmt-mi300x with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use iamcode6/llama32-vision-ccmt-mi300x with PEFT:
Task type is invalid.
- Notebooks
- Google Colab
- Kaggle
| license: apache-2.0 | |
| library_name: peft | |
| tags: | |
| - llama-3.2-vision | |
| - plant-disease | |
| - lora | |
| - rocm | |
| - mi300x | |
| - amd | |
| - vlm | |
| - image-text-to-text | |
| base_model: meta-llama/Llama-3.2-11B-Vision-Instruct | |
| # Llama 3.2 Vision 11B LoRA β Plant Disease Diagnosis (MI300X fine-tune) | |
| Fine-tuned **Llama 3.2 Vision 11B** with **LoRA** on a plant disease QA dataset | |
| (cashew, cassava, maize, tomato β 22 disease classes) for visual diagnosis and | |
| treatment recommendations. | |
| Trained on a single **AMD Instinct MI300X** using PyTorch + ROCm, as a submission to the | |
| lablab.ai AMD hackathon **Track 3 β Building AI-Powered Applications on AMD GPUs**. | |
| ## π± Try the live demo | |
| This adapter powers the conversational layer of an interactive Plant Disease Assistant β upload a leaf photo to get a diagnosis and treatment guide: | |
| **π [huggingface.co/spaces/lablab-ai-amd-developer-hackathon/merolav-space](https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/merolav-space)** | |
| (The hosted Space runs the lighter DINOv2-L classifier on free CPU. Load this LoRA adapter on a GPU machine for the full conversational VLM experience β see the code under [`vision/`](https://github.com/genyarko/amd-merolav/tree/main/vision) in the training repo.) | |
| ## Results | |
| | Epoch | Train Loss | Val Loss | Throughput | Wall Time | | |
| |------:|-----------:|---------:|:----------:|:---------:| | |
| | 1 | 0.9530 | 0.0244 | 1.3 | 3331.1s | | |
| | 2 | 0.0203 | 0.0177 | 1.3 | 3318.3s | | |
| | 3 | 0.0151 | 0.0151 | 1.3 | 3321.8s | | |
| | 4 | 0.0125 | 0.0147 | 1.3 | 3318.8s | | |
| | 5 | 0.0109 | 0.0146 | 1.3 | 3317.5s | | |
| **Best val_loss: 0.0146** | |
| ### Epoch 3 adapter | |
| An `epoch3_adapter/` checkpoint is included for A/B comparison. Epoch 3 had val_loss=0.0151 vs epoch 5's 0.0146 β the difference is marginal and epoch 3 may generalize equally well in practice. | |
| ## Training Details | |
| - **Base model:** meta-llama/Llama-3.2-11B-Vision-Instruct (11B params, ~4B vision + ~7B language) | |
| - **Method:** LoRA (rank=16, alpha=32, dropout=0.05) | |
| - **Target modules:** q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj | |
| - **Precision:** bf16 (native MI300X) | |
| - **Epochs:** 5 | |
| - **Effective batch size:** 16 | |
| - **Learning rate:** 2e-05 with cosine decay + 0.1 warmup | |
| - **Optimizer:** AdamW (weight_decay=0.01) | |
| - **Max sequence length:** 2048 | |
| - **Hardware:** 1x AMD Instinct MI300X (192 GB HBM3) | |
| ## Usage | |
| ```python | |
| from peft import PeftModel | |
| from transformers import AutoProcessor, MllamaForConditionalGeneration | |
| base = MllamaForConditionalGeneration.from_pretrained( | |
| "meta-llama/Llama-3.2-11B-Vision-Instruct", | |
| torch_dtype=torch.bfloat16, | |
| device_map="auto", | |
| ) | |
| model = PeftModel.from_pretrained(base, "best_adapter") | |
| processor = AutoProcessor.from_pretrained("best_adapter") | |
| ``` | |
| ## Artifacts | |
| - `best_adapter/` β LoRA weights from the best validation epoch | |
| - `epoch3_adapter/` β LoRA weights from epoch 3 (for A/B comparison) | |
| - `config.yaml` β training hyperparameters | |
| - `metrics.json` β per-epoch training history | |
| See `config.yaml` for the full hyperparameter set. | |
| ## Source | |
| Training code: <https://github.com/genyarko/amd-merolav/tree/main/vision> | |