πΏ Plant Identification ViT (Fine-Tuned by Kelvin jackson (DRROBOT))
Base Model: marwaALzaabi/plant-identification-vit
Fine-Tuned On: Kaggle β House Plant Species Dataset
Developed By: Kelvin Nnadi
Objective: To build a high-accuracy computer vision model that can identify and describe a wide range of houseplants, forming the perception layer of a larger AI botanist system.
π§ Model Summary
This model is a fine-tuned Vision Transformer (ViT) specialized for plant species recognition.
It was trained on 14,790 high-quality images covering 47 distinct houseplant species, improving the modelβs ability to handle real-world lighting, angles, and background variation.
The model forms the visual foundation of an intelligent AI system that integrates with Qwen Instruct for reasoning, allowing users to snap or upload plant photos and receive detailed botanical explanations.
βοΈ Training Details
| Parameter | Value |
|---|---|
| Base Model | marwaALzaabi/plant-identification-vit |
| Dataset | Kaggle House Plant Species (~14.8k images, 47 classes) |
| Epochs | 5 |
| Batch Size | 16 |
| Optimizer | AdamW |
| Learning Rate | 5e-5 |
| Scheduler | Cosine Annealing |
| Hardware | NVIDIA T4 GPU (Colab Pro+) |
| Mixed Precision | FP16 enabled |
| Framework | Hugging Face Transformers + PyTorch |
π Performance Metrics
| Metric | Value |
|---|---|
| Training Loss (Final) | 0.0010 |
| Validation Loss (Final) | 0.2161 |
| Best Validation Epoch | 5 |
| Global Training Loss | 0.1849 |
| Steps | 8,320 |
| Samples/Sec | 7.75 |
| Steps/Sec | 0.969 |
The model achieved remarkably low loss and stable convergence, indicating excellent generalization to unseen plant images.
π± Intended Use
This model can be used for:
- πΈ Real-time plant species recognition from photos
- πΏ Agricultural or botanical assistant systems (e.g., Farmlingua or AI Botanist)
- π§ Educational tools for plant taxonomy learning
- πͺ΄ Smart garden applications with vision intelligence
It can also be paired with a text-based reasoning model like to provide rich, natural language explanations about plant care, origin, and characteristics.
π§© Model Architecture Type: Vision Transformer (ViT)
Patch Size: 16x16
Embedding Dimension: 768
Heads: 12
Depth: 12
Fine-tuning Method: Full fine-tuning (not LoRA)
βοΈ License This model is released under the Apache 2.0 License, allowing both commercial and research use with attribution.
π¬ Citation If you use this model, please cite:
java Copy code @model{kelvinnnadi_plant_vit_2025, title={Plant Identification ViT (Fine-Tuned)}, author={Kelvin Nnadi}, year={2025}, howpublished={Hugging Face}, url={https://huggingface.co/your-username/plant-identification-vit-finetuned} } π Highlights Fine-tuned with 47 classes of houseplants
Highly generalized on real-world photos
Seamlessly integrates with multimodal LLMs
Production-grade architecture suitable for cloud APIs