🧵 3LC Cotton Image Detection Competition: End-to-End YOLOv8 + Data Engineering Pipeline
This repository contains my full solution pipeline for the 3LC Cotton Image Detection Competition, covering everything from data engineering to model training, feature extraction, visualization, and final YOLOv8 inference.
The workflow is organized as reproducible notebook steps (preprocessing, training, and inference) and is aimed at agricultural image analysis.
Project Overview
Cotton quality detection requires robust recognition of cotton conditions across thousands of images under varying lighting, angles, and environmental noise. This repo provides:
Full preprocessing & dataset engineering
Automated CSV generation for training/validation splits
Visualizations of the cotton dataset
Feature engineering for metadata-based models
YOLOv8 training pipeline for object detection
Transformers / ML pipeline exploration
Final trained weights & evaluation results
Repository Structure
├── CottonDetectionCompetition.ipynb   # Main pipeline notebook
├── data/
│   ├── raw/                           # Original dataset
│   ├── processed/                     # Engineered CSVs and cleaned images
│   └── splits/                        # Train/val/test CSVs
├── models/
│   ├── yolov8/
│   ├── runs/                          # Training logs
│   └── best.pt                        # Best YOLOv8 weights
├── visualization/
│   └── samples/                       # Rendered detection outputs
└── README.md
Technologies & Tools Used
Computer Vision
Ultralytics YOLOv8: object detection backbone
OpenCV: image transformations and preprocessing
Matplotlib / Seaborn: data visualization
Albumentations: augmentation pipeline
Machine Learning
Feature engineering for structured data
Transformer-based experimentation for metadata
Classical ML models for comparison
Data Engineering
Automated CSV generation
Preprocessing pipeline for consistent annotations
Dataset validation and exploratory data analysis (EDA)
Pipeline Breakdown
- Dataset Engineering
Loaded raw cotton images
Validated annotation formats
Generated train.csv, val.csv, test.csv splits
Created feature metadata for additional ML experiments
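The split step above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the 70/15/15 ratio, the fixed seed, the `image_path` column name, and the helper names `split_images` and `write_split_csvs` are all assumptions made for the example.

```python
import csv
import random
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed image formats


def split_images(image_paths, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle deterministically, then cut into train/val/test lists."""
    paths = sorted(str(p) for p in image_paths)  # sort so the seed gives a stable order
    random.Random(seed).shuffle(paths)
    n_test = round(len(paths) * test_frac)
    n_val = round(len(paths) * val_frac)
    return {
        "test": paths[:n_test],
        "val": paths[n_test:n_test + n_val],
        "train": paths[n_test + n_val:],
    }


def write_split_csvs(raw_dir="data/raw", out_dir="data/splits"):
    """Write train.csv, val.csv and test.csv, one image path per row."""
    images = [p for p in Path(raw_dir).rglob("*") if p.suffix.lower() in IMAGE_EXTS]
    splits = split_images(images)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for name, paths in splits.items():
        with open(Path(out_dir) / f"{name}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["image_path"])  # hypothetical column name
            writer.writerows([p] for p in paths)
    return splits
```

Sorting before shuffling keeps the split identical across machines, since directory listing order is not guaranteed.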
- Visualization & EDA
Image distribution analysis
Cotton brightness / color distribution
Label frequency plots
Sample montages for inspection
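As a rough sketch of how the label frequency counts could be gathered (and malformed annotations flagged along the way), assuming the labels are stored as YOLO-format `.txt` files with one `class x_center y_center width height` line per box and normalized coordinates. The annotation format and the helper names `parse_yolo_line` and `label_frequencies` are assumptions for illustration.

```python
from collections import Counter
from pathlib import Path


def parse_yolo_line(line):
    """Return (class_id, x, y, w, h), or None if the line is malformed."""
    parts = line.split()
    if len(parts) != 5:
        return None
    try:
        class_id = int(parts[0])
        box = tuple(float(v) for v in parts[1:])
    except ValueError:
        return None
    # Normalized YOLO coordinates must lie in [0, 1].
    if not all(0.0 <= v <= 1.0 for v in box):
        return None
    return (class_id, *box)


def label_frequencies(label_dir):
    """Count boxes per class id and the number of malformed lines."""
    counts, bad_lines = Counter(), 0
    for txt in sorted(Path(label_dir).rglob("*.txt")):
        for line in txt.read_text().splitlines():
            if not line.strip():
                continue  # skip blank lines
            parsed = parse_yolo_line(line)
            if parsed is None:
                bad_lines += 1
            else:
                counts[parsed[0]] += 1
    return counts, bad_lines
```

The returned `Counter` can be passed straight to a Matplotlib or Seaborn bar plot for the label frequency chart.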
- Feature Engineering
Implemented structured data features such as:
Image dimensions
Color histograms
Texture descriptors
Derived statistical features
Used for ML experiments with:
Logistic Regression
Random Forest
XGBoost
Transformer models
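A minimal sketch of what such per-image features might look like, using NumPy only. The bin count, the feature names, the gradient-based texture proxy, and the BGR channel order (as returned by `cv2.imread`) are assumptions for illustration; the notebook's actual descriptors may differ.

```python
import numpy as np


def extract_features(image, bins=8):
    """Compute simple numeric features from an H x W x 3 uint8 image array."""
    h, w = image.shape[:2]
    feats = {"height": h, "width": w, "aspect_ratio": w / h}

    # Per-channel normalized color histogram plus mean and standard deviation.
    for i, name in enumerate(("b", "g", "r")):  # assumes OpenCV's BGR order
        channel = image[..., i]
        hist, _ = np.histogram(channel, bins=bins, range=(0, 256))
        hist = hist / hist.sum()
        for b, value in enumerate(hist):
            feats[f"{name}_hist_{b}"] = float(value)
        feats[f"{name}_mean"] = float(channel.mean())
        feats[f"{name}_std"] = float(channel.std())

    # Crude texture proxy: mean absolute gradient of the grayscale image.
    gray = image.mean(axis=2)
    grad_rows = np.abs(np.diff(gray, axis=0)).mean()
    grad_cols = np.abs(np.diff(gray, axis=1)).mean()
    feats["grad_mean"] = float(grad_rows + grad_cols)
    return feats
```

Stacking one such dict per image into a pandas DataFrame gives a table that classical models such as Logistic Regression, Random Forest, or XGBoost can consume directly.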
- YOLOv8 Model Training
The core of the competition solution:
Custom YAML configuration
Training from scratch & transfer learning
Hyperparameter tuning
Checkpointing
Validation curve visualization
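For orientation, a training run along these lines might look like the sketch below. The dataset layout (`images/train`, `images/val`), the class names, the `yolov8n` model size, and the hyperparameter values are placeholders, not the settings used in the notebook. The helpers `dataset_yaml_text` and `train` are hypothetical; `YOLO(...)` and `model.train(...)` are the Ultralytics calls.

```python
from pathlib import Path


def dataset_yaml_text(root, class_names):
    """Build the text of an Ultralytics-style dataset YAML file."""
    lines = [
        f"path: {root}",
        "train: images/train",  # assumed layout, relative to `path`
        "val: images/val",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


def train(data_yaml, weights="yolov8n.pt", epochs=50, imgsz=640):
    """Fine-tune from pretrained weights; pass 'yolov8n.yaml' to train from scratch."""
    from ultralytics import YOLO  # imported lazily so the YAML helper works without it

    model = YOLO(weights)
    return model.train(data=data_yaml, epochs=epochs, imgsz=imgsz)


if __name__ == "__main__":
    # Hypothetical class list and paths, for illustration only.
    Path("cotton.yaml").write_text(dataset_yaml_text("data/processed", ["cotton"]))
    train("cotton.yaml")
```

Passing a `.pt` file starts from pretrained weights (transfer learning), while passing a model `.yaml` builds the architecture with random initialization. By default Ultralytics writes checkpoints, including `best.pt`, and training curves under a `runs/` directory.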
- Final Detection Pipeline
Loaded best YOLOv8 model (best.pt)
Generated bounding boxes and confidence scores
Rendered final output images
Prepared predictions for competition submission
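The inference steps above could be sketched as follows. The competition's submission format is not described here, so the CSV columns, the confidence threshold, and the helper names `detections_to_rows` and `predict_to_csv` are assumptions; only `YOLO`, `model.predict`, and the `result.boxes` attributes come from the Ultralytics API.

```python
import csv
from pathlib import Path

FIELDS = ["image", "class_id", "confidence", "x1", "y1", "x2", "y2"]  # hypothetical columns


def detections_to_rows(image_name, boxes_xyxy, confidences, class_ids):
    """Flatten one image's detections into CSV-ready dicts."""
    rows = []
    for (x1, y1, x2, y2), conf, cls in zip(boxes_xyxy, confidences, class_ids):
        rows.append({
            "image": image_name,
            "class_id": int(cls),
            "confidence": round(float(conf), 4),
            "x1": x1, "y1": y1, "x2": x2, "y2": y2,
        })
    return rows


def predict_to_csv(weights, source, out_csv, conf=0.25):
    """Run the trained detector over `source` and write one row per box."""
    from ultralytics import YOLO  # imported lazily; needs the ultralytics package

    model = YOLO(weights)
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        # stream=True yields results one image at a time to limit memory use.
        for result in model.predict(source=source, conf=conf, stream=True):
            boxes = result.boxes
            writer.writerows(detections_to_rows(
                Path(result.path).name,
                boxes.xyxy.tolist(),
                boxes.conf.tolist(),
                boxes.cls.tolist(),
            ))
```

For the rendered output images, `result.plot()` returns an annotated image array that can be saved with OpenCV.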
Results
Trained a YOLOv8 detector on the cotton dataset
Achieved strong accuracy with the optimized preprocessing pipeline
Kept the training environment reproducible by running everything from a single notebook
▶️ How to Run
Install dependencies:
pip install ultralytics opencv-python numpy matplotlib pandas seaborn albumentations
Place the dataset inside data/raw/.
Open the notebook:
CottonDetectionCompetition.ipynb
Run each section in order; the entire pipeline is self-contained.
🧪 Future Improvements
Train YOLOv8x or YOLO11 models for stronger accuracy
Add more advanced augmentations for robustness
Convert pipeline into a Python script + CLI
Deploy model as a Streamlit web dashboard