🧵 3LC Cotton Image Detection Competition: End-to-End YOLOv8 + Data Engineering Pipeline

This repository contains my full solution pipeline for the 3LC Cotton Image Detection Competition, covering everything from data engineering to model training, feature extraction, visualization, and final YOLOv8 inference.

The project implements a research-driven computer vision workflow with reproducible steps, optimized preprocessing, and a structured training loop suited to agricultural image analysis.

🚀 Project Overview

Cotton quality detection requires robust recognition of cotton conditions across thousands of images under varying lighting, angles, and environmental noise. This repo provides:

  • Full preprocessing & dataset engineering

  • Automated CSV generation for training/validation splits

  • Visualizations of the cotton dataset

  • Feature engineering for metadata-based models

  • YOLOv8 training pipeline for object detection

  • Transformers / ML pipeline exploration

  • Final trained weights & evaluation results

📂 Repository Structure

├── CottonDetectionCompetition.ipynb   # Main pipeline notebook
├── data/
│   ├── raw/                           # Original dataset
│   ├── processed/                     # Engineered CSVs and cleaned images
│   └── splits/                        # Train/val/test CSVs
├── models/
│   └── yolov8/
│       ├── runs/                      # Training logs
│       └── best.pt                    # Best YOLOv8 weights
├── visualization/
│   └── samples/                       # Rendered detection outputs
└── README.md

Technologies & Tools Used

Computer Vision

  • Ultralytics YOLOv8: object detection model

  • OpenCV: image transformations and preprocessing

  • Matplotlib / Seaborn: data visualization

  • Albumentations: augmentation pipeline

Machine Learning

  • Feature engineering for structured data

  • Transformer-based experimentation for metadata

  • Classical ML models for comparison

Data Engineering

  • Automated CSV generation

  • Preprocessing pipeline for consistent annotations

  • Dataset validation and exploratory data analysis (EDA)

πŸ“ Pipeline Breakdown

  1. Dataset Engineering
  • Loaded raw cotton images

  • Validated annotation formats

  • Generated train.csv, val.csv, test.csv splits

  • Created feature metadata for additional ML experiments
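
The split generation described above can be sketched as follows. This is a minimal, standard-library illustration rather than the notebook's actual code: the column names (`image_path`, `label_path`), the split fractions, and the seed are assumptions chosen for the example.

```python
import csv
import random
from pathlib import Path


def split_records(records, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle records deterministically and cut them into train/val/test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_val = round(len(shuffled) * val_frac)
    n_test = round(len(shuffled) * test_frac)
    return {
        "val": shuffled[:n_val],
        "test": shuffled[n_val:n_val + n_test],
        "train": shuffled[n_val + n_test:],
    }


def write_split_csvs(splits, out_dir, fieldnames=("image_path", "label_path")):
    """Write one CSV per split (train.csv, val.csv, test.csv) into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, rows in splits.items():
        with open(out_dir / f"{name}.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
```

Each record is a dict with the assumed keys, so `write_split_csvs(split_records(records), "data/splits")` would produce the three CSV files in the splits directory.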

  2. Visualization & EDA
  • Image distribution analysis

  • Cotton brightness / color distribution

  • Label frequency plots

  • Sample montages for inspection
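
A label frequency plot needs per-class box counts first. The sketch below assumes the annotations are stored in YOLO txt format (one `class x_center y_center width height` line per box, coordinates normalized to 0..1); this README does not state the annotation format, so treat that as an assumption. The function names are hypothetical.

```python
from collections import Counter
from pathlib import Path


def parse_yolo_line(line):
    """Parse 'class x_center y_center width height'; raise ValueError if malformed."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    box = tuple(float(p) for p in parts[1:])
    if class_id < 0 or not all(0.0 <= v <= 1.0 for v in box):
        raise ValueError(f"values out of range: {line!r}")
    return class_id, box


def label_frequencies(label_dir):
    """Count how many boxes of each class id appear across all .txt files."""
    counts = Counter()
    for txt in sorted(Path(label_dir).glob("*.txt")):
        for line in txt.read_text().splitlines():
            if line.strip():  # skip blank lines
                class_id, _ = parse_yolo_line(line)
                counts[class_id] += 1
    return counts
```

The resulting `Counter` can be passed to a Matplotlib or Seaborn bar plot, and the `ValueError` raised on malformed lines doubles as a basic annotation check.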

  3. Feature Engineering

Implemented structured data features such as:

  • Image dimensions

  • Color histograms

  • Texture descriptors

  • Derived statistical features

These features were used for ML experiments with:

  • Logistic Regression

  • Random Forest

  • XGBoost

  • Transformer models
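
To illustrate the feature-engineering step, here is a dependency-free sketch that computes image dimensions, a per-channel color histogram, and simple brightness statistics. The notebook presumably relies on OpenCV or NumPy for this; the pure-Python version, the function names, the 8-bin histogram, and the luma-based brightness measure are all assumptions made for the example. Texture descriptors are not covered.

```python
from statistics import mean, pstdev


def color_histogram(pixels, bins=8):
    """Normalized per-channel histogram for (r, g, b) pixels with values in 0..255."""
    hist = [[0] * bins for _ in range(3)]
    n = 0
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            hist[channel][value * bins // 256] += 1
        n += 1
    if n == 0:
        raise ValueError("no pixels")
    return [[count / n for count in channel] for channel in hist]


def image_features(pixels, width, height, bins=8):
    """Build a flat feature dict suitable for one row of a metadata table."""
    pixels = list(pixels)
    hist = color_histogram(pixels, bins)
    # Rec. 601 luma weights as a simple per-pixel brightness measure
    brightness = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]
    features = {
        "width": width,
        "height": height,
        "aspect_ratio": width / height,
        "brightness_mean": mean(brightness),
        "brightness_std": pstdev(brightness),
    }
    for name, channel in zip("rgb", hist):
        for i, share in enumerate(channel):
            features[f"hist_{name}_{i}"] = share
    return features
```

One such dict per image gives a table that classical models such as Logistic Regression, Random Forest, or XGBoost can consume directly.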

  4. YOLOv8 Model Training

The core of the competition solution:

  • Custom YAML configuration

  • Training from scratch & transfer learning

  • Hyperparameter tuning

  • Checkpointing

  • Validation curve visualization
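
A rough sketch of the custom YAML configuration and the training call, using the Ultralytics Python API. The dataset root, the image sub-directories, the single class name `cotton`, the run name, and all hyperparameter values are placeholders, not the settings used in the competition.

```python
from pathlib import Path


def write_dataset_yaml(yaml_path, dataset_root, class_names,
                       train_dir="images/train", val_dir="images/val"):
    """Write an Ultralytics-style dataset YAML and return its path."""
    lines = [
        f"path: {dataset_root}",
        f"train: {train_dir}",
        f"val: {val_dir}",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    yaml_path = Path(yaml_path)
    yaml_path.parent.mkdir(parents=True, exist_ok=True)
    yaml_path.write_text("\n".join(lines) + "\n")
    return yaml_path


def train_yolov8(data_yaml, weights="yolov8n.pt", epochs=50, imgsz=640):
    """Fine-tune from pretrained weights; pass 'yolov8n.yaml' to train from scratch."""
    from ultralytics import YOLO  # imported lazily so the YAML helper works without it

    model = YOLO(weights)
    # Placeholder hyperparameters; Ultralytics saves best.pt under project/name/weights
    return model.train(data=str(data_yaml), epochs=epochs, imgsz=imgsz,
                       project="models/yolov8/runs", name="cotton")
```

Loading a `.pt` checkpoint gives transfer learning, while loading the matching `.yaml` architecture file starts from randomly initialized weights, which covers both training modes listed above.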

  5. Final Detection Pipeline

  • Loaded best YOLOv8 model (best.pt)

  • Generated bounding boxes and confidence scores

  • Rendered final output images

  • Prepared predictions for competition submission
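
The final detection steps could look roughly like the sketch below. The submission layout is an assumption: this README does not specify the competition's required format, so the CSV columns and the function names here are hypothetical. The Ultralytics calls (`YOLO`, `predict`, `result.boxes`) are part of its public Python API.

```python
import csv
from pathlib import Path

# Hypothetical submission columns; adjust to the competition's actual format.
FIELDS = ["image_id", "class_id", "confidence", "x_min", "y_min", "x_max", "y_max"]


def boxes_to_rows(image_id, xyxy, confs, class_ids):
    """Flatten one image's detections into dict rows, one row per box."""
    rows = []
    for (x1, y1, x2, y2), conf, cls in zip(xyxy, confs, class_ids):
        rows.append({
            "image_id": image_id,
            "class_id": int(cls),
            "confidence": round(float(conf), 4),
            "x_min": float(x1), "y_min": float(y1),
            "x_max": float(x2), "y_max": float(y2),
        })
    return rows


def predict_to_csv(weights, source, out_csv, conf=0.25):
    """Run inference with the given weights and write all detections to a CSV."""
    from ultralytics import YOLO  # imported lazily; only needed for real inference

    model = YOLO(weights)
    rows = []
    # save=True also writes annotated images to the Ultralytics run directory
    for result in model.predict(source=source, conf=conf, save=True, stream=True):
        rows += boxes_to_rows(
            Path(result.path).stem,
            result.boxes.xyxy.tolist(),
            result.boxes.conf.tolist(),
            result.boxes.cls.tolist(),
        )
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

With the repository layout above, a call might be `predict_to_csv("models/yolov8/best.pt", "data/raw", "submission.csv")`, where the output filename is again a placeholder.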

πŸ† Results

  • Successfully trained a YOLOv8 detector on the cotton dataset

  • Achieved strong accuracy using optimized preprocessing

  • Reproducible training environment through the notebook pipeline

▶️ How to Run

Install dependencies:

pip install ultralytics opencv-python numpy matplotlib pandas seaborn albumentations

Place the dataset inside data/raw/.

Open the notebook:

CottonDetectionCompetition.ipynb

Run each section in order; the entire pipeline is self-contained.

🧪 Future Improvements

  • Train YOLOv8x or YOLO11 models for stronger accuracy

  • Add more advanced augmentations for robustness

  • Convert pipeline into a Python script + CLI

  • Deploy model as a Streamlit web dashboard
