NSER-IBVS: Efficient Self-Supervised Neuro-Analytic Visual Servoing for Real-time Quadrotor Control

ICCV 2025 Workshops - Oral Presentation

Sebastian Mocanu · Sebastian-Ion Nae · Mihai-Eugen Barbu · Marius Leordeanu



Model Description

This repository contains pre-trained models for NSER-IBVS (Numerically Stable Efficient Reduced Image-Based Visual Servoing), a self-supervised framework for vision-based quadrotor control.

The system uses a teacher-student architecture where a compact 1.7M parameter student network learns to control a drone by imitating an analytical IBVS teacher, achieving 11x faster inference (540 FPS vs 48 FPS) while maintaining comparable accuracy.
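
The training signal is pure imitation: the student regresses the teacher's velocity commands directly from the RGB frame. Below is a minimal sketch of one distillation step; the MSE loss, the 360x640 input resolution, and the teacher_commands placeholder are illustrative assumptions rather than the repository's exact training code.

import torch
import torch.nn as nn

from nser_ibvs_drone.distiled_network.drone_command_regressor import DroneCommandRegressor

# Illustrative distillation step: the compact student imitates the analytical IBVS teacher.
student = DroneCommandRegressor()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
criterion = nn.MSELoss()  # assumed regression loss on the velocity commands

def teacher_commands(frames):
    # Stand-in for the analytical teacher pipeline
    # (YOLO segmentation -> mask splitting -> IBVS control law).
    return torch.zeros(frames.shape[0], 3)

frames = torch.rand(8, 3, 360, 640)   # batch of RGB frames [B, 3, H, W] (resolution assumed)
targets = teacher_commands(frames)    # teacher velocity commands [B, 3] = [vx, vy, vyaw]

optimizer.zero_grad()
predictions = student(frames)         # student predictions [B, 3]
loss = criterion(predictions, targets)
loss.backward()
optimizer.step()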

Teacher vs. student trajectories from two starting points:

Front view: teacher vs. student trajectories
Up view: teacher vs. student trajectories

Intended Use

  • Primary use: Vision-based autonomous drone control for target tracking
  • Target users: Robotics researchers, drone developers, computer vision practitioners
  • Out of scope: Production deployment without proper safety validation

Limitations

  • Trained primarily on vehicle targets; may require fine-tuning for other object classes
  • Performance degrades in extreme lighting conditions or heavy occlusion
  • Tested only with the Parrot Anafi 4K drone (the models and system output piloting commands for Parrot drones)

Models

Teacher Pipeline Models

| Model | File | Parameters | Description |
|---|---|---|---|
| YOLOv11 Segmentation (sim) | 29_05_best__yolo11n-seg_sim_car_bunker__all.pt | 2.84M | Vehicle segmentation for the simulator |
| YOLOv11 Segmentation (real) | real-yolo-car-full-segmentation.pt | 2.84M | Vehicle segmentation for the real world |
| Mask Splitter (sim) | mask_splitter-epoch_10-dropout_0-low_x2-and-high_x0_quality_early_stop.pt | 1.94M | Anterior-posterior mask splitting (simulator) |
| Mask Splitter (real) | mask_splitter-epoch_10-dropout_0-_x2_real_early_stop.pt | 1.94M | Anterior-posterior mask splitting (real-world) |

Student Models

| Model | File | Parameters | Description |
|---|---|---|---|
| Student (sim→real) | student_model_sim_on_real_world_distribution.pth | 1.7M | Trained on sim, normalized for real-world |
| Student (fine-tuned) | student_real_pretrained_augX3_80_runs.pth | 1.7M | Fine-tuned on real-world data |
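
The checkpoint files listed above can also be fetched programmatically. A minimal sketch with huggingface_hub, assuming the files sit at the root of this repository (brittleru/nser-ibvs-drone):

from huggingface_hub import hf_hub_download

# Download checkpoints from this model repository (filenames as listed in the tables above).
student_path = hf_hub_download(
    repo_id="brittleru/nser-ibvs-drone",
    filename="student_model_sim_on_real_world_distribution.pth",
)
yolo_path = hf_hub_download(
    repo_id="brittleru/nser-ibvs-drone",
    filename="real-yolo-car-full-segmentation.pt",
)
print(student_path, yolo_path)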

Usage

Installation

git clone --recursive https://github.com/SpaceTime-Vision-Robotics-Laboratory/nser-ibvs-drone.git
cd nser-ibvs-drone

python3 -m venv ./venv
source venv/bin/activate

python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install -e .

Run the tests to verify imports and functionality:

python -m unittest discover ./tests

Loading Models

YOLO Segmentation:

import cv2

from mask_splitter.yolo_model import YoloSegmentation

# Initialize YOLO model
yolo = YoloSegmentation(
    model_path="real-yolo-car-full-segmentation.pt",
    confidence_threshold=0.7
)

# Load an input frame
frame = cv2.imread("frame-path.png")

# Segment the image
annotated_frame, binary_mask = yolo.segment_image(frame)

# Get detection info
results = yolo.detect(frame)
target = yolo.find_best_target_box(results)
print(f"Confidence: {target.confidence}, Center: {target.center}")

Mask-Splitter Network

import cv2
from mask_splitter.nn.infer import MaskSplitterInference

# Initialize the model
splitter = MaskSplitterInference(
    model_path="mask_splitter-epoch_10-dropout_0-_x2_real_early_stop.pt",
    device="cuda",
    image_size=(360, 640),
    confidence_threshold=0.5
)

# Load image and mask
image = cv2.imread("frame-path.png")
mask = cv2.imread("mask-path.png", cv2.IMREAD_GRAYSCALE)

# Run inference
front_mask, back_mask = splitter.infer(image, mask)

# Visualize results
splitter.visualize(image, front_mask, back_mask)

Student Network

Direct usage:

import torch
from nser_ibvs_drone.distiled_network.drone_command_regressor import DroneCommandRegressor

# Load model
model = DroneCommandRegressor()
model.load_model("student_model_sim_on_real_world_distribution.pth")
model.eval()

# Input: RGB image tensor [B, 3, H, W]
# Output: velocity commands [vx, vy, vyaw]

Or with Student Engine:

import cv2
from nser_ibvs_drone.distiled_network.distil_engine import StudentEngine

student_model_path = "student_model_sim_on_real_world_distribution.pth"
model_engine = StudentEngine(student_model_path)

frame = cv2.imread("frame-path.png")
commands = model_engine.predict(frame)

For complete inference pipelines and instructions for running the algorithms on a drone, see the documentation in the GitHub repository.
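
As a rough orientation for how the pieces above fit together, the sketch below wires one per-frame student step alongside the teacher's intermediate outputs; the video path is hypothetical, the IBVS control law and the Parrot SDK calls are intentionally omitted, and the repository's pipeline scripts should be preferred for real flights.

import cv2

from mask_splitter.nn.infer import MaskSplitterInference
from mask_splitter.yolo_model import YoloSegmentation
from nser_ibvs_drone.distiled_network.distil_engine import StudentEngine

# Student path: one forward pass per frame, RGB in -> [vx, vy, vyaw] out.
student = StudentEngine("student_model_sim_on_real_world_distribution.pth")

# Teacher path intermediates: segmentation followed by anterior-posterior splitting.
yolo = YoloSegmentation(
    model_path="real-yolo-car-full-segmentation.pt",
    confidence_threshold=0.7,
)
splitter = MaskSplitterInference(
    model_path="mask_splitter-epoch_10-dropout_0-_x2_real_early_stop.pt",
    device="cuda",
    image_size=(360, 640),
    confidence_threshold=0.5,
)

capture = cv2.VideoCapture("flight-recording.mp4")  # hypothetical recorded flight
while True:
    ok, frame = capture.read()
    if not ok:
        break
    commands = student.predict(frame)                # student velocity commands
    _, binary_mask = yolo.segment_image(frame)       # teacher-side segmentation mask
    front_mask, back_mask = splitter.infer(frame, binary_mask)
    print(commands)
capture.release()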


Performance

| Metric | Teacher (NSER-IBVS) | Student Network |
|---|---|---|
| Inference Speed | 48.3 FPS | 540.8 FPS |
| Parameters | 4.78M | 1.7M |
| Mean Error (Sim) | 29.76 px | 14.26 px |
| IoU (Sim) | 0.522 | 0.752 |
| Mean Error (Real) | 29.96 px | 33.33 px |
| IoU (Real) | 0.627 | 0.591 |
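
The speed figures above come from the paper's benchmarking setup; a rough way to time the student network yourself, assuming a CUDA device and single-frame batches at an assumed 360x640 resolution, is sketched below:

import time
import torch

from nser_ibvs_drone.distiled_network.drone_command_regressor import DroneCommandRegressor

model = DroneCommandRegressor()
model.load_model("student_model_sim_on_real_world_distribution.pth")
model.eval().cuda()

dummy = torch.rand(1, 3, 360, 640, device="cuda")  # single RGB frame, resolution assumed

with torch.no_grad():
    for _ in range(20):        # warm-up iterations
        model(dummy)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(500):
        model(dummy)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

print(f"Student inference: {500 / elapsed:.1f} FPS")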

Control commands and error evolution over time:

Real-World Flight - Teacher (IBVS)
Real-World Flight - Student
Digital-Twin Flight - Teacher (IBVS)
Digital-Twin Flight - Student

Training Data

Training data and the custom UE4 simulator environment are also available; see the Links section below.


Citation

@InProceedings{Mocanu_2025_ICCV,
    author    = {Mocanu, Sebastian and Nae, Sebastian-Ion and Barbu, Mihai-Eugen and Leordeanu, Marius},
    title     = {Efficient Self-Supervised Neuro-Analytic Visual Servoing for Real-time Quadrotor Control},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {1744-1753}
}

License

Academic Free License v3.0


Links

| Resource | Link |
|---|---|
| Paper | ICCV 2025 Open Access |
| arXiv | 2507.19878 |
| Code | GitHub |
| Website | Project Page |
| Poster | PDF |