---
library_name: transformers
tags:
  - lora
  - peft
  - drone
  - telemetry
  - vision
  - mistral
  - ministral
base_model: mistralai/Ministral-3-3B-Instruct-2512-BF16
license: apache-2.0
pipeline_tag: image-text-to-text
---

# Flystral — LoRA Fine-tuned Ministral 3B for Drone Flight Control

LoRA adapter for real-time drone telemetry prediction from camera images, built for the [Louise AI Safety Drone Escort](https://github.com/BenBarr/louise) system.

## What it does

Given a drone camera frame, the model outputs a telemetry vector (velocity, orientation, altitude adjustments) that drives autonomous flight control. This lets the drone react to visual obstacles and environmental conditions in real time during pedestrian escort missions.

## Training

| Parameter | Value |
|-----------|-------|
| Base model | `mistralai/Ministral-3-3B-Instruct-2512-BF16` |
| Method | LoRA (PEFT) |
| LoRA rank (r) | 4 |
| LoRA alpha | 8 |
| Target modules | `q_proj`, `v_proj` |
| Task type | CAUSAL_LM |
| Steps | 500 |
| Learning rate | 2e-4 |
| Gradient accumulation | 8 |
| Grad clipping | 0.3 |
| Precision | bfloat16 |
| Hardware | Google Colab T4 GPU (15 GB VRAM) |
| Training time | ~35 minutes |
| PEFT version | 0.18.1 |
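
The LoRA settings in the table imply a couple of derived quantities. The sketch below computes the standard LoRA scaling factor (alpha / r) and the number of trainable parameters a rank-4 adapter adds to a single linear layer. The `LoraSettings` class and `lora_params_per_layer` helper are illustrative names, and the layer dimension of 3072 is a placeholder, not a value read from the base model's config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """LoRA hyperparameters copied from the training table."""
    r: int = 4
    alpha: int = 8
    target_modules: tuple = ("q_proj", "v_proj")

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update B @ A by alpha / r.
        return self.alpha / self.r


def lora_params_per_layer(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


settings = LoraSettings()

# ASSUMPTION: 3072 is an illustrative layer width, not the model's real size.
example_params = lora_params_per_layer(d_in=3072, d_out=3072, r=settings.r)

print(settings.scaling)  # 2.0
print(example_params)    # 24576
```

With alpha set to twice the rank, the adapter update is applied at a scale of 2.0, and the per-layer parameter count grows linearly with the rank.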

### Dataset

[AirSim RGB+Depth Drone Flight 10K](https://www.kaggle.com/datasets/lukpellant/droneflight-obs-avoidanceairsimrgbdepth10k-320x320) — 1,000 RGB frames (320×320) from Microsoft AirSim simulator, each paired with a numpy telemetry array containing velocity/orientation data.

Each training example pairs a drone camera image with a telemetry vector (50 float values) representing the drone's state. The model learns to predict these vectors from visual input.
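
How the 50-value vector is rendered as training text is not described here. The sketch below shows one plausible serialization of a vector into a target string. The function name `format_telemetry`, the comma separator, and the four-decimal precision are all assumptions made for illustration.

```python
from typing import Sequence

TELEMETRY_LEN = 50  # vector length stated in the dataset description


def format_telemetry(values: Sequence[float], precision: int = 4) -> str:
    """Render a telemetry vector as a comma-separated string (hypothetical format)."""
    if len(values) != TELEMETRY_LEN:
        raise ValueError(f"expected {TELEMETRY_LEN} values, got {len(values)}")
    return ", ".join(f"{v:.{precision}f}" for v in values)


# Dummy vector for illustration only; real vectors come from the dataset's
# numpy telemetry arrays.
dummy = [0.1 * i for i in range(TELEMETRY_LEN)]
target_text = format_telemetry(dummy)
print(target_text[:22])
```

A fixed precision keeps target strings at a predictable length, which matters when choosing a generation token budget.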

### Training loss

```
Step  64/500  loss=10.6414
Step 128/500  loss=9.5537
Step 192/500  loss=7.0885
Step 256/500  loss=4.6498
Step 320/500  loss=3.1225
Step 384/500  loss=2.4410
Step 448/500  loss=1.9873
Step 500/500  loss=1.7251
```

Training loss fell from 10.6 to 1.7 over 500 steps, which suggests the adapter learned to map visual features to telemetry predictions on the training data.
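
To quantify the drop, the log lines above can be parsed and the relative reduction computed. This is a minimal sketch: `parse_loss_log` is a hypothetical helper, and the regular expression assumes the exact `Step N/M  loss=X` layout shown in the log.

```python
import re

LOG = """\
Step  64/500  loss=10.6414
Step 128/500  loss=9.5537
Step 192/500  loss=7.0885
Step 256/500  loss=4.6498
Step 320/500  loss=3.1225
Step 384/500  loss=2.4410
Step 448/500  loss=1.9873
Step 500/500  loss=1.7251
"""


def parse_loss_log(log: str) -> list[tuple[int, float]]:
    """Extract (step, loss) pairs from lines like 'Step 64/500  loss=10.6414'."""
    pattern = re.compile(r"Step\s+(\d+)/\d+\s+loss=([\d.]+)")
    return [(int(step), float(loss)) for step, loss in pattern.findall(log)]


points = parse_loss_log(LOG)
first_loss = points[0][1]
last_loss = points[-1][1]

# Fraction of the first logged loss that was eliminated by the last step.
reduction = (first_loss - last_loss) / first_loss
print(f"{len(points)} points, relative reduction {reduction:.1%}")
```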

## Usage

```python
import torch
from transformers import AutoProcessor, Mistral3ForConditionalGeneration
from peft import PeftModel
from PIL import Image

# Load the base model and its processor in bfloat16.
processor = AutoProcessor.from_pretrained("mistralai/Ministral-3-3B-Instruct-2512-BF16")
model = Mistral3ForConditionalGeneration.from_pretrained(
    "mistralai/Ministral-3-3B-Instruct-2512-BF16",
    torch_dtype=torch.bfloat16,
)

# Attach the LoRA adapter, then merge its weights into the base model for inference.
model = PeftModel.from_pretrained(model, "BenBarr/flystral")
model = model.merge_and_unload().cuda().eval()

img = Image.open("drone_frame.jpg").convert("RGB")

# One user turn containing an image placeholder followed by the text prompt.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Output the raw telemetry for this frame."},
]}]

text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=text, images=[img], return_tensors="pt").to("cuda")

# Greedy decoding keeps the output deterministic for a given frame.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
result = processor.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(result)  # Telemetry vector: vx, vy, vz, yaw_rate, ...
```
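
The generated text still has to be converted into numbers before it can drive a controller. The following is a minimal sketch, assuming the model emits plain decimal values separated by commas or whitespace. `parse_telemetry` is a hypothetical helper, and rejecting outputs of the wrong length is an illustrative design choice rather than documented behavior of the Louise system.

```python
import re
from typing import Optional

EXPECTED_LEN = 50  # telemetry vector length used during training

# Matches signed decimals with an optional exponent, e.g. -1.5, 0.25, 3e-2.
_NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def parse_telemetry(text: str, expected_len: int = EXPECTED_LEN) -> Optional[list[float]]:
    """Extract floats from generated text; return None if the count is wrong."""
    values = [float(token) for token in _NUMBER.findall(text)]
    if len(values) != expected_len:
        return None  # truncated or malformed generation
    return values


# Synthetic stand-in for the model's output, for illustration only.
sample_output = ", ".join(str(i * 0.5) for i in range(EXPECTED_LEN))
vector = parse_telemetry(sample_output)
print(vector[:4])
```

Returning `None` instead of a partial vector makes it explicit to the caller that the frame produced no usable prediction.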

## Architecture

The adapter is one component of the Louise multi-agent drone escort system:

- **Flystral** (this model) — flight control from camera images
- **Helpstral** — safety/threat assessment from camera images (Pixtral 12B)
- **Louise** — conversational safety companion (Ministral 3B)

When the fine-tuned endpoint is available, Flystral uses this adapter. When the endpoint is unavailable, Flystral falls back to an agentic mode that runs the base Ministral 3B through the Mistral API with function calling.
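
The fallback described above can be sketched as a small wrapper. Everything here is hypothetical: the function names, the use of `ConnectionError` to signal an unavailable endpoint, and the stub backends are assumptions for illustration, not the Louise codebase.

```python
from typing import Callable

Backend = Callable[[bytes], str]


def predict_with_fallback(frame: bytes, primary: Backend, fallback: Backend) -> tuple[str, str]:
    """Try the fine-tuned endpoint first; use the base-model path if it is unreachable.

    Returns (backend_name, prediction_text).
    """
    try:
        return "adapter", primary(frame)
    except ConnectionError:
        # ASSUMPTION: an unavailable endpoint surfaces as ConnectionError.
        return "fallback", fallback(frame)


# Stub backends standing in for the real endpoints.
def offline_adapter(frame: bytes) -> str:
    raise ConnectionError("fine-tuned endpoint unavailable")


def base_model_agent(frame: bytes) -> str:
    return "hover"


backend, prediction = predict_with_fallback(b"frame-bytes", offline_adapter, base_model_agent)
print(backend, prediction)
```

Returning the backend name alongside the prediction lets downstream logic record which path produced each command.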

## Developed by

Ben Barrett — Mistral Worldwide Hackathon 2026