---
library_name: transformers
tags:
- lora
- peft
- drone
- telemetry
- vision
- mistral
- ministral
base_model: mistralai/Ministral-3-3B-Instruct-2512-BF16
license: apache-2.0
pipeline_tag: image-text-to-text
---
# Flystral: LoRA Fine-tuned Ministral 3B for Drone Flight Control
LoRA adapter for real-time drone telemetry prediction from camera images, built for the [Louise AI Safety Drone Escort](https://github.com/BenBarr/louise) system.
## What it does
Given a drone camera frame, the model outputs a telemetry vector (velocity, orientation, altitude adjustments) that drives autonomous flight control. This lets the drone react to visual obstacles and environmental conditions in real time during pedestrian escort missions.
## Training
| Parameter | Value |
|-----------|-------|
| Base model | `mistralai/Ministral-3-3B-Instruct-2512-BF16` |
| Method | LoRA (PEFT) |
| LoRA rank (r) | 4 |
| LoRA alpha | 8 |
| Target modules | `q_proj`, `v_proj` |
| Task type | CAUSAL_LM |
| Steps | 500 |
| Learning rate | 2e-4 |
| Gradient accumulation | 8 |
| Grad clipping | 0.3 |
| Precision | bfloat16 |
| Hardware | Google Colab T4 GPU (15 GB VRAM) |
| Training time | ~35 minutes |
| PEFT version | 0.18.1 |
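
The adapter settings in the table can be summarised in a few lines of plain Python. This is a minimal sketch, not the training script: the `LoraSettings` class and the 3072 by 3072 projection shape are hypothetical and used only for illustration; the real layer shapes come from the base model's config. In PEFT, the corresponding `LoraConfig` fields are `r`, `lora_alpha`, `target_modules`, and `task_type`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Hyperparameters copied from the training table (class name is hypothetical)."""
    r: int = 4
    lora_alpha: int = 8
    target_modules: tuple = ("q_proj", "v_proj")
    task_type: str = "CAUSAL_LM"

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / r.
        return self.lora_alpha / self.r

    def params_per_matrix(self, d_in: int, d_out: int) -> int:
        # A has shape (r, d_in) and B has shape (d_out, r).
        return self.r * (d_in + d_out)


settings = LoraSettings()
print(settings.scaling)  # 2.0

# Hypothetical square projection, NOT read from the real model config.
print(settings.params_per_matrix(d_in=3072, d_out=3072))  # 24576
```

With rank 4 the adapter adds only a few thousand parameters per adapted projection, which is part of why training fits on a single Colab GPU.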
### Dataset
[AirSim RGB+Depth Drone Flight 10K](https://www.kaggle.com/datasets/lukpellant/droneflight-obs-avoidanceairsimrgbdepth10k-320x320): 1,000 RGB frames (320×320) from the Microsoft AirSim simulator, each paired with a NumPy telemetry array containing velocity/orientation data.
Each training example pairs a drone camera image with a telemetry vector (50 float values) representing the drone's state. The model learns to predict these vectors from visual input.
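
One way such an image/telemetry pair could be turned into a chat-style training example is sketched below. The serialisation (comma-separated, four decimal places), the `format_telemetry` and `build_example` helpers, and the file name are assumptions for illustration; this card does not state how the targets were actually formatted. The user prompt is the one used in the Usage section.

```python
from typing import Sequence

PROMPT = "Output the raw telemetry for this frame."
TELEMETRY_LEN = 50  # vector length stated in the dataset description


def format_telemetry(values: Sequence[float], precision: int = 4) -> str:
    """Serialise a telemetry vector as comma-separated decimals (assumed format)."""
    if len(values) != TELEMETRY_LEN:
        raise ValueError(f"expected {TELEMETRY_LEN} values, got {len(values)}")
    return ", ".join(f"{v:.{precision}f}" for v in values)


def build_example(image_path: str, telemetry: Sequence[float]) -> dict:
    """Pair one frame with its target text in a chat-message layout."""
    return {
        "image": image_path,
        "messages": [
            {"role": "user", "content": [
                {"type": "image"},
                {"type": "text", "text": PROMPT},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": format_telemetry(telemetry)},
            ]},
        ],
    }


# Hypothetical file name and an all-zero vector, used only to show the shape.
example = build_example("frame_0001.png", [0.0] * TELEMETRY_LEN)
print(example["messages"][1]["content"][0]["text"][:22])  # 0.0000, 0.0000, 0.0000
```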
### Training loss
```
Step 64/500 loss=10.6414
Step 128/500 loss=9.5537
Step 192/500 loss=7.0885
Step 256/500 loss=4.6498
Step 320/500 loss=3.1225
Step 384/500 loss=2.4410
Step 448/500 loss=1.9873
Step 500/500 loss=1.7251
```
Loss decreased from 10.6 to 1.7 over 500 steps, which is consistent with the adapter learning to map visual features to telemetry predictions.
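
To quantify the drop, the log can be parsed with a short script. This is a sketch that assumes the `Step N/TOTAL loss=VALUE` line format shown in the log; the `parse_loss_log` helper is hypothetical.

```python
import re

LOG = """\
Step 64/500 loss=10.6414
Step 128/500 loss=9.5537
Step 192/500 loss=7.0885
Step 256/500 loss=4.6498
Step 320/500 loss=3.1225
Step 384/500 loss=2.4410
Step 448/500 loss=1.9873
Step 500/500 loss=1.7251
"""

LINE_RE = re.compile(r"Step (\d+)/(\d+) loss=([0-9.]+)")


def parse_loss_log(text: str) -> list:
    """Return (step, loss) pairs for every matching log line."""
    return [(int(m.group(1)), float(m.group(3))) for m in LINE_RE.finditer(text)]


points = parse_loss_log(LOG)
first_loss, last_loss = points[0][1], points[-1][1]

# Relative reduction between the first and last logged values.
reduction = 1.0 - last_loss / first_loss
print(f"{reduction:.1%}")  # 83.8%
```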
## Usage
```python
import torch
from transformers import AutoProcessor, Mistral3ForConditionalGeneration
from peft import PeftModel
from PIL import Image

# Load the base processor and model in bfloat16.
processor = AutoProcessor.from_pretrained("mistralai/Ministral-3-3B-Instruct-2512-BF16")
model = Mistral3ForConditionalGeneration.from_pretrained(
    "mistralai/Ministral-3-3B-Instruct-2512-BF16",
    torch_dtype=torch.bfloat16,
)

# Attach the LoRA adapter, then merge it into the base weights for inference.
model = PeftModel.from_pretrained(model, "BenBarr/flystral")
model = model.merge_and_unload().cuda().eval()

# Build a single-image chat prompt.
img = Image.open("drone_frame.jpg").convert("RGB")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Output the raw telemetry for this frame."},
]}]
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=text, images=[img], return_tensors="pt").to("cuda")

# Greedy decoding; decode only the newly generated tokens.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=False)
result = processor.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(result)  # Telemetry vector: vx, vy, vz, yaw_rate, ...
```
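
The generated text still has to be converted into numbers before it can drive a controller. The sketch below assumes the model emits the values as decimal numbers separated by commas or whitespace. That output format, the `parse_telemetry` helper, and the default length check against 50 values (taken from the dataset description) are assumptions, not documented behaviour of the adapter.

```python
import math
import re

# Matches signed decimals with an optional exponent, e.g. -0.4 or 1.5e-2.
NUMBER_RE = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def parse_telemetry(text: str, expected_len: int = 50) -> list:
    """Extract a fixed-length list of finite floats from generated text."""
    values = [float(tok) for tok in NUMBER_RE.findall(text)]
    if len(values) != expected_len:
        raise ValueError(f"expected {expected_len} values, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("telemetry contains a non-finite value")
    return values


# Short illustrative string, not real model output.
values = parse_telemetry("0.12, -0.40, 1.5e-2", expected_len=3)
print(values)  # [0.12, -0.4, 0.015]
```

Raising on a wrong-length result, rather than padding or truncating, keeps a cut-off generation from being passed to the flight controller as if it were a complete vector.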
## Architecture
The adapter sits in the Louise multi-agent drone escort system:
- **Flystral** (this model): flight control from camera images
- **Helpstral**: safety/threat assessment from camera images (Pixtral 12B)
- **Louise**: conversational safety companion (Ministral 3B)
When the fine-tuned endpoint is available, Flystral uses this adapter. When the endpoint is offline, Flystral falls back to agentic mode on the base Ministral 3B via the Mistral API with function calling.
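
The fallback behaviour could be structured roughly as follows. This is a sketch only: `predict_with_fallback` and the two callables are hypothetical stand-ins, and how Louise actually detects that the fine-tuned endpoint is unavailable is not described here.

```python
from typing import Callable, Tuple

Predictor = Callable[[bytes], str]


def predict_with_fallback(
    frame: bytes, fine_tuned: Predictor, agentic: Predictor
) -> Tuple[str, str]:
    """Return (source, output), preferring the fine-tuned adapter."""
    try:
        return "fine_tuned", fine_tuned(frame)
    except (ConnectionError, TimeoutError):
        # Endpoint unreachable: fall back to the base model in agentic mode.
        return "agentic", agentic(frame)


# Stand-in callables used only to exercise the control flow.
def offline_endpoint(frame: bytes) -> str:
    raise ConnectionError("fine-tuned endpoint unavailable")


def base_model_agent(frame: bytes) -> str:
    return "hover"


source, output = predict_with_fallback(b"jpeg-bytes", offline_endpoint, base_model_agent)
print(source, output)  # agentic hover
```

Returning the source alongside the output lets downstream code log which path produced a given command.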
## Developed by
Ben Barrett, Mistral Worldwide Hackathon 2026