flystral / README.md

Fix year: 2025 -> 2026

697c8cb verified 17 days ago

3.73 kB

	---
	library_name: transformers
	tags:
	- lora
	- peft
	- drone
	- telemetry
	- vision
	- mistral
	- ministral
	base_model: mistralai/Ministral-3-3B-Instruct-2512-BF16
	license: apache-2.0
	pipeline_tag: image-text-to-text
	---

	# Flystral — LoRA Fine-tuned Ministral 3B for Drone Flight Control

	LoRA adapter for real-time drone telemetry prediction from camera images, built for the [Louise AI Safety Drone Escort](https://github.com/BenBarr/louise) system.

	## What it does

	Given a drone camera frame, the model outputs a telemetry vector (velocity, orientation, altitude adjustments) that drives autonomous flight control. This enables the drone to react to visual obstacles and environmental conditions in real-time during pedestrian escort missions.

	## Training

	\| Parameter \| Value \|
	\|-----------\|-------\|
	\| Base model \| `mistralai/Ministral-3-3B-Instruct-2512-BF16` \|
	\| Method \| LoRA (PEFT) \|
	\| LoRA rank (r) \| 4 \|
	\| LoRA alpha \| 8 \|
	\| Target modules \| `q_proj`, `v_proj` \|
	\| Task type \| CAUSAL_LM \|
	\| Steps \| 500 \|
	\| Learning rate \| 2e-4 \|
	\| Gradient accumulation \| 8 \|
	\| Grad clipping \| 0.3 \|
	\| Precision \| bfloat16 \|
	\| Hardware \| Google Colab T4 GPU (15 GB VRAM) \|
	\| Training time \| ~35 minutes \|
	\| PEFT version \| 0.18.1 \|

	### Dataset

	[AirSim RGB+Depth Drone Flight 10K](https://www.kaggle.com/datasets/lukpellant/droneflight-obs-avoidanceairsimrgbdepth10k-320x320) — 1,000 RGB frames (320×320) from Microsoft AirSim simulator, each paired with a numpy telemetry array containing velocity/orientation data.

	Each training example pairs a drone camera image with a telemetry vector (50 float values) representing the drone's state. The model learns to predict these vectors from visual input.

	### Training loss

	```
	Step 64/500 loss=10.6414
	Step 128/500 loss=9.5537
	Step 192/500 loss=7.0885
	Step 256/500 loss=4.6498
	Step 320/500 loss=3.1225
	Step 384/500 loss=2.4410
	Step 448/500 loss=1.9873
	Step 500/500 loss=1.7251
	```

	Loss decreased from 10.6 → 1.7 over 500 steps, confirming the adapter learned to map visual features to telemetry predictions.

	## Usage

	```python
	import torch
	from transformers import AutoProcessor, Mistral3ForConditionalGeneration
	from peft import PeftModel
	from PIL import Image

	processor = AutoProcessor.from_pretrained("mistralai/Ministral-3-3B-Instruct-2512-BF16")
	model = Mistral3ForConditionalGeneration.from_pretrained(
	"mistralai/Ministral-3-3B-Instruct-2512-BF16",
	torch_dtype=torch.bfloat16,
	)
	model = PeftModel.from_pretrained(model, "BenBarr/flystral")
	model = model.merge_and_unload().cuda().eval()

	img = Image.open("drone_frame.jpg").convert("RGB")

	messages = [{"role": "user", "content": [
	{"type": "image"},
	{"type": "text", "text": "Output the raw telemetry for this frame."},
	]}]

	text = processor.apply_chat_template(messages, add_generation_prompt=True)
	inputs = processor(text=text, images=[img], return_tensors="pt").to("cuda")

	with torch.no_grad():
	output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=False)

	result = processor.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
	print(result) # Telemetry vector: vx, vy, vz, yaw_rate, ...
	```

	## Architecture

	The adapter sits in the Louise multi-agent drone escort system:

	- Flystral (this model) — flight control from camera images
	- Helpstral — safety/threat assessment from camera images (Pixtral 12B)
	- Louise — conversational safety companion (Ministral 3B)

	When the fine-tuned endpoint is available, Flystral uses this adapter. When offline, it falls back to agentic mode on the base Ministral 3B via the Mistral API with function calling.

	## Developed by

	Ben Barrett — Mistral Worldwide Hackathon 2026