---
library_name: transformers
tags:
- lora
- peft
- drone
- telemetry
- vision
- mistral
- ministral
base_model: mistralai/Ministral-3-3B-Instruct-2512-BF16
license: apache-2.0
pipeline_tag: image-text-to-text
---

# Flystral: LoRA Fine-tuned Ministral 3B for Drone Flight Control

LoRA adapter for real-time drone telemetry prediction from camera images, built for the [Louise AI Safety Drone Escort](https://github.com/BenBarr/louise) system.

## What it does

Given a drone camera frame, the model generates a telemetry vector as text (velocity, orientation, and altitude adjustments), and this vector drives autonomous flight control. It lets the drone react in real time to visual obstacles and environmental conditions during pedestrian escort missions.

## Training

| Parameter | Value |
|-----------|-------|
| Base model | `mistralai/Ministral-3-3B-Instruct-2512-BF16` |
| Method | LoRA (PEFT) |
| LoRA rank (r) | 4 |
| LoRA alpha | 8 |
| Target modules | `q_proj`, `v_proj` |
| Task type | `CAUSAL_LM` |
| Training steps | 500 |
| Learning rate | 2e-4 |
| Gradient accumulation steps | 8 |
| Gradient clipping | 0.3 |
| Precision | bfloat16 |
| Hardware | Google Colab T4 GPU (15 GB VRAM) |
| Training time | ~35 minutes |
| PEFT version | 0.18.1 |
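
In LoRA, each targeted projection (here `q_proj` and `v_proj`) keeps its pretrained weight W frozen and learns a rank-r update B·A. That update is added with a scale of alpha / r, which is 2 for r = 4 and alpha = 8. The NumPy sketch below walks through this arithmetic for a single projection. The 1024×1024 shape and the random matrices are illustrative assumptions, not values taken from the trained adapter.

```python
import numpy as np

# Hyperparameters from the training table.
r, alpha = 4, 8
scaling = alpha / r  # LoRA multiplies the low-rank update by alpha / r = 2.0

# Hypothetical projection size for illustration; NOT the real Ministral 3B dimensions.
d_in, d_out = 1024, 1024

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))     # frozen base weight (e.g. q_proj)
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection, small random init
B = np.zeros((d_out, r))                   # trainable up-projection, zero init
x = rng.standard_normal(d_in)              # one input activation vector


def lora_forward(x, W, A, B, scaling):
    """Frozen path plus scaled low-rank correction: W x + scaling * B (A x)."""
    return W @ x + scaling * (B @ (A @ x))


# With B = 0 the adapted layer starts out identical to the base layer.
y_init = lora_forward(x, W, A, B, scaling)

# Pretend training updated B. Merging folds the update into a single matrix,
# which is conceptually what merge_and_unload() does for each adapted module.
B = rng.standard_normal((d_out, r)) * 0.01
W_merged = W + scaling * (B @ A)

# Trainable parameters per adapted projection vs. the full weight matrix.
lora_params = r * (d_in + d_out)
full_params = d_in * d_out
print(f"LoRA params: {lora_params:,} of {full_params:,} ({lora_params / full_params:.3%})")
```

Because B starts at zero, training begins from the unmodified base model. After merging, inference runs at the base model's cost with no extra adapter matmuls.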

### Dataset

[AirSim RGB+Depth Drone Flight 10K](https://www.kaggle.com/datasets/lukpellant/droneflight-obs-avoidanceairsimrgbdepth10k-320x320). Training used 1,000 RGB frames (320×320) rendered in the Microsoft AirSim simulator. Each frame is paired with a NumPy telemetry array containing velocity and orientation data.

Each training example pairs a drone camera image with a telemetry vector of 50 float values representing the drone's state. The model learns to predict this vector from the visual input.
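
The card does not say how each 50-value telemetry array was serialized into a text target. The hedged sketch below shows one way a chat-format training example could be assembled. It assumes comma-separated decimals at four-decimal precision and reuses the prompt from the Usage section. `format_telemetry`, `build_example`, and the message layout are hypothetical.

```python
import numpy as np

PROMPT = "Output the raw telemetry for this frame."  # prompt text from the Usage section
N_VALUES = 50  # telemetry vector length stated in the Dataset section


def format_telemetry(vec, decimals=4):
    """Hypothetical serialization: comma-separated floats at fixed precision."""
    vec = np.asarray(vec, dtype=np.float64).ravel()
    if vec.size != N_VALUES:
        raise ValueError(f"expected {N_VALUES} values, got {vec.size}")
    return ", ".join(f"{v:.{decimals}f}" for v in vec)


def build_example(image, telemetry):
    """Hypothetical chat example: user turn = image + prompt, assistant turn = telemetry text."""
    return {
        "images": [image],
        "messages": [
            {"role": "user", "content": [
                {"type": "image"},
                {"type": "text", "text": PROMPT},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": format_telemetry(telemetry)},
            ]},
        ],
    }


# Placeholder data. In practice: the AirSim frame and its telemetry array
# (e.g. loaded with np.load, assuming the arrays are stored as .npy files).
example = build_example(image=None, telemetry=np.linspace(-1.0, 1.0, N_VALUES))
print(example["messages"][1]["content"][0]["text"][:40])
```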

### Training loss

```
Step 64/500 loss=10.6414
Step 128/500 loss=9.5537
Step 192/500 loss=7.0885
Step 256/500 loss=4.6498
Step 320/500 loss=3.1225
Step 384/500 loss=2.4410
Step 448/500 loss=1.9873
Step 500/500 loss=1.7251
```

Training loss fell from 10.6 to 1.7 over 500 steps. This indicates that the adapter learned to fit the telemetry targets of the training frames. Training loss alone does not measure prediction accuracy on unseen frames.
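
If the logged loss is the mean per-token cross-entropy in nats, perplexity is simply exp(loss). That is the usual quantity for causal-LM fine-tuning, although the card does not state it. The sketch below converts the log above. Under that assumption, perplexity falls from roughly 42,000 to about 5.6.

```python
import math

# (step, loss) pairs copied from the training log above.
log = [(64, 10.6414), (128, 9.5537), (192, 7.0885), (256, 4.6498),
       (320, 3.1225), (384, 2.4410), (448, 1.9873), (500, 1.7251)]

for step, loss in log:
    # Assumes loss is mean token-level cross-entropy in nats, so perplexity = e^loss.
    print(f"step {step:>3}: loss={loss:.4f}  perplexity={math.exp(loss):,.1f}")
```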

## Usage

```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, Mistral3ForConditionalGeneration

BASE_ID = "mistralai/Ministral-3-3B-Instruct-2512-BF16"
ADAPTER_ID = "BenBarr/flystral"

# Load the processor and the base model in bfloat16.
processor = AutoProcessor.from_pretrained(BASE_ID)
model = Mistral3ForConditionalGeneration.from_pretrained(BASE_ID, torch_dtype=torch.bfloat16)

# Attach the LoRA adapter, then merge it into the base weights for faster inference.
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model = model.merge_and_unload().to("cuda").eval()

img = Image.open("drone_frame.jpg").convert("RGB")

# Single user turn: one image placeholder followed by the instruction.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Output the raw telemetry for this frame."},
]}]

text = processor.apply_chat_template(messages, add_generation_prompt=True)
# Move inputs to the GPU and cast floating-point tensors (pixel values) to bfloat16 to match the model.
inputs = processor(text=text, images=[img], return_tensors="pt").to("cuda", dtype=torch.bfloat16)

with torch.no_grad():
    # Greedy decoding for deterministic telemetry output.
    output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
result = processor.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
print(result)  # Telemetry vector: vx, vy, vz, yaw_rate, ...
```
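
Because `result` is plain text, the flight controller must convert it back into numbers before acting on it. The minimal sketch below assumes the model emits decimal numbers separated by commas and/or whitespace; the exact output format is not documented here. `parse_telemetry` is a hypothetical helper. It extracts every number with a regular expression and rejects outputs that do not contain exactly 50 values, the vector length given in the Dataset section.

```python
import re

# Matches signed decimals, including scientific notation such as -1.2e-03.
NUMBER_RE = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def parse_telemetry(text, expected_len=50):
    """Hypothetical parser: pull all numbers out of the generated text and validate the count."""
    values = [float(tok) for tok in NUMBER_RE.findall(text)]
    if len(values) != expected_len:
        raise ValueError(f"expected {expected_len} values, parsed {len(values)}")
    return values


# A made-up 50-value string standing in for `result`.
sample = ", ".join(str(i / 10) for i in range(50))
vector = parse_telemetry(sample)
print(vector[:5])
```

The helper raises an error on malformed outputs instead of padding or truncating them. That leaves the caller to decide how to handle a bad prediction.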

## Architecture

The adapter is one component of the Louise multi-agent drone escort system:

- **Flystral** (this model): flight control from camera images
- **Helpstral**: safety and threat assessment from camera images (Pixtral 12B)
- **Louise**: conversational safety companion (Ministral 3B)

When the fine-tuned endpoint is available, Flystral uses this adapter. When the endpoint is offline, Flystral falls back to agentic mode, calling the base Ministral 3B through the Mistral API with function calling.
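
The fallback described above follows a try-the-adapter-first pattern. The minimal sketch below uses hypothetical stand-ins: `predict_with_adapter` represents the fine-tuned endpoint, and `predict_with_agent` represents the function-calling path on the base model. `SET_VELOCITY_TOOL` is an invented tool definition in the JSON-schema `tools` format that Mistral's chat API accepts for function calling. Its name and fields are illustrative only.

```python
from typing import Callable, Sequence

# Hypothetical tool definition in the "tools" format used for Mistral function calling.
# The tool name and parameters are invented for illustration.
SET_VELOCITY_TOOL = {
    "type": "function",
    "function": {
        "name": "set_velocity",
        "description": "Command the drone's velocity components and yaw rate.",
        "parameters": {
            "type": "object",
            "properties": {
                "vx": {"type": "number"},
                "vy": {"type": "number"},
                "vz": {"type": "number"},
                "yaw_rate": {"type": "number"},
            },
            "required": ["vx", "vy", "vz", "yaw_rate"],
        },
    },
}


def predict_telemetry(
    frame: bytes,
    predict_with_adapter: Callable[[bytes], Sequence[float]],
    predict_with_agent: Callable[[bytes], Sequence[float]],
) -> tuple:
    """Prefer the fine-tuned adapter; fall back to the base-model agent if the endpoint fails."""
    try:
        return "adapter", predict_with_adapter(frame)
    except (ConnectionError, TimeoutError):  # assumed failure modes of an unreachable endpoint
        return "agent", predict_with_agent(frame)


# Stub callables standing in for the real endpoint and API client.
def offline_endpoint(frame):
    raise ConnectionError("fine-tuned endpoint unavailable")


def agent_stub(frame):
    return [0.0, 0.0, 0.0, 0.0]


print(predict_telemetry(b"frame", offline_endpoint, agent_stub))
```

Returning the path label alongside the prediction lets downstream components log or weight results differently depending on which model produced them.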

## Developed by

Ben Barrett, Mistral Worldwide Hackathon 2026