# AGORA: Multi-Robot Task Planner v1
Part of the ANIMA Perception Suite by Robot Flow Labs / AIFLOW LABS LIMITED.
## Overview
AGORA (Adaptive Group Operations & Resource Allocation) is the Wave-5 unified STEM (Spatio-Temporal-Embodiment Memory) framework for multi-robot collaboration. This repo contains the LoRA-finetuned planner model and all training artifacts needed for full reproducibility.
The planner takes a heterogeneous robot team state (capabilities, battery, location, failure history) and a set of task requests, then produces task-to-robot assignments with structured JSON reasoning.
## Status: WEIGHTS PENDING RETRAIN
The original trained weights (2.9GB merged model + 71MB LoRA adapter) were lost during a server disk cleanup. All training data, scripts, configs, and metrics are preserved, so the model is fully reproducible. Retrain with:
```bash
cd /path/to/project_agora
CUDA_VISIBLE_DEVICES=0,1 python scripts/train_planner.py
```
Estimated training time: ~2 hours on 2x NVIDIA L4.
## Architecture
- Base model: Qwen2.5-1.5B-Instruct
- Fine-tuning: LoRA (r=16, alpha=32, dropout=0.05)
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Task: Multi-robot task allocation (structured JSON output)
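The adapter settings above map one-to-one onto peft's `LoraConfig`. A minimal sketch, kept as a plain dict so it carries no peft dependency; pass it as `LoraConfig(**lora_kwargs)` in the training script (the authoritative values live in `configs/training.toml`):

```python
# LoRA hyperparameters mirroring the architecture list above.
lora_kwargs = {
    "r": 16,                # low-rank dimension of the adapter matrices
    "lora_alpha": 32,       # delta-W is scaled by alpha / r = 2.0
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    "task_type": "CAUSAL_LM",  # tells peft to wrap a causal language model
}
print(lora_kwargs["lora_alpha"] / lora_kwargs["r"])  # 2.0
```

Targeting all seven attention and MLP projections (rather than attention only) is the common recipe for instruction-tuned Qwen models; it raises trainable-parameter count slightly but usually improves structured-output fidelity.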
## Training Results
| Metric | Value |
|---|---|
| Train loss | 0.234 |
| Train runtime | 1h 50m (6620s) |
| Train samples/sec | 2.27 |
| Total steps | 939 (3 epochs) |
| Eval format valid rate | 100% |
| Eval task coverage | 53.3% |
| Eval robot match rate | 18.7% |
| Avg inference time | 4.50s |
Analysis: The model produces valid JSON 100% of the time (perfect format learning), but allocation quality still trails the heuristic baseline (53.3% task coverage, 18.7% robot match rate), which is expected for a 1.5B model learning a complex multi-factor optimization. Recommended improvements: more training data (10k-50k examples), a larger base model (Qwen2.5-7B), or an RLHF/DPO stage.
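The three eval metrics can be scored with the stdlib alone. The helper below is a hypothetical sketch (the real logic lives in `scripts/eval_planner.py`), assuming each assignment carries `task_id` and `robot_id` fields as in the data format later in this card:

```python
import json

def score_plan(pred_text: str, expected: dict) -> tuple:
    """Score one prediction against the expected {task_id: robot_id} mapping.

    Returns (format_valid, task_coverage, robot_match) for a single example;
    averaging over the eval set yields the three rates in the table above.
    """
    try:
        plan = json.loads(pred_text)
        assignments = {a["task_id"]: a["robot_id"] for a in plan["assignments"]}
    except (json.JSONDecodeError, KeyError, TypeError):
        return 0.0, 0.0, 0.0  # invalid JSON or wrong schema
    covered = [t for t in expected if t in assignments]
    matched = [t for t in covered if assignments[t] == expected[t]]
    n = len(expected) or 1
    return 1.0, len(covered) / n, len(matched) / n

pred = '{"assignments": [{"task_id": "task_1", "robot_id": "robot_1"}], "reasoning": "closest"}'
print(score_plan(pred, {"task_1": "robot_1", "task_2": "robot_2"}))  # (1.0, 0.5, 0.5)
```

Note that robot match rate measures agreement with the heuristic baseline's choices, so a low value means divergence from the baseline, not necessarily invalid plans.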
## Repository Contents
```
.
├── README.md                      # This model card
├── configs/
│   └── training.toml              # Full training configuration
├── logs/
│   ├── training_metrics.json      # Final training metrics
│   ├── planning_train.jsonl       # 5000 training examples
│   └── planning_eval.jsonl        # 200 evaluation examples
├── reports/
│   └── planner_eval.json          # Evaluation results
├── scripts/
│   ├── train_planner.py           # LoRA fine-tuning script
│   ├── eval_planner.py            # Evaluation script
│   └── generate_planning_data.py  # Synthetic data generator
├── pytorch/                       # (empty; retrain to populate)
│   ├── merged/                    # Merged model weights
│   └── lora_adapter/              # LoRA adapter only
├── onnx/                          # (empty; export after retrain)
├── checkpoints/                   # (empty; retrain to populate)
└── tensorrt/                      # (generate on target hardware)
```
## Reproducing the Training
### Prerequisites
- 2x NVIDIA L4 (23GB each) or equivalent
- CUDA 12.4, PyTorch 2.6.0
- Python packages: transformers, peft, trl, datasets
### Step 1: Generate training data (if starting fresh)
```bash
python scripts/generate_planning_data.py
```
### Step 2: Train
```bash
CUDA_VISIBLE_DEVICES=0,1 python scripts/train_planner.py \
    --model Qwen/Qwen2.5-1.5B-Instruct \
    --epochs 3 \
    --batch-size 4 \
    --grad-accum 4 \
    --lr 2e-4 \
    --lora-r 16 \
    --lora-alpha 32
```
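As a sanity check, the logged step count follows from these flags: per-device batch 4 with gradient accumulation 4 gives an effective batch of 16, so 5000 examples take ceil(5000/16) = 313 optimizer steps per epoch and 939 over 3 epochs, matching the metrics table above (this assumes steps are counted against a single data-parallel replica):

```python
import math

examples, batch, grad_accum, epochs = 5000, 4, 4, 3
effective_batch = batch * grad_accum                     # 16 examples per optimizer step
steps_per_epoch = math.ceil(examples / effective_batch)  # 313
total_steps = steps_per_epoch * epochs
print(total_steps)  # 939
```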
### Step 3: Evaluate
```bash
python scripts/eval_planner.py
```
### Step 4: Export (after training)
```python
import torch
from safetensors.torch import save_file

# SafeTensors: save the merged state dict (the model must already have
# the LoRA adapter merged into the base weights)
save_file(model.state_dict(), "pytorch/agora_planner_v1.safetensors")

# ONNX: dummy_input is a placeholder batch of token IDs shaped like real input
torch.onnx.export(model, dummy_input, "onnx/agora_planner_v1.onnx", opset_version=17)
```
## Usage (after weights are available)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("ilessio-aiflowlab/project_agora", subfolder="pytorch/merged")
tokenizer = AutoTokenizer.from_pretrained("ilessio-aiflowlab/project_agora", subfolder="pytorch/merged")

prompt = """You are a multi-robot task planner. Given the team state and tasks, produce assignments.
Team: [{"id": "robot_1", "type": "manipulator", "battery": 85, "location": "kitchen"}]
Tasks: [{"id": "task_1", "type": "fetch", "target": "cup", "location": "kitchen"}]
Assignments:"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
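Since `generate()` returns the prompt followed by the completion, the JSON plan has to be sliced out of the decoded string. A hypothetical stdlib-only helper, assuming the model emits a single JSON object right after the final `Assignments:` marker:

```python
import json

def parse_assignments(decoded: str) -> dict:
    """Extract the first complete JSON object after the 'Assignments:' marker."""
    _, _, tail = decoded.partition("Assignments:")
    start = tail.find("{")
    if start == -1:
        raise ValueError("no JSON object found after 'Assignments:'")
    depth = 0
    # Track brace depth to find the matching closing brace.
    for i, ch in enumerate(tail[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(tail[start:i + 1])
    raise ValueError("JSON object after 'Assignments:' is truncated")

decoded = 'Assignments: {"assignments": [{"task_id": "task_1", "robot_id": "robot_1"}]}'
print(parse_assignments(decoded)["assignments"][0]["robot_id"])  # robot_1
```

This brace-counting sketch ignores braces inside string values; a production parser should use `json.JSONDecoder.raw_decode` instead.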
## Training Data Format
Each example in planning_train.jsonl is a conversation turn:
```json
{
  "messages": [
    {"role": "system", "content": "You are AGORA, a multi-robot task planner..."},
    {"role": "user", "content": "Team state: {...}\nTasks: [...]"},
    {"role": "assistant", "content": "{\"assignments\": [...], \"reasoning\": \"...\"}"}
  ]
}
```
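`scripts/generate_planning_data.py` is not reproduced in this card; the sketch below shows how one synthetic example in this format could be built with the stdlib. The field names match the usage example above, but the labeling heuristic here is an assumption, not the script's actual logic:

```python
import json
import random

def make_example(rng: random.Random) -> dict:
    """Build one synthetic training example in the chat format above."""
    team = [{"id": f"robot_{i}", "type": rng.choice(["manipulator", "mobile"]),
             "battery": rng.randint(20, 100), "location": rng.choice(["kitchen", "lab"])}
            for i in range(1, rng.randint(2, 4))]
    tasks = [{"id": "task_1", "type": "fetch", "target": "cup",
              "location": rng.choice(["kitchen", "lab"])}]
    # Heuristic label: prefer the highest-battery robot at the task location,
    # falling back to the highest-battery robot overall.
    local = [r for r in team if r["location"] == tasks[0]["location"]] or team
    best = max(local, key=lambda r: r["battery"])
    plan = {"assignments": [{"task_id": "task_1", "robot_id": best["id"]}],
            "reasoning": f"{best['id']} is closest to {tasks[0]['location']} with the most battery"}
    return {"messages": [
        {"role": "system", "content": "You are AGORA, a multi-robot task planner..."},
        {"role": "user", "content": f"Team state: {json.dumps(team)}\nTasks: {json.dumps(tasks)}"},
        {"role": "assistant", "content": json.dumps(plan)},
    ]}

example = make_example(random.Random(0))
print([m["role"] for m in example["messages"]])  # ['system', 'user', 'assistant']
```

One JSON-serialized example per line yields the JSONL files under `logs/`.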
## AGORA Framework
The planner is one component of the larger AGORA system:
- STEM Memory: Spatio-Temporal-Embodiment Memory for shared state
- Brain/Cerebellum: global planning + local execution control loop
- Fleet Coordination: multi-robot orchestration with fault tolerance
- Inference: LLM/MLLM integration (this model)
## Hardware
- Training: 2x NVIDIA L4 (23GB each), CUDA 12.4
- Inference: Any GPU with >= 4GB VRAM (quantized) or >= 8GB (bf16)
- Edge target: NVIDIA Jetson Orin (after TensorRT export)
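The VRAM figures follow from parameter count times bytes per weight, plus headroom for activations and KV cache; a rough back-of-envelope check (parameter count taken as the nominal 1.5B):

```python
params = 1.5e9                # nominal parameter count of Qwen2.5-1.5B
bf16_gb = params * 2 / 1e9    # bf16: 2 bytes per weight -> ~3.0 GB
int4_gb = params * 0.5 / 1e9  # 4-bit quantized: ~0.75 GB, before overhead
print(bf16_gb, int4_gb)  # 3.0 0.75
```

Weights alone fit comfortably under the stated 8GB (bf16) and 4GB (quantized) budgets, leaving room for activations and KV cache; the ~2.9GB on-disk size of the lost merged checkpoint is consistent with the bf16 estimate.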
## License
Apache 2.0, Robot Flow Labs / AIFLOW LABS LIMITED