# Model Card for Asymmetric-Executor-Nav-Instruct

## Model Summary
This model is a fine-tuned version of Qwen3-VL-8B-Instruct, designed to function as the Low-Level Executor within an Asymmetric Cognitive Architecture for long-horizon agents. It is specialized for partially observable navigation tasks involving memory dependencies and backtracking (specifically the Cookie and 2-Keys domains).
The model operates in a Reflection-Action Loop, explicitly verifying subgoal completion before generating atomic actions. It was trained using a 9-step curriculum learning strategy to master perception, verification, and actuation.
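A minimal sketch of this loop is shown below. The `planner`, `executor`, and `env` interfaces and the `reply` fields are illustrative placeholders, not a released API:

```python
# Illustrative sketch of the Reflection-Action Loop: the executor verifies
# subgoal completion (Q2) before emitting an atomic action (Q5).
# All object interfaces here are hypothetical.
def reflection_action_loop(planner, executor, env, max_steps=200):
    """Run one episode, verifying each subgoal before acting on it."""
    subgoal = planner.assign_subgoal(env.observe())
    for _ in range(max_steps):
        obs = env.observe()
        reply = executor.respond(obs, subgoal)   # structured Q1-Q5 CoT
        if reply.subgoal_complete:               # Q2 (Verification) passed
            if env.task_done():
                break
            subgoal = planner.assign_subgoal(obs)  # escalate to the Planner
        else:
            env.step(reply.atomic_action)        # Q5: one atomic action
```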
## Model Details

- **Architecture:** Hierarchical VLM (Low-Level Executor)
- **Task:** Vision-Language Navigation, Subgoal Verification, Atomic Action Generation
- **Dataset:** Expert Demonstrations + Key Episode Dataset (rebalanced 1:3)
- **Training method:** LoRA fine-tuning (rank 8, alpha 16)
- **Paper:** *Strategic Planning, Precise Execution: An Asymmetric Cognitive Architecture for Long-Horizon VLM Agents* (ICML submission)
## Intended Use

This model is intended for use alongside a High-Level Planner (e.g., Gemini 3 Pro, GPT-4o) that decomposes long-horizon tasks into subgoals.

**Input:**
- Current visual observation (RGB image).
- Textual subgoal assigned by the Planner (e.g., "Go to Room A and press the button").

**Output:**
- A structured Chain-of-Thought (CoT) response following the Q1-Q5 format: Perception, Verification, Transition, Navigation, and Atomic Action.
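A minimal inference sketch using the generic Hugging Face image-text API (assuming a recent `transformers` version). The exact prompt template used during training is not documented here, so the subgoal formatting below is an assumption:

```python
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Wuduandaun/curr_cookie_key"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

subgoal = "Go to Room A and press the button"  # assigned by the Planner
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": Image.open("observation.png")},
        {"type": "text", "text": f"Subgoal: {subgoal}"},  # assumed format
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens (the Q1-Q5 structured response).
print(processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```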
## Performance

- **Cookie domain success rate:** ~87% (vs. 22% for RL baselines).
- **2-Keys domain success rate:** ~89% (vs. 15% for RL baselines).
- **Inference efficiency:** a 24-fold reduction in operational cost compared to symmetric baselines in which the high-level model also performs low-level execution.
## Training Hyperparameters

- **LoRA rank:** 8
- **LoRA alpha:** 16
- **Precision:** bf16
- **Learning rate:** 5e-5 with a cosine scheduler
- **Batch size:** 16 (effective)
- **Hardware:** NVIDIA L40S GPU
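For reference, a PEFT configuration matching the card's LoRA hyperparameters might look like the following sketch; the target modules are a typical assumption (attention projections) and are not stated on this card:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForImageTextToText

base = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen3-VL-8B-Instruct")
config = LoraConfig(
    r=8,            # LoRA rank (from this card)
    lora_alpha=16,  # LoRA alpha (from this card)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()
```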
## Base Model

This model (Wuduandaun/curr_cookie_key) is fine-tuned from [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct).