# Model Card for Asymmetric-Executor-Nav-Instruct

## Model Summary
This model is a fine-tuned version of Qwen3-VL-8B-Instruct, designed to function as the Low-Level Executor within an Asymmetric Cognitive Architecture for long-horizon agents. It is specialized for partially observable navigation tasks involving memory dependencies and backtracking (specifically the Cookie and 2-Keys domains).
The model operates in a Reflection-Action Loop, explicitly verifying subgoal completion before generating atomic actions. It was trained using a 9-step curriculum learning strategy to master perception, verification, and actuation.
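A minimal sketch of this loop is shown below. The `planner`, `executor`, and `env` interfaces and the `reply` fields are illustrative placeholders, not a released API:

```python
# Illustrative sketch of the Reflection-Action Loop: the executor verifies
# subgoal completion (Q2) before emitting an atomic action (Q5).
# All object interfaces here are hypothetical.
def reflection_action_loop(planner, executor, env, max_steps=200):
    """Run one episode, verifying each subgoal before acting on it."""
    subgoal = planner.assign_subgoal(env.observe())
    for _ in range(max_steps):
        obs = env.observe()
        reply = executor.respond(obs, subgoal)   # structured Q1-Q5 CoT
        if reply.subgoal_complete:               # Q2 (Verification) passed
            if env.task_done():
                break
            subgoal = planner.assign_subgoal(obs)  # escalate to the Planner
        else:
            env.step(reply.atomic_action)        # Q5: one atomic action
```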
## Model Details

- **Architecture:** Hierarchical VLM (Low-Level Executor)
- **Task:** Vision-Language Navigation, Subgoal Verification, Atomic Action Generation
- **Dataset:** Expert Demonstrations + Key Episode Dataset (rebalanced 1:3)
- **Training method:** LoRA fine-tuning (rank 8, alpha 16)
- **Paper:** *Strategic Planning, Precise Execution: An Asymmetric Cognitive Architecture for Long-Horizon VLM Agents* (ICML submission)
## Intended Use

This model is intended for use alongside a High-Level Planner (e.g., Gemini 3 Pro, GPT-4o) that decomposes long-horizon tasks into subgoals.

**Input:**
- Current visual observation (RGB image).
- Textual subgoal assigned by the Planner (e.g., "Go to Room A and press the button").

**Output:**
- A structured Chain-of-Thought (CoT) response following the Q1-Q5 format: Perception, Verification, Transition, Navigation, and Atomic Action.
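A minimal inference sketch using the generic Hugging Face image-text API (assuming a recent `transformers` version). The exact prompt template used during training is not documented here, so the subgoal formatting below is an assumption:

```python
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Wuduandaun/curr_cookie_key"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

subgoal = "Go to Room A and press the button"  # assigned by the Planner
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": Image.open("observation.png")},
        {"type": "text", "text": f"Subgoal: {subgoal}"},  # assumed format
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens (the Q1-Q5 structured response).
print(processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```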
## Performance

- **Cookie domain success rate:** ~87% (vs. 22% for RL baselines).
- **2-Keys domain success rate:** ~89% (vs. 15% for RL baselines).
- **Inference efficiency:** a 24-fold reduction in operational cost compared to symmetric baselines in which the high-level model also performs low-level execution.
## Training Hyperparameters

- **LoRA rank:** 8
- **LoRA alpha:** 16
- **Precision:** bf16
- **Learning rate:** 5e-5 with a cosine scheduler
- **Batch size:** 16 (effective)
- **Hardware:** NVIDIA L40S GPU
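For reference, a PEFT configuration matching the card's LoRA hyperparameters might look like the following sketch; the target modules are a typical assumption (attention projections) and are not stated on this card:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForImageTextToText

base = AutoModelForImageTextToText.from_pretrained("Qwen/Qwen3-VL-8B-Instruct")
config = LoraConfig(
    r=8,            # LoRA rank (from this card)
    lora_alpha=16,  # LoRA alpha (from this card)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()
```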
## Base Model

This model (Wuduandaun/curr_cookie_key) is fine-tuned from [Qwen/Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct).