# MotiVate Qwen3-VL Pose Bridge
This repository contains a pose-conditioned adaptation of Qwen3-VL-4B-Instruct for exercise description and brief corrective feedback generation.
The model is not a plain LoRA checkpoint. In addition to the Qwen LoRA adapter, inference also requires a custom pose bridge:
- `qwen_adapter/`: LoRA weights applied to the Qwen language model
- `pose_projector.pt`: maps pose latents into Qwen token embedding space
- `pose_adapter.pt`: pose encoder adapter weights
- `pose_bridge_config.json`: pose token metadata used during embedding injection
## Intended Output Format
The model is trained to produce exactly two labeled lines:

```
Description: ...
Feedback: ...
```
- `Description:` summarizes the observed movement.
- `Feedback:` gives short coaching guidance.
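Downstream code can split the two labeled lines back into fields. A minimal sketch (the `parse_model_output` helper is hypothetical, not part of this repo):

```python
def parse_model_output(text: str) -> dict:
    """Split the model's two labeled output lines into a dict.

    Hypothetical helper: lines not starting with the expected labels
    are ignored rather than raising, since generation can drift.
    """
    result = {}
    for line in text.strip().splitlines():
        if line.startswith("Description:"):
            result["description"] = line[len("Description:"):].strip()
        elif line.startswith("Feedback:"):
            result["feedback"] = line[len("Feedback:"):].strip()
    return result
```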
## What To Upload
For a usable Hugging Face model repo, upload at least:
- `qwen_adapter/`
- `pose_projector.pt`
- `pose_adapter.pt`
- `pose_bridge_config.json`
- `tokenizer.json`
- `tokenizer_config.json`
- `special_tokens_map.json`
- `added_tokens.json`
- `preprocessor_config.json`
- `video_preprocessor_config.json`
- `chat_template.jinja`
- `vocab.json`
- `merges.txt`
- this `README.md`
You do not need to upload optimizer, scheduler, RNG, or trainer state files unless you want to resume training.
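Before uploading, it can be worth checking that the required artifacts are actually present in the local folder. A small stdlib-only sketch (the `REQUIRED` list below is a subset of the files above; `missing_files` is a hypothetical helper):

```python
from pathlib import Path

# Subset of the artifacts listed above; extend as needed.
REQUIRED = [
    "qwen_adapter",
    "pose_projector.pt",
    "pose_adapter.pt",
    "pose_bridge_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
]


def missing_files(repo_dir: str) -> list[str]:
    """Return the names from REQUIRED that are absent in repo_dir."""
    root = Path(repo_dir)
    return [name for name in REQUIRED if not (root / name).exists()]
```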
## Important Loading Note
This repo cannot be loaded with a single vanilla call:

```python
AutoModelForImageTextToText.from_pretrained(repo_id)
```

That call expects a standard Transformers checkpoint layout and would at best load base Qwen model weights; it knows nothing about the LoRA adapter or the pose bridge. This project instead uses a custom runtime assembly step:
- load the base Qwen3-VL model
- load the tokenizer / processor from this repo
- attach the LoRA adapter from `qwen_adapter/`
- rebuild the pose bridge modules
- load `pose_projector.pt` and `pose_adapter.pt`
- inject pose embeddings at `<|pose|>` token positions during generation
Example:

```python
from pathlib import Path

from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor

from main import build_pose_training_model, resolve_runtime, set_seed
from qwen_pose import load_config

repo_dir = Path("path/to/downloaded/hf-repo")
config_path = Path("path/to/stage2_pose_lora.json")

train_config = load_config(config_path)
training_args = train_config.training.build_training_arguments()
device, model_dtype = resolve_runtime(training_args)
set_seed(train_config.data.sampling_seed)

# Tokenizer / processor come from this repo (they carry the pose token).
processor = AutoProcessor.from_pretrained(
    str(repo_dir),
    min_pixels=train_config.data.image_min_pixels,
    max_pixels=train_config.data.image_max_pixels,
)
if processor.tokenizer.pad_token_id is None:
    processor.tokenizer.pad_token = processor.tokenizer.eos_token

# The base model is loaded from the original Qwen checkpoint, not this repo.
qwen_model = AutoModelForImageTextToText.from_pretrained(
    train_config.model.model_name,
    torch_dtype=model_dtype,
)

# Ensure the <|pose|> special token exists before looking up its id.
pose_token = train_config.pose.pose_special_token
if pose_token not in set(processor.tokenizer.additional_special_tokens or []):
    processor.tokenizer.add_special_tokens({"additional_special_tokens": [pose_token]})
    qwen_model.resize_token_embeddings(len(processor.tokenizer))
pose_token_id = processor.tokenizer.convert_tokens_to_ids(pose_token)

# Attach the LoRA adapter from this repo, frozen for inference.
qwen_model = PeftModel.from_pretrained(
    qwen_model,
    str(repo_dir / "qwen_adapter"),
    is_trainable=False,
)

# Temporarily point the pose-bridge loader at the downloaded repo so it
# picks up pose_projector.pt / pose_adapter.pt, then restore the config.
original_init_checkpoint_path = train_config.model.init_checkpoint_path
object.__setattr__(train_config.model, "init_checkpoint_path", str(repo_dir))
try:
    model, pose_loader = build_pose_training_model(
        train_config=train_config,
        qwen_model=qwen_model,
        device=device,
        pose_token_id=pose_token_id,
    )
finally:
    object.__setattr__(train_config.model, "init_checkpoint_path", original_init_checkpoint_path)

model = model.to(device)
model.eval()
```
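Once assembled, prompts handed to the model must contain one `<|pose|>` placeholder per pose latent so the bridge has positions to inject into. A hedged sketch of building such a prompt (the `build_pose_prompt` helper and the placement of the placeholders are illustrative assumptions, not this repo's exact prompt format):

```python
def build_pose_prompt(question: str, num_pose_tokens: int, pose_token: str = "<|pose|>") -> str:
    """Prepend one placeholder per pose latent.

    Illustrative only: the pose bridge replaces the embeddings at these
    token positions during generation, so the count must match the
    number of pose latents produced by the pose encoder.
    """
    return pose_token * num_pose_tokens + "\n" + question
```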
## Training Summary
- Base model: `Qwen/Qwen3-VL-4B-Instruct`
- Adaptation: language-side LoRA plus custom pose bridge
- Task: exercise movement description and short feedback generation
- Output format: two labeled lines beginning with `Description:` and `Feedback:`