MotiVate Qwen3-VL Pose Bridge

This repository contains a pose-conditioned adaptation of Qwen3-VL-4B-Instruct for exercise description and brief corrective feedback generation.

The model is not a plain LoRA checkpoint. In addition to the Qwen LoRA adapter, inference also requires a custom pose bridge:

  • qwen_adapter/: LoRA weights applied to the Qwen language model
  • pose_projector.pt: maps pose latents into Qwen token embedding space
  • pose_adapter.pt: pose encoder adapter weights
  • pose_bridge_config.json: pose token metadata used during embedding injection

Intended Output Format

The model is trained to produce exactly two labeled lines:

Description: ...
Feedback: ...

Description summarizes the observed movement.

Feedback gives short coaching guidance.
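Because the output is constrained to two labeled lines, it is easy to post-process. A minimal parsing sketch (the helper name is ours, not part of this repo):

```python
def parse_model_output(text: str) -> dict:
    """Split the model's two labeled lines into a dict.

    Expects output of the form:
        Description: ...
        Feedback: ...
    Lines without a recognized label are ignored.
    """
    result = {}
    for line in text.strip().splitlines():
        for label in ("Description", "Feedback"):
            prefix = label + ":"
            if line.startswith(prefix):
                result[label.lower()] = line[len(prefix):].strip()
    return result
```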

What To Upload

For a usable Hugging Face model repo, upload at least:

  • qwen_adapter/
  • pose_projector.pt
  • pose_adapter.pt
  • pose_bridge_config.json
  • tokenizer.json
  • tokenizer_config.json
  • special_tokens_map.json
  • added_tokens.json
  • preprocessor_config.json
  • video_preprocessor_config.json
  • chat_template.jinja
  • vocab.json
  • merges.txt
  • this README.md

You do not need to upload optimizer, scheduler, RNG, or trainer state files unless you want to resume training.
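Before uploading, it can help to check that every required artifact is present in the export directory. A minimal sketch (the function name is ours; the file list mirrors the one above):

```python
from pathlib import Path

# Required artifacts, as listed in "What To Upload"
REQUIRED = [
    "qwen_adapter",
    "pose_projector.pt",
    "pose_adapter.pt",
    "pose_bridge_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "added_tokens.json",
    "preprocessor_config.json",
    "video_preprocessor_config.json",
    "chat_template.jinja",
    "vocab.json",
    "merges.txt",
    "README.md",
]

def missing_artifacts(repo_dir: str) -> list[str]:
    """Return the required files/dirs not found under repo_dir."""
    root = Path(repo_dir)
    return [name for name in REQUIRED if not (root / name).exists()]
```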

Important Loading Note

This repo cannot be loaded with a single vanilla call:

AutoModelForImageTextToText.from_pretrained(repo_id)

That call expects a standard Transformers checkpoint layout and would load only the base Qwen model weights; it neither attaches the LoRA adapter nor rebuilds the pose bridge. This project instead uses a custom runtime assembly step:

  1. load the base Qwen3-VL model
  2. load the tokenizer / processor from this repo
  3. attach the LoRA adapter from qwen_adapter/
  4. rebuild the pose bridge modules
  5. load pose_projector.pt and pose_adapter.pt
  6. inject pose embeddings at <|pose|> token positions during generation

Example:

from pathlib import Path

from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor

from main import build_pose_training_model, resolve_runtime, set_seed
from qwen_pose import load_config


# Paths to the downloaded HF repo and to the stage-2 training config
repo_dir = Path("path/to/downloaded/hf-repo")
config_path = Path("path/to/stage2_pose_lora.json")

# Rebuild runtime settings (device, dtype, seed) from the training config
train_config = load_config(config_path)
training_args = train_config.training.build_training_arguments()
device, model_dtype = resolve_runtime(training_args)
set_seed(train_config.data.sampling_seed)

# Step 2: load the tokenizer / processor from this repo
processor = AutoProcessor.from_pretrained(
    str(repo_dir),
    min_pixels=train_config.data.image_min_pixels,
    max_pixels=train_config.data.image_max_pixels,
)
if processor.tokenizer.pad_token_id is None:
    processor.tokenizer.pad_token = processor.tokenizer.eos_token

# Step 1: load the base Qwen3-VL model (weights come from the hub, not this repo)
qwen_model = AutoModelForImageTextToText.from_pretrained(
    train_config.model.model_name,
    torch_dtype=model_dtype,
)

# Register the <|pose|> special token and resize the embedding matrix to match
pose_token = train_config.pose.pose_special_token
if pose_token not in set(processor.tokenizer.additional_special_tokens or []):
    processor.tokenizer.add_special_tokens({"additional_special_tokens": [pose_token]})
qwen_model.resize_token_embeddings(len(processor.tokenizer))
pose_token_id = processor.tokenizer.convert_tokens_to_ids(pose_token)

# Step 3: attach the LoRA adapter from qwen_adapter/
qwen_model = PeftModel.from_pretrained(
    qwen_model,
    str(repo_dir / "qwen_adapter"),
    is_trainable=False,
)

# Steps 4-5: rebuild the pose bridge and load pose_projector.pt / pose_adapter.pt.
# The config object does not allow normal attribute assignment, so
# object.__setattr__ temporarily points init_checkpoint_path at the downloaded repo.
original_init_checkpoint_path = train_config.model.init_checkpoint_path
object.__setattr__(train_config.model, "init_checkpoint_path", str(repo_dir))
try:
    model, pose_loader = build_pose_training_model(
        train_config=train_config,
        qwen_model=qwen_model,
        device=device,
        pose_token_id=pose_token_id,
    )
finally:
    object.__setattr__(train_config.model, "init_checkpoint_path", original_init_checkpoint_path)

model = model.to(device)
model.eval()
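Step 6 above is handled inside the assembled model, but the core idea is simple: compute token embeddings as usual, then overwrite the rows at <|pose|> token positions with projected pose latents before the forward pass. An illustrative, framework-free sketch (all names and shapes are ours, not the repo's actual API):

```python
def inject_pose_embeddings(input_ids, token_embeds, pose_embeds, pose_token_id):
    """Replace token embeddings at <|pose|> positions with pose embeddings.

    input_ids:    token ids for one sequence
    token_embeds: one embedding vector (list of floats) per token
    pose_embeds:  projected pose latents, consumed in order, one per pose token
    """
    out = [vec[:] for vec in token_embeds]  # copy so originals stay intact
    pose_iter = iter(pose_embeds)
    for pos, tok in enumerate(input_ids):
        if tok == pose_token_id:
            out[pos] = next(pose_iter)
    return out
```

In the real pipeline, the result would be passed to the language model via `inputs_embeds` instead of `input_ids`.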

Training Summary

  • Base model: Qwen/Qwen3-VL-4B-Instruct
  • Adaptation: language-side LoRA plus custom pose bridge
  • Task: exercise movement description and short feedback generation
  • Output format: two labeled lines beginning with Description: and Feedback: