
# Qwen3-VL-4B-Instruct Geometry3K Model

This repository contains a Qwen3-VL-4B-Instruct model trained with SFT (Supervised Fine-Tuning) followed by RL (Reinforcement Learning), optimized for the Geometry3K geometric reasoning task.

## Model Information

  • Base Model: Qwen3-VL-4B-Instruct
  • Training Method: SFT + RL
  • Dataset: Geometry3K
  • Baseline Accuracy: 0.4842
  • SFT+RL Accuracy: 0.6356

## Directory Structure

```
Qwen3-VL-4B-Instruct-Geometry3k/
├── README.md                                    # This file
├── config.json                                  # Model configuration file
├── generation_config.json                       # Generation configuration
├── tokenizer_config.json                        # Tokenizer configuration
├── tokenizer.json                               # Tokenizer file
├── vocab.json                                   # Vocabulary file
├── merges.txt                                   # BPE merges file
├── chat_template.jinja                          # Chat template
├── geo3k_test_2048_qwen3-vl-4b-geometry3k.json  # Test result data
├── eval_geo3k.py                                # Evaluation script
└── geo3k_workflow.py                            # Workflow script
```

## Usage

### 1. Start the Model Service

The model is served for inference with vLLM:

```shell
# Start the vLLM service on a chosen port (e.g., 6049)
vllm serve Qwen3-VL-4B-Instruct-Geometry3k --port 6049
```

### 2. Run Evaluation

The evaluation script is built on rLLM and calls the vLLM service above through its OpenAI-compatible API:

```shell
python eval_geo3k.py --port 6049 --model_name Qwen3-VL-4B-Instruct-Geometry3k
```
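For a quick sanity check outside the evaluation script, the service can also be queried directly over its OpenAI-compatible chat endpoint. The sketch below only builds the request payload; the helper name, the sample question, and the placeholder image bytes are illustrative, not part of this repository:

```python
import base64
import json

def build_chat_request(model: str, question: str, image_bytes: bytes) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload with an inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                # Geometry3K problems pair a diagram with a question, so the
                # image is sent inline as a base64 data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": question},
            ],
        }],
        "max_tokens": 2048,  # matches the script's default --max_length
    }

# Illustrative use with placeholder image bytes.
payload = build_chat_request(
    "Qwen3-VL-4B-Instruct-Geometry3k",
    "Find the measure of the marked angle.",
    b"\x89PNG placeholder",
)
print(json.dumps(payload)[:40])
```

The resulting payload would be POSTed to `http://localhost:6049/v1/chat/completions` when the vLLM service from step 1 is running.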

Dependency versions:

  • vLLM: 0.11.0 (model serving)
  • rLLM: 0.2.1 (evaluation pipeline)

## Performance Metrics

| Method   | Accuracy |
|----------|----------|
| Baseline | 0.4842   |
| SFT+RL   | 0.6356   |
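The gain over the baseline works out as follows:

```python
baseline, sft_rl = 0.4842, 0.6356

abs_gain = sft_rl - baseline     # absolute accuracy improvement
rel_gain = abs_gain / baseline   # relative improvement over the baseline
print(f"+{abs_gain:.4f} absolute, {rel_gain:.1%} relative")
# prints: +0.1514 absolute, 31.3% relative
```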

## Notes

  1. The model uses BF16 precision; running it on a GPU with BF16 support is recommended.
  2. The LoRA weights have been merged into the model, so it can be used directly without loading additional adapters.
  3. The evaluation script eval_geo3k.py accepts the optional parameters --n_parallel_tasks (default 128) and --max_length (default 2048).
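The documented flags and defaults can be mirrored with an argparse stub. This is a hypothetical reconstruction based only on the parameters listed above; the actual interface of eval_geo3k.py may differ:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI mirroring the flags documented in this README (defaults as stated)."""
    p = argparse.ArgumentParser(
        description="Evaluate on Geometry3K via a vLLM OpenAI-compatible endpoint")
    p.add_argument("--port", type=int, default=6049,
                   help="Port of the running vLLM service")
    p.add_argument("--model_name", default="Qwen3-VL-4B-Instruct-Geometry3k",
                   help="Model name registered with vLLM")
    p.add_argument("--n_parallel_tasks", type=int, default=128,
                   help="Number of evaluation tasks run in parallel")
    p.add_argument("--max_length", type=int, default=2048,
                   help="Maximum generation length")
    return p

args = build_parser().parse_args([])  # parse with defaults only
print(args.port, args.n_parallel_tasks, args.max_length)
# prints: 6049 128 2048
```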

## Citation

If you use this model, please cite:

  • Geometry3K: hiyouga/geometry3k on Hugging Face (converted from InterGPS)
  • GRPO: DeepSeekMath (introduces Group Relative Policy Optimization), arXiv:2402.03300
  • Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond, arXiv:2308.12966