
# Qwen3-VL-4B-Instruct Geometry3K Model

This repository contains a Qwen3-VL-4B-Instruct model trained with SFT (Supervised Fine-Tuning) followed by RL (Reinforcement Learning), optimized for the Geometry3K geometric reasoning task.

## Model Information

  • Base Model: Qwen3-VL-4B-Instruct
  • Training Method: SFT + RL
  • Dataset: Geometry3K
  • Baseline Accuracy: 0.4842
  • SFT+RL Accuracy: 0.6356

## Directory Structure

```
Qwen3-VL-4B-Instruct-Geometry3k/
├── README.md                                    # This file
├── config.json                                  # Model configuration file
├── generation_config.json                       # Generation configuration
├── tokenizer_config.json                        # Tokenizer configuration
├── tokenizer.json                               # Tokenizer file
├── vocab.json                                   # Vocabulary file
├── merges.txt                                   # BPE merges file
├── chat_template.jinja                          # Chat template
├── geo3k_test_2048_qwen3-vl-4b-geometry3k.json  # Test result data
├── eval_geo3k.py                                # Evaluation script
└── geo3k_workflow.py                            # Workflow script
```

## Usage

### 1. Start the Model Service

The model is served for inference with vLLM:

```shell
# Start the vLLM service on a chosen port (e.g., 6049)
vllm serve Qwen3-VL-4B-Instruct-Geometry3k --port 6049
```

### 2. Run Evaluation

The evaluation script is built on rLLM and calls the vLLM service above through its OpenAI-compatible API:

```shell
python eval_geo3k.py --port 6049 --model_name Qwen3-VL-4B-Instruct-Geometry3k
```
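For a quick sanity check outside the evaluation script, the service can also be queried directly over its OpenAI-compatible chat endpoint. The sketch below only builds the request payload; the helper name, the sample question, and the placeholder image bytes are illustrative, not part of this repository:

```python
import base64
import json

def build_chat_request(model: str, question: str, image_bytes: bytes) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload with an inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                # Geometry3K problems pair a diagram with a question, so the
                # image is sent inline as a base64 data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": question},
            ],
        }],
        "max_tokens": 2048,  # matches the script's default --max_length
    }

# Illustrative use with placeholder image bytes.
payload = build_chat_request(
    "Qwen3-VL-4B-Instruct-Geometry3k",
    "Find the measure of the marked angle.",
    b"\x89PNG placeholder",
)
print(json.dumps(payload)[:40])
```

The resulting payload would be POSTed to `http://localhost:6049/v1/chat/completions` when the vLLM service from step 1 is running.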

Dependency versions:

  • vLLM: 0.11.0 (model serving)
  • rLLM: 0.2.1 (evaluation pipeline)

## Performance Metrics

| Method   | Accuracy |
|----------|----------|
| Baseline | 0.4842   |
| SFT+RL   | 0.6356   |
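The gain over the baseline works out as follows:

```python
baseline, sft_rl = 0.4842, 0.6356

abs_gain = sft_rl - baseline     # absolute accuracy improvement
rel_gain = abs_gain / baseline   # relative improvement over the baseline
print(f"+{abs_gain:.4f} absolute, {rel_gain:.1%} relative")
# prints: +0.1514 absolute, 31.3% relative
```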

## Notes

  1. The model uses BF16 precision; running it on a GPU with BF16 support is recommended.
  2. The LoRA weights have been merged into the model, so it can be used directly without loading additional adapters.
  3. The evaluation script eval_geo3k.py accepts the optional parameters --n_parallel_tasks (default 128) and --max_length (default 2048).
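The documented flags and defaults can be mirrored with an argparse stub. This is a hypothetical reconstruction based only on the parameters listed above; the actual interface of eval_geo3k.py may differ:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI mirroring the flags documented in this README (defaults as stated)."""
    p = argparse.ArgumentParser(
        description="Evaluate on Geometry3K via a vLLM OpenAI-compatible endpoint")
    p.add_argument("--port", type=int, default=6049,
                   help="Port of the running vLLM service")
    p.add_argument("--model_name", default="Qwen3-VL-4B-Instruct-Geometry3k",
                   help="Model name registered with vLLM")
    p.add_argument("--n_parallel_tasks", type=int, default=128,
                   help="Number of evaluation tasks run in parallel")
    p.add_argument("--max_length", type=int, default=2048,
                   help="Maximum generation length")
    return p

args = build_parser().parse_args([])  # parse with defaults only
print(args.port, args.n_parallel_tasks, args.max_length)
# prints: 6049 128 2048
```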

## Citation

If you use this model, please cite:

  • Geometry3K: hiyouga/geometry3k on Hugging Face (converted from InterGPS)
  • GRPO: DeepSeekMath (introduces Group Relative Policy Optimization), arXiv:2402.03300
  • Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond, arXiv:2308.12966