# Qwen3-VL-8B-Instruct Geometry3K Model
This directory contains a Qwen3-VL-8B-Instruct model trained with SFT (Supervised Fine-Tuning) followed by RL (Reinforcement Learning), optimized for the Geometry3K geometric reasoning task.
## Model Information
- Base Model: Qwen3-VL-8B-Instruct
- Training Method: SFT + RL
- Dataset: Geometry3K
- Baseline Accuracy: 0.5208
- SFT+RL Accuracy: 0.6689
## Directory Structure

```
Qwen3-VL-8B-Instruct-Geometry3k/
├── README.md                                    # This file
├── config.json                                  # Model configuration file
├── generation_config.json                       # Generation configuration
├── tokenizer_config.json                        # Tokenizer configuration
├── tokenizer.json                               # Tokenizer file
├── vocab.json                                   # Vocabulary file
├── merges.txt                                   # BPE merges file
├── chat_template.jinja                          # Chat template
├── geo3k_test_2048_qwen3-vl-8b-geometry3k.json  # Test result data
├── eval_geo3k.py                                # Evaluation script
└── geo3k_workflow.py                            # Workflow script
```
## Usage

### 1. Start the Model Service

Inference is served with vLLM:

```shell
# Start the vLLM service, listening on a chosen port (e.g., 6049)
vllm serve Qwen3-VL-8B-Instruct-Geometry3k --port 6049
```
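Once the service is running, it can be queried through the OpenAI-compatible chat completions endpoint. A minimal sketch that builds such a request, then posts it; the endpoint URL, the base64 data-URL image format, and the `max_tokens` value are assumptions, not part of this repository:

```python
import base64
import json

def build_geometry_request(question: str, image_bytes: bytes) -> dict:
    """Build an OpenAI-compatible chat request for the vLLM endpoint.

    The image is embedded as a base64 data URL, which vLLM's
    OpenAI-compatible server accepts for vision-language models.
    """
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": "Qwen3-VL-8B-Instruct-Geometry3k",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                    {"type": "text", "text": question},
                ],
            }
        ],
        "max_tokens": 2048,
    }

payload = build_geometry_request("Find the measure of angle ABC.", b"<png bytes>")
print(json.dumps(payload)[:60])
# With the server running, this payload could be sent with any HTTP client:
# requests.post("http://localhost:6049/v1/chat/completions", json=payload)
```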
### 2. Run the Evaluation

The evaluation script uses rLLM and calls the vLLM service above through its OpenAI-compatible API:

```shell
python eval_geo3k.py --port 6049 --model_name Qwen3-VL-8B-Instruct-Geometry3k
```
Dependency versions:
- vLLM: 0.11.0 (model serving)
- rLLM: 0.2.1 (evaluation pipeline)
## Performance Metrics
| Method | Accuracy |
|---|---|
| Baseline | 0.5208 |
| SFT+RL | 0.6689 |
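Accuracy here is the fraction of test problems answered correctly. A minimal sketch of recomputing it from a result file, assuming each record carries a boolean `is_correct` field (a hypothetical field name; the actual schema of the bundled result JSON may differ):

```python
def accuracy(records: list) -> float:
    """Fraction of evaluation records marked correct.

    Assumes each record is a dict with a boolean "is_correct" key
    (hypothetical; adjust to the real result-file schema).
    """
    return sum(r["is_correct"] for r in records) / len(records)

sample = [{"is_correct": True}, {"is_correct": False}, {"is_correct": True}]
print(f"{accuracy(sample):.4f}")  # 0.6667
```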
## Related Files

- Training Scripts: SFT and RL training scripts
- Project README: `../README.md`
## Notes

- The model uses BF16 precision; running on GPUs with BF16 support is recommended.
- LoRA weights have already been merged into the model, so it can be used directly without loading additional adapters.
- Evaluation script: `eval_geo3k.py`. Optional parameters: `--n_parallel_tasks` (default 128), `--max_length` (default 2048).
## Citation

If you use this model, please cite:

- Geometry3K: `hiyouga/geometry3k` on Hugging Face (converted from InterGPS)
- GRPO: DeepSeekMath, Group Relative Policy Optimization (arXiv:2402.03300)
- Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond (arXiv:2308.12966)