# Qwen3-VL-8B-Instruct Geometry3K Model
This directory contains a Qwen3-VL-8B-Instruct model trained with SFT (Supervised Fine-Tuning) followed by RL (Reinforcement Learning), optimized for the Geometry3K geometric reasoning task.
## Model Information
- Base Model: Qwen3-VL-8B-Instruct
- Training Method: SFT + RL
- Dataset: Geometry3K
- Baseline Accuracy: 0.5208
- SFT+RL Accuracy: 0.6689
## Directory Structure

```
Qwen3-VL-8B-Instruct-Geometry3k/
├── README.md                                    # This file
├── config.json                                  # Model configuration file
├── generation_config.json                       # Generation configuration
├── tokenizer_config.json                        # Tokenizer configuration
├── tokenizer.json                               # Tokenizer file
├── vocab.json                                   # Vocabulary file
├── merges.txt                                   # BPE merges file
├── chat_template.jinja                          # Chat template
├── geo3k_test_2048_qwen3-vl-8b-geometry3k.json  # Test result data
├── eval_geo3k.py                                # Evaluation script
└── geo3k_workflow.py                            # Workflow script
```
## Usage

### 1. Start the Model Service

Inference is served with vLLM:

```shell
# Start the vLLM service, listening on a chosen port (e.g., 6049)
vllm serve Qwen3-VL-8B-Instruct-Geometry3k --port 6049
```
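Once the service is running, it can be queried through the OpenAI-compatible chat completions endpoint. A minimal sketch that builds such a request, then posts it; the endpoint URL, the base64 data-URL image format, and the `max_tokens` value are assumptions, not part of this repository:

```python
import base64
import json

def build_geometry_request(question: str, image_bytes: bytes) -> dict:
    """Build an OpenAI-compatible chat request for the vLLM endpoint.

    The image is embedded as a base64 data URL, which vLLM's
    OpenAI-compatible server accepts for vision-language models.
    """
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": "Qwen3-VL-8B-Instruct-Geometry3k",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                    {"type": "text", "text": question},
                ],
            }
        ],
        "max_tokens": 2048,
    }

payload = build_geometry_request("Find the measure of angle ABC.", b"<png bytes>")
print(json.dumps(payload)[:60])
# With the server running, this payload could be sent with any HTTP client:
# requests.post("http://localhost:6049/v1/chat/completions", json=payload)
```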
### 2. Run the Evaluation

The evaluation script uses rLLM and calls the vLLM service above through its OpenAI-compatible API:

```shell
python eval_geo3k.py --port 6049 --model_name Qwen3-VL-8B-Instruct-Geometry3k
```
Dependency versions:
- vLLM: 0.11.0 (model serving)
- rLLM: 0.2.1 (evaluation pipeline)
## Performance Metrics
| Method | Accuracy |
|---|---|
| Baseline | 0.5208 |
| SFT+RL | 0.6689 |
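Accuracy here is the fraction of test problems answered correctly. A minimal sketch of recomputing it from a result file, assuming each record carries a boolean `is_correct` field (a hypothetical field name; the actual schema of the bundled result JSON may differ):

```python
def accuracy(records: list) -> float:
    """Fraction of evaluation records marked correct.

    Assumes each record is a dict with a boolean "is_correct" key
    (hypothetical; adjust to the real result-file schema).
    """
    return sum(r["is_correct"] for r in records) / len(records)

sample = [{"is_correct": True}, {"is_correct": False}, {"is_correct": True}]
print(f"{accuracy(sample):.4f}")  # 0.6667
```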
## Related Files

- Training Scripts: SFT and RL training scripts
- Project README: `../README.md`
## Notes

- The model uses BF16 precision; running on GPUs with BF16 support is recommended.
- LoRA weights have already been merged into the model, so it can be used directly without loading additional adapters.
- Evaluation script: `eval_geo3k.py`. Optional parameters: `--n_parallel_tasks` (default 128), `--max_length` (default 2048).
## Citation

If you use this model, please cite:

- Geometry3K: `hiyouga/geometry3k` on Hugging Face (converted from InterGPS)
- GRPO: DeepSeekMath, Group Relative Policy Optimization (arXiv:2402.03300)
- Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond (arXiv:2308.12966)