Paper: DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (arXiv:2402.03300)
This repository contains a Qwen3-VL-8B-Instruct model trained with SFT (Supervised Fine-Tuning) followed by RL (Reinforcement Learning), optimized for the Geometry3K geometric reasoning task.
```
Qwen3-VL-8B-Instruct-Geometry3k/
├── README.md                                    # This file
├── config.json                                  # Model configuration file
├── generation_config.json                       # Generation configuration
├── tokenizer_config.json                        # Tokenizer configuration
├── tokenizer.json                               # Tokenizer file
├── vocab.json                                   # Vocabulary file
├── merges.txt                                   # BPE merges file
├── chat_template.jinja                          # Chat template
├── geo3k_test_2048_qwen3-vl-8b-geometry3k.json  # Test result data
├── eval_geo3k.py                                # Evaluation script
└── geo3k_workflow.py                            # Workflow script
```
Model inference is served with vLLM:

```shell
# Start the vLLM server, listening on a chosen port (e.g., 6049)
vllm serve Qwen3-VL-8B-Instruct-Geometry3k --port 6049
```
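Once the server is up, it exposes an OpenAI-compatible API. A minimal sketch of querying it with an image question (the image path, question text, and port are placeholders, not taken from this repo):

```python
import base64


def build_messages(image_b64: str, question: str) -> list:
    """Build an OpenAI-style chat message with an inline base64 PNG image."""
    return [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text", "text": question},
        ],
    }]


if __name__ == "__main__":
    # Requires the `openai` package; the API key is unused by vLLM but
    # required by the client.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:6049/v1", api_key="EMPTY")

    # Placeholder diagram path; Geometry3K items pair a figure with a question.
    with open("diagram.png", "rb") as f:
        img = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="Qwen3-VL-8B-Instruct-Geometry3k",
        messages=build_messages(
            img, "Find the measure of angle ABC. Put the final answer in \\boxed{}."),
        max_tokens=2048,
    )
    print(resp.choices[0].message.content)
```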
The evaluation script uses rLLM and calls the vLLM service above through its OpenAI-compatible API:

```shell
python eval_geo3k.py --port 6049 --model_name Qwen3-VL-8B-Instruct-Geometry3k
```
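Scoring for this kind of benchmark typically extracts the final `\boxed{}` answer from each response and compares it to the reference. A simplified sketch of that idea (not the actual `eval_geo3k.py` logic):

```python
import re


def extract_boxed(text: str):
    """Return the content of the last \\boxed{...} in a response, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def accuracy(responses, answers) -> float:
    """Fraction of responses whose boxed answer exactly matches the reference."""
    correct = sum(extract_boxed(r) == a for r, a in zip(responses, answers))
    return correct / len(answers)


# Toy example: one correct, one wrong, one with no final answer
preds = ["The angle is \\boxed{35}.", "So x = \\boxed{12}", "no final answer"]
gold = ["35", "13", "7"]
print(accuracy(preds, gold))  # 1 of 3 match
```

Real graders usually also normalize answers (strip units, compare numeric values within a tolerance) before matching.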
Dependencies: vLLM (serving) and rLLM (evaluation harness).

Results on the Geometry3K test set:
| Method | Accuracy |
|---|---|
| Baseline | 0.5208 |
| SFT+RL | 0.6689 |
See ../README.md and eval_geo3k.py for details. Optional parameters: `--n_parallel_tasks` (default 128), `--max_length` (default 2048).

If you use this model, please cite: