---
library_name: transformers
license: apache-2.0
base_model:
- Qwen/Qwen3-VL-8B-Instruct
tags:
- multimodal
- vision-language
- tool-use
- agentic
- qwen3_vl
- sft
datasets:
- Accio-Lab/Metis-ColdStart
language:
- en
pipeline_tag: image-text-to-text
---

# Metis-8B-ColdStart

**Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models**

Metis-8B-ColdStart is the **SFT (supervised fine-tuning) checkpoint** of the Metis framework, fine-tuned from [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) on the curated [Metis-ColdStart](https://huggingface.co/datasets/Accio-Lab/Metis-ColdStart) dataset. This checkpoint serves as the starting point for HDPO reinforcement learning, which produces the final [Metis-8B-RL](https://huggingface.co/Accio-Lab/Metis-8B-RL) model.

[[Paper (arXiv)]](https://arxiv.org/abs/2604.08545) | [[GitHub]](https://github.com/Accio-Lab/Metis) | [[RL Model]](https://huggingface.co/Accio-Lab/Metis-8B-RL) | [[ColdStart Data]](https://huggingface.co/datasets/Accio-Lab/Metis-ColdStart) | [[RL Data]](https://huggingface.co/datasets/Accio-Lab/Metis-RL)

## Model Details

| Attribute | Value |
|---|---|
| Base model | [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) |
| Training stage | Supervised fine-tuning (cold start) |
| Training data | [Metis-ColdStart](https://huggingface.co/datasets/Accio-Lab/Metis-ColdStart) (~27K samples) |
| Next stage | → [Metis-8B-RL](https://huggingface.co/Accio-Lab/Metis-8B-RL) (HDPO reinforcement learning) |
| License | Apache-2.0 |

## Cold Start Data Curation Pipeline

The SFT corpus is curated from publicly available tool-augmented multimodal trajectories (DeepEyesV2, V-Interaction, Thyme, OpenMMReasoner) through a rigorous three-stage pipeline:

1. **Eradicating hallucinated environmental dynamics** — Execute all code in a sandbox environment; discard trajectories with execution failures.
2. **Isolating genuine tool necessity** — Filter out samples where the base model achieves pass@8 = 1 without any tools, ensuring only genuinely tool-dependent samples remain.
3. **Multidimensional meta-cognitive filtering** — An LLM judge evaluates visual relevance, reasoning coherence, and tool-use rationale to ensure high quality.

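The stage-2 necessity filter can be sketched as a simple pass@k check over tool-free rollouts of the base model. The sample layout and helper names below are hypothetical, for illustration only:

```python
# Hypothetical sketch of the stage-2 necessity filter: drop samples that the
# base model already solves in every one of k tool-free attempts (pass@8 = 1),
# keeping only samples that genuinely require tools. Sample format is assumed.

def pass_at_k(attempt_results: list) -> float:
    """Fraction of tool-free rollouts that solved the sample."""
    return sum(attempt_results) / len(attempt_results)

def filter_tool_dependent(samples: list, k: int = 8) -> list:
    """Keep samples whose k tool-free rollouts were NOT all successful."""
    kept = []
    for sample in samples:
        results = sample["tool_free_results"][:k]  # one bool per rollout
        if pass_at_k(results) < 1.0:  # base model fails at least once
            kept.append(sample)
    return kept

# Toy data: sample "a" is always solved without tools, "b" is not.
samples = [
    {"id": "a", "tool_free_results": [True] * 8},
    {"id": "b", "tool_free_results": [True, False] * 4},
]
print([s["id"] for s in filter_tool_dependent(samples)])  # → ['b']
```

Stages 1 (sandboxed execution) and 3 (LLM-judge scoring) would act as additional predicates in the same filtering loop.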
## Training Pipeline

```
Qwen3-VL-8B-Instruct
        │
        │ SFT on Metis-ColdStart (~27K samples)
        ▼
Metis-8B-ColdStart   ← this checkpoint
        │
        │ HDPO on Metis-RL (~5K prompts)
        ▼
Metis-8B-RL   ← final model
```

## Usage

Please refer to the [GitHub repository](https://github.com/Accio-Lab/Metis) for full installation and inference instructions.

### Installation

```bash
git clone https://github.com/Accio-Lab/Metis.git
cd Metis
pip install -e verl
pip install -e ".[vllm,search_tool,python_code_dep]"
```

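For quick, single-turn inference you can also use the standard `transformers` image-text-to-text API. This is a minimal sketch, not the official agentic tool-use loop; the repo id is taken from this model card, and the exact chat-template behavior depends on your `transformers` version:

```python
# Minimal single-turn inference sketch (assumes a recent transformers release
# with Qwen3-VL support). The official tool-use inference loop lives in the
# GitHub repository; this shows plain generation only.

def build_messages(image_url: str, question: str) -> list:
    """Build a single-turn multimodal chat message in the HF chat format."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]

def generate_answer(image_url: str, question: str, max_new_tokens: int = 512) -> str:
    # Imported here so build_messages stays dependency-free.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    model_id = "Accio-Lab/Metis-8B-ColdStart"
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    inputs = processor.apply_chat_template(
        build_messages(image_url, question),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    new_tokens = output[0][inputs["input_ids"].shape[-1]:]
    return processor.decode(new_tokens, skip_special_tokens=True)

# Requires a GPU and the model weights:
# print(generate_answer("https://example.com/chart.png", "What does this chart show?"))
```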
## Citation

```bibtex
@article{yan2026metis,
  title={Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models},
  author={Yan, Shilin and Tong, Jintao and Xue, Hongwei and Tang, Xiaojun and Wang, Yangyang and Shi, Kunyu and Zhang, Guannan and Li, Ruixuan and Zou, Yixiong},
  journal={arXiv preprint arXiv:2604.08545},
  year={2026}
}
```

## Acknowledgments

Metis is built upon [verl](https://github.com/volcengine/verl), [verl-tool](https://github.com/TIGER-AI-Lab/verl-tool), and [Qwen3-VL](https://github.com/QwenLM/Qwen3-VL).