Huggggooo committed on
Commit d331fe7 · verified · 1 Parent(s): 23dedf4

Add model card

  1. README.md +88 -0
README.md ADDED
@@ -0,0 +1,88 @@
+ ---
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: text-generation
+ base_model: Huggggooo/ProtoCycle-7B-SFT
+ tags:
+ - protein-design
+ - agentic
+ - tool-use
+ - qwen2.5
+ - reinforcement-learning
+ - grpo
+ language:
+ - en
+ ---
+
+ # ProtoCycle-7B
+
+ RL checkpoint for **ProtoCycle**, an agentic protein design model that
+ performs multi-step, tool-augmented sequence design.
+
+ This is the **GRPO-TCR (Group Relative Policy Optimization with Tool-Call
+ Reward) stage**, initialised from the SFT checkpoint
+ [`Huggggooo/ProtoCycle-7B-SFT`](https://huggingface.co/Huggggooo/ProtoCycle-7B-SFT).
+ It corresponds to `global_step_20` of the `save10` run.
+
+ - Base model: `Huggggooo/ProtoCycle-7B-SFT`
+   (itself fine-tuned from `Qwen/Qwen2.5-7B-Instruct`)
+ - Training framework: [VeRL](https://github.com/volcengine/verl) /
+   [Open-AgentRL](https://github.com/Gen-Verse/Open-AgentRL)
+ - Stage: agentic RL with GRPO-TCR
+ - Rollouts per prompt: 8, max turns: 16
+ - Max prompt / response: 8k / 20k tokens
+ - Reward manager: `protein` (see
+   [ProtoCycle/verl/workers/reward_manager/protein.py](https://github.com/huggggoooooo/ProtoCycle/blob/main/verl/workers/reward_manager/protein.py))
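
The "Rollouts per prompt: 8" setting matters because GRPO scores each rollout relative to the other rollouts sampled for the same prompt. The sketch below shows that generic group-relative normalisation; it is not taken from the ProtoCycle or VeRL code, and the function name, the use of the population standard deviation, and the `eps` value are assumptions made for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise the rewards of one prompt's rollouts against each other.

    `rewards` holds one scalar reward per rollout of the same prompt
    (8 per prompt in this run). `eps` is an assumed stabiliser that avoids
    division by zero when every rollout receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations differ
    return [(r - mean) / (std + eps) for r in rewards]
```

Rollouts that beat their group's mean get a positive advantage and the rest a negative one, so a prompt whose 8 rollouts all score the same contributes no learning signal.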
+
+ ## Reward Signal
+
+ The GRPO-TCR reward combines:
+ 1. **Protocol compliance**: `<think>` / `<plan>` / `<tool_call>` / `<answer>`
+    format.
+ 2. **Tool-call quality**: per-step ProTrek score of the best sequence after
+    each Stage-1 / Stage-3 tool call.
+ 3. **Outcome signal**: final (global-best) ProTrek score vs. the requirement.
+ 4. **Efficiency**: penalty for excessively long rollouts.
+
+ See
+ [`recipe/protein/reward.py`](https://github.com/huggggoooooo/ProtoCycle/blob/main/recipe/protein/reward.py)
+ for the exact formulation.
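
To make the four components concrete, here is a purely illustrative way they could be folded into one scalar. The weights, the `turn_budget` threshold, the thresholded outcome term, and the `RolloutStats` / `combine_reward` names are all hypothetical; the real formulation lives in `recipe/protein/reward.py` and may differ in every detail.

```python
from dataclasses import dataclass


@dataclass
class RolloutStats:
    """Hypothetical summary of one rollout (not the repo's data structure)."""
    format_ok: bool      # did the output follow the tag protocol?
    step_scores: list    # ProTrek score of the best sequence after each scored tool call
    final_score: float   # global-best ProTrek score
    requirement: float   # target score the design should reach
    num_turns: int       # turns used by the rollout


def combine_reward(stats, turn_budget=12, max_turns=16,
                   w_format=0.1, w_tool=0.3, w_outcome=0.5, w_eff=0.1):
    """Weighted sum of the four components; all weights are made up."""
    format_term = 1.0 if stats.format_ok else 0.0
    tool_term = (sum(stats.step_scores) / len(stats.step_scores)
                 if stats.step_scores else 0.0)
    outcome_term = 1.0 if stats.final_score >= stats.requirement else 0.0
    # Penalise only the turns beyond an assumed budget, capped at 1.0.
    over = max(0, stats.num_turns - turn_budget)
    efficiency_penalty = min(1.0, over / max(1, max_turns - turn_budget))
    return (w_format * format_term + w_tool * tool_term
            + w_outcome * outcome_term - w_eff * efficiency_penalty)
```

Only `max_turns=16` comes from the run configuration above; everything else is a placeholder chosen to show the shape of such a reward.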
+
+ ## Agent Protocol
+
+ ```
+ <think> ... reasoning ... </think>
+ <plan> ... stage plan ... </plan>
+ <tool_call>{"name": "...", "arguments": {...}}</tool_call>
+ ...
+ <answer>MAEGEITPLKTF...</answer>
+ ```
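
A client that consumes this protocol has to split the model output into its tagged blocks and decode each tool call as JSON. The following is a minimal sketch of such a parser using only the standard library; `parse_turn` and `TAG_RE` are illustrative names, and the actual parsing logic in the ProtoCycle repo may handle malformed or nested output differently.

```python
import json
import re

# Matches <think>, <plan>, <tool_call> and <answer> blocks, including
# multi-line bodies; the backreference \1 requires a matching closing tag.
TAG_RE = re.compile(r"<(think|plan|tool_call|answer)>(.*?)</\1>", re.DOTALL)


def parse_turn(text):
    """Return a list of (tag, payload) pairs in order of appearance.

    Tool-call payloads are decoded from JSON into dicts; every other
    payload is returned as a stripped string.
    """
    blocks = []
    for tag, body in TAG_RE.findall(text):
        body = body.strip()
        if tag == "tool_call":
            blocks.append((tag, json.loads(body)))
        else:
            blocks.append((tag, body))
    return blocks
```

For example, `parse_turn('<tool_call>{"name": "get_score", "arguments": {}}</tool_call>')` yields a single `("tool_call", {...})` pair whose dict has the keys `name` and `arguments`. The empty `arguments` object here is a placeholder, since the model card does not document each tool's argument schema.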
+
+ Registered tools (10):
+ - **Stage 1:** `function2seq`, `pathway2seq`, `domain2seq`, `go2seq`,
+   `dna_binding2seq`
+ - **Stage 2:** `cofactor2constraints`, `motif2constraints`,
+   `signal2constraints`
+ - **Stage 3:** `esm_inpaint`, `get_score`
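
The stage grouping above can be captured as a small lookup table, which is handy for checking that a parsed tool call names a registered tool. The tool names are copied from the list; the `TOOLS_BY_STAGE`, `STAGE_OF_TOOL`, and `stage_of` names are assumptions for this sketch and are not identifiers from the ProtoCycle repo.

```python
# Tool names as listed in the model card, grouped by stage.
TOOLS_BY_STAGE = {
    1: ("function2seq", "pathway2seq", "domain2seq", "go2seq", "dna_binding2seq"),
    2: ("cofactor2constraints", "motif2constraints", "signal2constraints"),
    3: ("esm_inpaint", "get_score"),
}

# Reverse index: tool name -> stage number.
STAGE_OF_TOOL = {
    name: stage for stage, names in TOOLS_BY_STAGE.items() for name in names
}


def stage_of(tool_name):
    """Return the stage of a registered tool, or raise ValueError if unknown."""
    try:
        return STAGE_OF_TOOL[tool_name]
    except KeyError:
        raise ValueError(f"unregistered tool: {tool_name!r}") from None
```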
+
+ ## How to Use
+
+ Inference requires the biology tools shipped in the
+ [ProtoCycle](https://github.com/huggggoooooo/ProtoCycle) repo:
+
+ ```bash
+ export MODEL_DIR=/path/to/ProtoCycle-7B   # this checkpoint
+ export MODEL_NAME=ProtoCycle-7B
+ export CONDA_ROOT=/path/to/miniconda3
+ # Also export PROTREK_*_DIR, ESM_MODEL_PATH, etc. (see ProtoCycle README)
+ bash infer_tools.sh
+ ```
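
Because `infer_tools.sh` depends on several environment variables, a quick pre-flight check can save a failed launch. The helper below is a hypothetical convenience, not part of the ProtoCycle repo; it checks only the three variables named explicitly above and leaves the `PROTREK_*_DIR` and `ESM_MODEL_PATH` family to the ProtoCycle README.

```python
import os

# Variables named explicitly in the model card's usage snippet.
REQUIRED_VARS = ("MODEL_DIR", "MODEL_NAME", "CONDA_ROOT")


def missing_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling `missing_vars()` before launching the script returns an empty list when all three variables are set.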
+
+ ## License
+
+ Apache-2.0.
+
+ ## Citation
+
+ If you find this work useful, please cite ProtoCycle (forthcoming) and the
+ upstream projects: VeRL, Open-AgentRL, ProTrek, and ESM.