---

license: apache-2.0
base_model: arcee-ai/Trinity-Mini
library_name: peft
pipeline_tag: text-generation
tags:
  - lora
  - peft
  - grpo
  - reinforcement-learning
  - biomedical
  - relation-extraction
  - drug-protein
  - moe
language:
  - en
---


<p align="center">
  <img src="https://huggingface.co/lokahq/Trinity-Mini-DrugProt-Think/resolve/main/assets/logo.png" alt="Trinity-Mini-DrugProt-Think" style="width:100%; max-width:100%;" />
</p>

<p align="center">
  <strong>Trinity-Mini-DrugProt-Think</strong><br/>
  RLVR (GRPO) + LoRA post-training on Arcee Trinity Mini for DrugProt relation classification.
</p>

<p align="center"><a href="https://lokahq.github.io/Trinity-Mini-DrugProt-Think/">📝 <strong>Report</strong></a> &nbsp;|&nbsp; <a href="https://medium.com/loka-engineering/deploying-trinity-mini-drugprot-think-on-amazon-sagemaker-ai-9e1c1c430ce9"><img src="https://www.sysgroup.com/wp-content/uploads/2025/02/Amazon_Web_Services-Logo.wine_.png" style="height:16px; width:auto; vertical-align:middle; display:inline-block;"/> <strong>AWS deployment guide</strong></a> &nbsp;|&nbsp; <a href="https://github.com/LokaHQ/Trinity-Mini-DrugProt-Think" aria-label="GitHub"><svg viewBox="0 0 16 16" fill="currentColor" width="16" height="16" style="vertical-align:middle; display:inline-block;"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27s1.36.09 2 .27c1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.01 8.01 0 0 0 16 8c0-4.42-3.58-8-8-8z"/></svg> <strong>GitHub</strong></a></p>


# Trinity-Mini-DrugProt-Think

A LoRA adapter fine-tuned on [Arcee Trinity Mini](https://huggingface.co/arcee-ai/Trinity-Mini) using GRPO (Group Relative Policy Optimization) for **drug-protein relation extraction** on the [DrugProt (BioCreative VII)](https://huggingface.co/datasets/OpenMed/drugprot-parquet) benchmark. The model classifies 13 types of drug-protein interactions from PubMed abstracts, producing structured pharmacological reasoning traces before giving its answer.


## Model Details

| Property | Value |
|---|---|
| Base Model | [arcee-ai/Trinity-Mini](https://huggingface.co/arcee-ai/Trinity-Mini) |
| Architecture | Sparse MoE (26B total / 3B active) |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) |
| Training Method | GRPO (Reinforcement Learning) |
| Training Data | [maziyar/OpenMed_DrugProt](https://huggingface.co/datasets/OpenMed/drugprot-parquet) |
| Task | Drug-protein relation extraction (13-way classification) |
| Trainable Parameters | LoRA rank=16, all projection layers |
| License | Apache 2.0 |

## Training Configuration

| Parameter | Value |
|---|---|
| LoRA Alpha (α) | 64 |
| LoRA Rank | 16 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj + experts |
| Learning Rate | 3e-6 |
| Batch Size | 128 |
| Rollouts per Example | 8 |
| Max Generation Tokens | 2048 |
| Temperature | 0.7 |
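
For a rough sense of scale: rank-16 LoRA adds only `r · (d_in + d_out)` trainable parameters per adapted weight matrix, which is why the adapter stays tiny relative to the 26B-parameter base. A back-of-the-envelope sketch (the projection dimensions below are hypothetical, for illustration only, not Trinity Mini's actual shapes):

```python
def lora_param_count(d_in: int, d_out: int, rank: int = 16) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix:
    an A matrix of shape (rank, d_in) plus a B matrix of shape (d_out, rank)."""
    return rank * d_in + d_out * rank

# Hypothetical 4096x4096 projection, for illustration only.
full = 4096 * 4096                    # frozen dense weight: ~16.8M params
added = lora_param_count(4096, 4096)  # LoRA at rank 16: 131,072 params
print(f"LoRA adds {added:,} params ({100 * added / full:.2f}% of the frozen matrix)")
```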



## Quick Start

**Installation**

```bash
pip install transformers peft torch accelerate
```

**Usage**

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base_model_id = "arcee-ai/Trinity-Mini"
adapter_id = "lokahq/Trinity-Mini-DrugProt-Think"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {
        "role": "system",
        "content": (
            "You are an expert biomedical relation extraction assistant. Your task is to identify the type of interaction between a drug/chemical and a gene/protein in biomedical text.\n\n"
            "For each question:\n"
            "1. First, wrap your detailed biomedical reasoning inside <think></think> tags\n"
            "2. Analyze the context around both entities to understand their relationship\n"
            "3. Consider the pharmacological and molecular mechanisms involved\n"
            "4. Then provide your final answer inside \\boxed{} using exactly one letter (A-M)\n\n"
            "The 13 DrugProt relation types are:\n"
            "A. INDIRECT-DOWNREGULATOR - Chemical indirectly decreases protein activity/expression\n"
            "B. INDIRECT-UPREGULATOR - Chemical indirectly increases protein activity/expression\n"
            "C. DIRECT-REGULATOR - Chemical directly regulates protein (mechanism unspecified)\n"
            "D. ACTIVATOR - Chemical activates the protein\n"
            "E. INHIBITOR - Chemical inhibits the protein\n"
            "F. AGONIST - Chemical acts as an agonist of the receptor/protein\n"
            "G. AGONIST-ACTIVATOR - Chemical is both agonist and activator\n"
            "H. AGONIST-INHIBITOR - Chemical is agonist but inhibits downstream effects\n"
            "I. ANTAGONIST - Chemical acts as an antagonist of the receptor/protein\n"
            "J. PRODUCT-OF - Chemical is a product of the enzyme\n"
            "K. SUBSTRATE - Chemical is a substrate of the enzyme\n"
            "L. SUBSTRATE_PRODUCT-OF - Chemical is both substrate and product\n"
            "M. PART-OF - Chemical is part of the protein complex\n\n"
            "Example format:\n"
            "<think>\n"
            "The text describes [chemical] and [protein]. Based on the context...\n"
            "- The phrase \"[relevant text]\" indicates that...\n"
            "- This suggests a [type] relationship because...\n"
            "</think>\n"
            "\\boxed{A}"
        ),
    },
    {
        "role": "user",
        "content": (
            "Abstract: [PASTE PUBMED ABSTRACT HERE]\n\n"
            "Chemical entity: [DRUG NAME]\n"
            "Protein entity: [PROTEIN NAME]\n\n"
            "What is the relationship between the chemical and protein entities? "
            "Choose from: A) INHIBITOR B) SUBSTRATE C) INDIRECT-DOWNREGULATOR "
            "D) INDIRECT-UPREGULATOR E) AGONIST F) ANTAGONIST G) ACTIVATOR "
            "H) PRODUCT-OF I) AGONIST-ACTIVATOR J) INDIRECT-UPREGULATOR "
            "K) PART-OF L) SUBSTRATE_PRODUCT-OF M) NOT\n\n"
            "Think step by step, then provide your answer in \\boxed{} format."
        ),
    },
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.7, top_p=0.75)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```



## Training Progress



Training ran for ~100 steps on Prime Intellect infrastructure. Best accuracy reward reached ~0.83 during training.
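
GRPO needs no learned value network: for each example it samples a group of rollouts (8 here) and scores each one against the group's mean and standard deviation. A minimal sketch of that group-relative advantage computation (illustrative rewards, not the actual training code):

```python
def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: (r - mean) / (std + eps) over one rollout group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# 8 rollouts for one abstract: 1.0 = correct \boxed{} answer, 0.0 = incorrect.
rewards = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]
print([round(a, 2) for a in grpo_advantages(rewards)])  # [0.58, -1.73, 0.58, ...]
```

Rollouts that beat their group's average get a positive advantage and are reinforced; the rest are pushed down, which is what drives the accuracy reward upward during training.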



## Limitations



- This is a LoRA adapter and requires the base model ([arcee-ai/Trinity-Mini](https://huggingface.co/arcee-ai/Trinity-Mini)) to run

- Evaluated on training-split held-out data; not yet benchmarked on the official DrugProt test set

- Optimized specifically for 13-way DrugProt classification; may not generalize to other biomedical RE tasks



## Citation



```bibtex
@misc{jakimovski2026drugprotrl,
  title        = {Post-Training an Open MoE Model to Extract Drug-Protein Relations: Trinity-Mini-DrugProt-Think},
  author       = {Jakimovski, Bojan and Kalinovski, Petar},
  year         = {2026},
  month        = feb,
  howpublished = {Blog post},
  url          = {https://github.com/LokaHQ/Trinity-Mini-DrugProt-Think}
}
```



## Acknowledgements



- [Arcee AI](https://www.arcee.ai/) for the Trinity Mini base model

- [Prime Intellect](https://www.primeintellect.ai/) for training infrastructure

- [maziyar](https://huggingface.co/maziyar) for the OpenMed DrugProt RL environment

- [Hugging Face](https://huggingface.co/) for the PEFT library



## Authors



[Bojan Jakimovski](mailto:bojan.jakimovski@loka.com) · [Petar Kalinovski](mailto:petar.kalinovski@loka.com) · [Loka](https://loka.com)