---
license: apache-2.0
base_model: arcee-ai/Trinity-Mini
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- peft
- grpo
- reinforcement-learning
- biomedical
- relation-extraction
- drug-protein
- moe
language:
- en
---
<p align="center">
<img src="https://huggingface.co/lokahq/Trinity-Mini-DrugProt-Think/resolve/main/assets/logo.png" alt="Trinity-Mini-DrugProt-Think" style="width:100%; max-width:100%;" />
</p>
<p align="center">
<strong>Trinity-Mini-DrugProt-Think</strong><br/>
RLVR (GRPO) + LoRA post-training on Arcee Trinity Mini for DrugProt relation classification.
</p>
<p align="center"><a href="https://lokahq.github.io/Trinity-Mini-DrugProt-Think/">📝 <strong>Report</strong></a> | <a href="https://medium.com/loka-engineering/deploying-trinity-mini-drugprot-think-on-amazon-sagemaker-ai-9e1c1c430ce9"><img src="https://www.sysgroup.com/wp-content/uploads/2025/02/Amazon_Web_Services-Logo.wine_.png" style="height:16px; width:auto; vertical-align:middle; display:inline-block;"/> <strong>AWS deployment guide</strong></a> | <a href="https://github.com/LokaHQ/Trinity-Mini-DrugProt-Think" aria-label="GitHub"><svg viewBox="0 0 16 16" fill="currentColor" width="16" height="16" style="vertical-align:middle; display:inline-block;"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27s1.36.09 2 .27c1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.01 8.01 0 0 0 16 8c0-4.42-3.58-8-8-8z"/></svg> <strong>GitHub</strong></a></p>
# Trinity-Mini-DrugProt-Think
A LoRA adapter fine-tuned on [Arcee Trinity Mini](https://huggingface.co/arcee-ai/Trinity-Mini) using GRPO (Group Relative Policy Optimization) for **drug-protein relation extraction** on the [DrugProt (BioCreative VII)](https://huggingface.co/datasets/OpenMed/drugprot-parquet) benchmark. The model classifies 13 types of drug-protein interactions from PubMed abstracts, producing structured pharmacological reasoning traces before giving its answer.
## Model Details
| Property | Value |
|---|---|
| Base Model | [arcee-ai/Trinity-Mini](https://huggingface.co/arcee-ai/Trinity-Mini) |
| Architecture | Sparse MoE (26B total / 3B active) |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) |
| Training Method | GRPO (Reinforcement Learning) |
| Training Data | [OpenMed/drugprot-parquet](https://huggingface.co/datasets/OpenMed/drugprot-parquet) |
| Task | Drug-protein relation extraction (13-way classification) |
| Trainable Parameters | LoRA rank=16, all projection layers |
| License | Apache 2.0 |
## Training Configuration
| Parameter | Value |
|---|---|
| LoRA Alpha (α) | 64 |
| LoRA Rank | 16 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj + experts |
| Learning Rate | 3e-6 |
| Batch Size | 128 |
| Rollouts per Example | 8 |
| Max Generation Tokens | 2048 |
| Temperature | 0.7 |
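The LoRA settings in the table above can be written as a `peft` config. A minimal sketch, assuming standard PEFT usage; the adapter's shipped `adapter_config.json` is authoritative, and the per-expert module targeting mentioned in the table is omitted here:

```python
from peft import LoraConfig

# Sketch of the adapter configuration described in the table above.
# Expert-layer targeting is not reproduced; see adapter_config.json
# in the repository for the exact trained configuration.
lora_config = LoraConfig(
    r=16,
    lora_alpha=64,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```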
## Quick Start
**Installation**
```bash
pip install transformers peft torch accelerate
```
**Usage**
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base_model_id = "arcee-ai/Trinity-Mini"
adapter_id = "lokahq/Trinity-Mini-DrugProt-Think"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {
        "role": "system",
        "content": (
            "You are an expert biomedical relation extraction assistant. Your task is to identify the type of interaction between a drug/chemical and a gene/protein in biomedical text.\n\n"
            "For each question:\n"
            "1. First, wrap your detailed biomedical reasoning inside <think></think> tags\n"
            "2. Analyze the context around both entities to understand their relationship\n"
            "3. Consider the pharmacological and molecular mechanisms involved\n"
            "4. Then provide your final answer inside \\boxed{} using exactly one letter (A-M)\n\n"
            "The 13 DrugProt relation types are:\n"
            "A. INDIRECT-DOWNREGULATOR - Chemical indirectly decreases protein activity/expression\n"
            "B. INDIRECT-UPREGULATOR - Chemical indirectly increases protein activity/expression\n"
            "C. DIRECT-REGULATOR - Chemical directly regulates protein (mechanism unspecified)\n"
            "D. ACTIVATOR - Chemical activates the protein\n"
            "E. INHIBITOR - Chemical inhibits the protein\n"
            "F. AGONIST - Chemical acts as an agonist of the receptor/protein\n"
            "G. AGONIST-ACTIVATOR - Chemical is both agonist and activator\n"
            "H. AGONIST-INHIBITOR - Chemical is agonist but inhibits downstream effects\n"
            "I. ANTAGONIST - Chemical acts as an antagonist of the receptor/protein\n"
            "J. PRODUCT-OF - Chemical is a product of the enzyme\n"
            "K. SUBSTRATE - Chemical is a substrate of the enzyme\n"
            "L. SUBSTRATE_PRODUCT-OF - Chemical is both substrate and product\n"
            "M. PART-OF - Chemical is part of the protein complex\n\n"
            "Example format:\n"
            "<think>\n"
            "The text describes [chemical] and [protein]. Based on the context...\n"
            "- The phrase \"[relevant text]\" indicates that...\n"
            "- This suggests a [type] relationship because...\n"
            "</think>\n"
            "\\boxed{A}"
        ),
    },
    {
        "role": "user",
        "content": (
            "Abstract: [PASTE PUBMED ABSTRACT HERE]\n\n"
            "Chemical entity: [DRUG NAME]\n"
            "Protein entity: [PROTEIN NAME]\n\n"
            "What is the relationship between the chemical and protein entities? "
            "Choose from: A) INDIRECT-DOWNREGULATOR B) INDIRECT-UPREGULATOR "
            "C) DIRECT-REGULATOR D) ACTIVATOR E) INHIBITOR F) AGONIST "
            "G) AGONIST-ACTIVATOR H) AGONIST-INHIBITOR I) ANTAGONIST "
            "J) PRODUCT-OF K) SUBSTRATE L) SUBSTRATE_PRODUCT-OF M) PART-OF\n\n"
            "Think step by step, then provide your answer in \\boxed{} format."
        ),
    },
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.7,
    top_p=0.75,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
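Because the model ends its response with a `\boxed{...}` answer, downstream code typically needs to pull the letter back out of the completion. A minimal sketch; the helper name and sample completion are illustrative, not part of this repository:

```python
import re

def extract_answer(completion: str):
    """Return the A-M letter from the last \\boxed{} span, or None if absent."""
    matches = re.findall(r"\\boxed\{([A-M])\}", completion)
    return matches[-1] if matches else None

# Illustrative completion in the format the system prompt requests.
sample = (
    "<think>\nImatinib is described as blocking the kinase activity "
    "of BCR-ABL, so this is an INHIBITOR relation.\n</think>\n"
    "\\boxed{E}"
)
print(extract_answer(sample))  # -> E
```

Taking the last match (rather than the first) guards against the model mentioning `\boxed{}` inside its reasoning trace before the final answer.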
## Training Progress
Training ran for ~100 steps on Prime Intellect infrastructure. Best accuracy reward reached ~0.83 during training.
## Limitations
- This is a LoRA adapter and requires the base model ([arcee-ai/Trinity-Mini](https://huggingface.co/arcee-ai/Trinity-Mini)) to run
- Evaluated on training-split held-out data; not yet benchmarked on the official DrugProt test set
- Optimized specifically for 13-way DrugProt classification; may not generalize to other biomedical RE tasks
## Citation
<div class="citation-block">
<pre><code>@misc{jakimovski2026drugprotrl,
title = {Post-Training an Open MoE Model to Extract Drug-Protein Relations: Trinity-Mini-DrugProt-Think},
author = {Jakimovski, Bojan and Kalinovski, Petar},
year = {2026},
month = feb,
howpublished = {Blog post},
url = {https://github.com/LokaHQ/Trinity-Mini-DrugProt-Think}
}</code></pre>
</div>
## Acknowledgements
- [Arcee AI](https://www.arcee.ai/) for the Trinity Mini base model
- [Prime Intellect](https://www.primeintellect.ai/) for training infrastructure
- [maziyar](https://huggingface.co/maziyar) for the OpenMed DrugProt RL environment
- [Hugging Face](https://huggingface.co/) for the PEFT library
## Authors
[Bojan Jakimovski](mailto:bojan.jakimovski@loka.com) · [Petar Kalinovski](mailto:petar.kalinovski@loka.com) · [Loka](https://loka.com)