pipeline_tag: text-generation
tags:
- chemistry
---

# RetroDFM-R: Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning

RetroDFM-R is a reasoning-driven large language model for chemical retrosynthesis. Unlike traditional graph-based or sequence-based models, it incorporates large-scale reinforcement learning with chemically verifiable rewards, enabling stronger generalization, more reliable predictions, and improved interpretability. Comprehensive evaluations show that RetroDFM-R outperforms state-of-the-art approaches on standard benchmarks. Double-blind human assessments further confirm the chemical plausibility and practical usefulness of its predictions. The model also successfully reconstructs multistep routes for real drug molecules and complex materials reported in the literature. Its explicit reasoning process offers clear, human-interpretable insights, enhancing trust and real-world applicability in retrosynthesis planning.

## News

* <font color="#935000">**2025-11-22**</font>: The model weights of [RetroDFM-R-8B](https://huggingface.co/OpenDFM/RetroDFM-R-8B) are open-sourced!
* <font color="#935000">**2025-07-23**</font>: The paper of RetroDFM-R is released on arXiv: [Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning](https://arxiv.org/abs/2507.17448).

## Training Details

RetroDFM-R is trained through a three-stage pipeline: (1) continual pretraining on retrosynthesis-focused chemical data, (2) supervised fine-tuning on distilled chain-of-thought reasoning samples, and (3) reinforcement learning to further enhance step-by-step reasoning and prediction quality.
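
The exact reward design of stage (3) is described in the paper, not here. As a rough illustration only, a chemically verifiable reward can be sketched as an exact set match between predicted and reference reactants after canonicalization; the `canonicalize` argument below is a hypothetical placeholder (a real pipeline would plug in an RDKit canonicalizer such as `Chem.CanonSmiles`), and the actual reward used to train RetroDFM-R may differ:

```python
def retrosynthesis_reward(predicted, reference, canonicalize=str.strip):
    """Binary verifiable reward: 1.0 iff the predicted reactant set equals the
    reference reactant set after canonicalization, else 0.0.

    `canonicalize` is a placeholder normalizer; in practice an RDKit
    canonicalizer (e.g. Chem.CanonSmiles) would be used so that equivalent
    SMILES spellings compare equal.
    """
    def to_set(smiles):
        # Multiple molecules in one SMILES string are separated by dots.
        return frozenset(canonicalize(s) for s in smiles.split("."))

    return 1.0 if to_set(predicted) == to_set(reference) else 0.0

print(retrosynthesis_reward("CCO.O", "O.CCO"))  # order-insensitive match: 1.0
print(retrosynthesis_reward("CCO", "CCN"))      # mismatch: 0.0
```

A set comparison makes the reward insensitive to the order in which reactants are listed, which is chemically irrelevant.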

## Usage Details

### Local Inference

Here is an example of loading and running RetroDFM-R locally:

```python
import re
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name_or_id = "OpenDFM/RetroDFM-R-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)
model = AutoModelForCausalLM.from_pretrained(model_name_or_id, torch_dtype=torch.bfloat16, device_map="auto")

target_smiles = "<target mol in SMILES format>"
instruction = f"<SMILES> {target_smiles} </SMILES> Given the product SMILES, your task is to predict the reactants SMILES using your experienced chemical Retrosynthesis knowledge. Please reason step by step, and put your final answer within <answer> answer here </answer>."

message = [
    {"role": "user", "content": instruction}
]

input_text = tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
generation_config = GenerationConfig(
    do_sample=True,
    top_k=20,
    top_p=0.9,
    temperature=0.6,
    max_new_tokens=1024,
    eos_token_id=tokenizer.eos_token_id
)
outputs = model.generate(**inputs, generation_config=generation_config)

# Strip the prompt from the decoded output, keeping only the generated text.
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
input_text = tokenizer.decode(inputs["input_ids"][0], skip_special_tokens=True)
generated_text = generated_text[len(input_text):].strip()
print(f"{generated_text=}")

# Separate the reasoning trace from the final reactant prediction.
thinking, answer = re.match(r'<think>(.*?)</think>\s?<answer>(.*?)</answer>', generated_text, re.DOTALL).groups()
thinking, answer = thinking.strip(), answer.strip()
print(f"{thinking=}")
print(f"{answer=}")
```
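
The extracted `answer` is a single SMILES string in which individual reactant molecules are separated by dots, per SMILES convention. A small helper (hypothetical, not part of the model's API) can split a raw generation into a list of reactant SMILES:

```python
import re

def extract_reactants(generated_text):
    """Extract the final <answer>...</answer> block from a model output and
    split it into individual reactant SMILES (dot-separated by convention).
    Returns an empty list if no answer block is found."""
    match = re.search(r"<answer>(.*?)</answer>", generated_text, re.DOTALL)
    if match is None:
        return []
    return [s.strip() for s in match.group(1).split(".") if s.strip()]

# Hypothetical model output for illustration:
sample = "<think>Disconnect the amide bond.</think> <answer>CC(=O)Cl.NCc1ccccc1</answer>"
print(extract_reactants(sample))  # ['CC(=O)Cl', 'NCc1ccccc1']
```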

### SMILES Preprocessing

When your input involves SMILES notation, we recommend canonicalizing it with the `rdkit` package. Here is an example:
```python
from rdkit import Chem

def canonicalize_smiles(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return Chem.MolToSmiles(mol, isomericSmiles=True, kekuleSmiles=False)
```
or directly:
```python
from rdkit import Chem

def canonicalize_smiles(smiles):
    return Chem.CanonSmiles(smiles, useChiral=True)
```

## Citation

```bibtex
@misc{zhang2025retrodfmr,
    title={Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning},
    author={Zhang, Situo and Li, Hanqi and Chen, Lu and Zhao, Zihan and Lin, Xuanze and Zhu, Zichen and Chen, Bo and Chen, Xin and Yu, Kai},
    year={2025},
    eprint={2507.17448},
    archivePrefix={arXiv},
    primaryClass={cs.CE},
    url={https://arxiv.org/abs/2507.17448},
}
```

## Disclaimer

The current version of RetroDFM-R may generate incorrect or misleading information. Please use it with caution and verify its outputs with domain experts before making any decisions based on them.