Upload folder using huggingface_hub
- .gitattributes +4 -0
- README.md +114 -0
- pics/LOGOS-mainfigure.png +3 -0
- pics/bench_comparison.png +3 -0
- pics/logos-data-process.png +3 -0
- pics/logos.png +3 -0
.gitattributes
CHANGED
|
@@ -34,3 +34,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
tokenizer.json filter=lfs diff=lfs merge=lfs -text
|
| 37 |
+
pics/LOGOS-mainfigure.png filter=lfs diff=lfs merge=lfs -text
|
| 38 |
+
pics/bench_comparison.png filter=lfs diff=lfs merge=lfs -text
|
| 39 |
+
pics/logos-data-process.png filter=lfs diff=lfs merge=lfs -text
|
| 40 |
+
pics/logos.png filter=lfs diff=lfs merge=lfs -text
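The four added entries route the README images through Git LFS. As a rough way to see which paths a set of entries covers, here is a minimal Python sketch. The helper names `lfs_patterns` and `is_lfs_tracked` are made up for this example, and `fnmatch` only approximates Git's own pattern rules, so treat the result as a sanity check rather than the authoritative answer.

```python
from fnmatch import fnmatch

# The four entries added to .gitattributes in this commit.
GITATTRIBUTES = """\
pics/LOGOS-mainfigure.png filter=lfs diff=lfs merge=lfs -text
pics/bench_comparison.png filter=lfs diff=lfs merge=lfs -text
pics/logos-data-process.png filter=lfs diff=lfs merge=lfs -text
pics/logos.png filter=lfs diff=lfs merge=lfs -text
"""


def lfs_patterns(text):
    """Return the path patterns whose attributes include filter=lfs."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


def is_lfs_tracked(path, text=GITATTRIBUTES):
    """Simplified check: fnmatch approximates, but does not equal, Git's matching."""
    return any(fnmatch(path, pattern) for pattern in lfs_patterns(text))


print(is_lfs_tracked("pics/logos.png"))  # True
print(is_lfs_tracked("README.md"))       # False
```

Inside a checkout, `git check-attr filter -- pics/logos.png` gives Git's own answer for the same question.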
|
README.md
ADDED
|
@@ -0,0 +1,114 @@
| 1 |
+
---
|
| 2 |
+
license: cc-by-4.0
|
| 3 |
+
task_categories:
|
| 4 |
+
- text-generation
|
| 5 |
+
language:
|
| 6 |
+
- en
|
| 7 |
+
tags:
|
| 8 |
+
- scientific-language-model
|
| 9 |
+
- protein
|
| 10 |
+
- molecule
|
| 11 |
+
- drug-discovery
|
| 12 |
+
- materials-science
|
| 13 |
+
- retrosynthesis
|
| 14 |
+
- antibody
|
| 15 |
+
- autoregressive
|
| 16 |
+
- generative
|
| 17 |
+
- one-model-fits-all
|
| 18 |
+
size_categories:
|
| 19 |
+
- 1B-10B
|
| 20 |
+
---
|
| 21 |
+
|
| 22 |
+
# LOGOS: Language of Generative Objects in Science
|
| 23 |
+
|
| 24 |
+
<p align="center">
|
| 25 |
+
<img src="pics/logos.png" alt="LOGOS" height="120">
|
| 26 |
+
</p>
|
| 27 |
+
|
| 28 |
+
<p align="center">
|
| 29 |
+
<a href="https://arxiv.org/abs/2510.24701" target="_blank"><img src="https://img.shields.io/badge/Technical%20Report-b5212f.svg?logo=arxiv" alt="Technical Report" height="21px"></a>
|
| 30 |
+
<a href="https://github.com/placeholder/LOGOS"><img src="https://img.shields.io/badge/GitHub-LOGOS-181717?logo=github&logoColor=white" height="21px"></a>
|
| 31 |
+
</p>
|
| 32 |
+
|
| 33 |
+
<p align="center">
|
| 34 |
+
<a href="https://huggingface.co/LOGOS-Hub/LOGOS-8B"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Model-LOGOS--8B-yellow" height="21px"></a>
|
| 35 |
+
<a href="https://huggingface.co/LOGOS-Hub/LOGOS-pretrain-1B"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Model-LOGOS--pretrain--1B-yellow" height="21px"></a>
|
| 36 |
+
<a href="https://huggingface.co/LOGOS-Hub/LOGOS-pretrain-3B"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Model-LOGOS--pretrain--3B-yellow" height="21px"></a>
|
| 37 |
+
<a href="https://huggingface.co/LOGOS-Hub/LOGOS-pretrain-8B"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Model-LOGOS--pretrain--8B-yellow" height="21px"></a>
|
| 38 |
+
</p>
|
| 39 |
+
|
| 40 |
+
## Overview
|
| 41 |
+
|
| 42 |
+
**LOGOS** (**L**anguage **O**f **G**enerative **O**bjects in **S**cience) is the first multi-domain generative framework built on a unified *scientific grammar*. It encodes diverse scientific objects — proteins, antibodies, small molecules, chemical reactions, materials, and their spatial interactions — as token sequences over a shared vocabulary, enabling a single autoregressive model to perform generation, prediction, and design across the natural sciences.
|
| 43 |
+
|
| 44 |
+
Unlike approaches that rely on natural language as an intermediary or require explicit 3D geometric networks, LOGOS operates directly on domain-native representations. Key spatial relationships (e.g., protein pocket–ligand contacts) are discretized and tokenized into the shared grammar, allowing the model to learn complex structural interactions in a purely sequential manner.
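To make the idea of discretizing spatial contacts into tokens concrete, here is a toy Python sketch. Everything in it is an assumption for illustration: the distance thresholds, the token names (`<contact_close>`, `<res_12>`, and so on), and the helper functions are invented here and are not taken from the LOGOS grammar, which this README does not spell out.

```python
# Hypothetical distance bins (in angstroms) and token names, invented for
# illustration only; they are NOT the actual LOGOS grammar.
BINS = [(4.0, "<contact_close>"), (8.0, "<contact_near>")]


def contact_token(distance):
    """Map a distance to a discrete token, or None if it exceeds every bin."""
    for upper, token in BINS:
        if distance < upper:
            return token
    return None


def tokenize_contacts(contacts):
    """Turn (residue_index, atom_index, distance) triples into a flat token list."""
    tokens = []
    for residue, atom, distance in contacts:
        token = contact_token(distance)
        if token is not None:  # pairs beyond the last bin are dropped
            tokens += [f"<res_{residue}>", f"<atom_{atom}>", token]
    return tokens


example = [(12, 0, 3.1), (45, 3, 6.7), (80, 5, 15.2)]
print(tokenize_contacts(example))
# ['<res_12>', '<atom_0>', '<contact_close>', '<res_45>', '<atom_3>', '<contact_near>']
```

The point of the sketch is only that continuous geometry can be reduced to a short sequence of discrete symbols that an autoregressive model can read and emit like any other tokens.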
|
| 45 |
+
|
| 46 |
+
<p align="center">
|
| 47 |
+
<img src="pics/LOGOS-mainfigure.png" alt="LOGOS Framework Overview" width="90%">
|
| 48 |
+
</p>
|
| 49 |
+
|
| 50 |
+
### Key Features
|
| 51 |
+
|
| 52 |
+
* **Unified Scientific Grammar**: A shared representational interface that encodes heterogeneous scientific objects and cross-object relationships into a common discrete token space.
|
| 53 |
+
* **One Model Fits All**: A single autoregressive model handles tasks across proteins, small molecules, materials, reactions, antibodies, and their interactions.
|
| 54 |
+
* **No Explicit 3D Geometry Required**: Spatial contact and constraint patterns are captured through tokenized representations, without relying on geometric neural networks or explicit coordinates.
|
| 55 |
+
* **Pre-training & Downstream Alignment**: The grammar space ensures formal consistency between continued pre-training objectives and downstream task goals.
|
| 56 |
+
|
| 57 |
+
<p align="center">
|
| 58 |
+
<img src="pics/logos-data-process.png" alt="Data Construction in LOGOS" width="90%">
|
| 59 |
+
</p>
|
| 60 |
+
|
| 61 |
+
## Supported Tasks
|
| 62 |
+
|
| 63 |
+
LOGOS achieves competitive or state-of-the-art performance across six representative downstream tasks:
|
| 64 |
+
|
| 65 |
+
| Task | Domain | Description |
|
| 66 |
+
| ---- | ------ | ----------- |
|
| 67 |
+
| Interaction-Aware Ligand Design for Binding Pockets | Drug Discovery | Generate ligands that bind specifically to a given protein binding pocket |
|
| 68 |
+
| Protein Ligand-Binding Site Identification | Structural Biology | Identify binding pockets from protein sequences |
|
| 69 |
+
| Retrosynthesis Prediction | Chemistry | Predict reactants given a target product |
|
| 70 |
+
| Unconditional Material Generation | Materials Science | Generate novel and valid materials |
|
| 71 |
+
| Protein Editing | Protein Engineering | Edit protein sequences for improved functional properties |
|
| 72 |
+
| Antibody CDR Design | Immunology | Design complementarity-determining regions for antibody engineering |
|
| 73 |
+
|
| 74 |
+
<p align="center">
|
| 75 |
+
<img src="pics/bench_comparison.png" alt="Benchmark Comparison" width="90%">
|
| 76 |
+
</p>
|
| 77 |
+
|
| 78 |
+
## Model Architecture
|
| 79 |
+
|
| 80 |
+
LOGOS is an autoregressive Transformer that undergoes continued multi-domain pre-training on the unified scientific grammar. Model sizes range from **1B to 8B** parameters, with stable scaling behavior observed across this range.
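The badge links at the top of this README point to four repositories on the Hub. A small lookup can make switching between sizes less error-prone. This is a sketch under assumptions: the short names and the helpers `resolve_checkpoint` and `load` are hypothetical, and it assumes every listed repository loads through `AutoModelForCausalLM` the same way the Quick Start shows for `LOGOS-Hub/LOGOS-8B`.

```python
# Repository ids as they appear in the badge links of this README.
CHECKPOINTS = {
    "pretrain-1B": "LOGOS-Hub/LOGOS-pretrain-1B",
    "pretrain-3B": "LOGOS-Hub/LOGOS-pretrain-3B",
    "pretrain-8B": "LOGOS-Hub/LOGOS-pretrain-8B",
    "8B": "LOGOS-Hub/LOGOS-8B",
}


def resolve_checkpoint(name):
    """Hypothetical helper: map a short name to a Hub repository id."""
    try:
        return CHECKPOINTS[name]
    except KeyError:
        choices = ", ".join(sorted(CHECKPOINTS))
        raise ValueError(f"unknown checkpoint {name!r}; choose from: {choices}") from None


def load(name):
    """Load model and tokenizer. Downloads weights, so it is not called here."""
    # Imported lazily so resolve_checkpoint works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = resolve_checkpoint(name)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return model, tokenizer


print(resolve_checkpoint("pretrain-1B"))  # LOGOS-Hub/LOGOS-pretrain-1B
```

Starting with the 1B checkpoint is a cheap way to verify a prompt format before moving to the 8B model.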
|
| 81 |
+
|
| 82 |
+
## Quick Start
|
| 83 |
+
|
| 84 |
+
```python
|
| 85 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 86 |
+
|
| 87 |
+
model = AutoModelForCausalLM.from_pretrained("LOGOS-Hub/LOGOS-8B")
|
| 88 |
+
tokenizer = AutoTokenizer.from_pretrained("LOGOS-Hub/LOGOS-8B")
|
| 89 |
+
|
| 90 |
+
input_text = "<your_scientific_grammar_input>"  # placeholder: replace with a prompt written in the LOGOS grammar
|
| 91 |
+
inputs = tokenizer(input_text, return_tensors="pt")
|
| 92 |
+
outputs = model.generate(**inputs, max_new_tokens=512)
|
| 93 |
+
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
|
| 94 |
+
```
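For a decoder-only model, `generate` returns the prompt ids followed by the newly generated ids, so the Quick Start snippet prints the input together with the continuation. If only the continuation is wanted, the prompt prefix can be sliced off before decoding. The helper name `continuation_ids` is hypothetical, and the integer ids below are stand-ins so the sketch runs without downloading a model.

```python
def continuation_ids(output_ids, prompt_length):
    """Drop the first prompt_length ids, keeping only the newly generated ones."""
    return output_ids[prompt_length:]


# Stand-in integer ids. With the Quick Start objects the same idea reads:
#   prompt_length = inputs["input_ids"].shape[1]
#   new_ids = continuation_ids(outputs[0], prompt_length)
#   print(tokenizer.decode(new_ids, skip_special_tokens=True))
prompt = [101, 7, 8]
generated = prompt + [42, 43, 2]
print(continuation_ids(generated, len(prompt)))  # [42, 43, 2]
```

One caveat: if the grammar's structural tokens are registered as special tokens in the tokenizer (an assumption, since this README does not say), `skip_special_tokens=True` would hide them, and decoding with `skip_special_tokens=False` may be the safer choice.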
|
| 95 |
+
|
| 96 |
+
## Citation
|
| 97 |
+
|
| 98 |
+
LOGOS is developed by Alibaba Group and the Gaoling School of Artificial Intelligence, Renmin University of China. If you find this work useful in your research or applications, please cite our technical report.
|
| 99 |
+
|
| 100 |
+
```bibtex
|
| 101 |
+
@article{li2025logos,
|
| 102 |
+
title={Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences},
|
| 103 |
+
author={Li, Mingyang and Liu, Yurou and Ye, Jieping and Su, Bing and Wen, Ji-Rong and Wang, Zheng},
|
| 104 |
+
year={2025},
|
| 105 |
+
journal={arXiv preprint arXiv:2510.24701},
|
| 106 |
+
url={https://arxiv.org/abs/2510.24701}
|
| 107 |
+
}
|
| 108 |
+
```
|
| 109 |
+
|
| 110 |
+
## License
|
| 111 |
+
|
| 112 |
+
This project is released under **[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/legalcode)**.
|
| 113 |
+
|
| 114 |
+
We welcome collaboration, feedback, and community contributions to advance unified generative modeling for the natural sciences.
|
pics/LOGOS-mainfigure.png
ADDED
pics/bench_comparison.png
ADDED
pics/logos-data-process.png
ADDED
pics/logos.png
ADDED