Update README.md
README.md CHANGED
@@ -1,32 +1,7 @@
# ChemFMv2-20M

-
-
-## Model Details
-
-- **Model Type**: LlamaForCausalLM
-- **Architecture**: LLaMA-based
-- **Parameters**: 20M
-- **Hidden Size**: 640
-- **Layers**: 4
-- **Attention Heads**: 10
-- **Vocabulary Size**: 320
-- **Max Position Embeddings**: 512
+ChemFM is a large-scale foundation model specifically designed for chemistry.
+It has been [pre-trained](https://github.com/TheLuoFengLab/ChemFM/tree/master/pretraining) on 178 million molecules from [UniChem](https://www.ebi.ac.uk/unichem/) using self-supervised causal language modeling, enabling the extraction of versatile and generalizable molecular representations.

## Usage
-
-```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
-
-# Load model and tokenizer
-model_name = "ChemFM/ChemFMv2-20M"
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForCausalLM.from_pretrained(model_name)
-
-# Example usage
-text = "Your chemical input here"
-inputs = tokenizer(text, return_tensors="pt")
-outputs = model.generate(**inputs, max_length=100)
-result = tokenizer.decode(outputs[0], skip_special_tokens=True)
-print(result)
-```
+The code for using this model is provided in this [GitHub repository](https://github.com/TheLuoFengLab/ChemFM).
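For quick reference, below is a minimal loading sketch in the style of the snippet removed above. The `transformers` calls and the `ChemFM/ChemFMv2-20M` model id come from that earlier snippet; the SMILES prompt and the generation settings are illustrative placeholders, and the linked repository remains the authoritative usage guide.

```python
# Minimal sketch following the snippet from the previous revision of this card;
# see the linked ChemFM repository for the supported workflow.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "ChemFM/ChemFMv2-20M"  # model id used in the previous revision
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Illustrative prompt: a partial SMILES string for the causal LM to continue.
prompt = "CC(=O)O"  # placeholder; the actual prompt format may differ
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```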