# ChemBERTa IUPAC Classifier
This model is a fine-tuned version of [seyonec/ChemBERTa-zinc-base-v1](https://huggingface.co/seyonec/ChemBERTa-zinc-base-v1) for binary classification of chemical compounds based on their IUPAC names.
## Model description
This model uses ChemBERTa, a RoBERTa-style transformer pre-trained on SMILES representations of chemical structures, to classify molecules from their IUPAC names. It was fine-tuned on a custom dataset of IUPAC names paired with binary labels.
**Developed by:** xluobd
**Model type:** RobertaForSequenceClassification
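
As a sequence classifier, the model returns one logit per class for each input, and a softmax turns those logits into class probabilities. The sketch below illustrates that step in plain Python, assuming a two-class output head. The helper names `softmax` and `label_and_confidence` are hypothetical, the logits are made up for illustration, and this card does not state what labels 0 and 1 represent.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert one row of logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_and_confidence(logits: Sequence[float]) -> Tuple[int, float]:
    """Return the most probable class index and its probability."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


# Made-up logits for illustration only, not real model output.
label, confidence = label_and_confidence([-1.2, 2.3])
print(f"Prediction: {label}")
print(f"Confidence: {confidence:.4f}")
```

The confidence is only the softmax probability of the predicted class; it is not a calibrated estimate unless the model has been calibrated separately.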
### How to use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("xluobd/chemberta-iupac-classifier")
model = AutoModelForSequenceClassification.from_pretrained("xluobd/chemberta-iupac-classifier")
# Example IUPAC name
iupac_name = "2-hydroxy-N,N,N-trimethylethan-1-aminium"
# Tokenize and predict
inputs = tokenizer(iupac_name, return_tensors="pt", padding=True, truncation=True, max_length=256)
outputs = model(**inputs)
probabilities = outputs.logits.softmax(dim=-1)
prediction = probabilities.argmax(dim=-1).item()  # class index for the single input
print(f"Prediction: {prediction}")
print(f"Confidence: {probabilities[0][prediction].item():.4f}")
```
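
To classify many names, the same calls can be applied to a list in small batches. The following is a minimal sketch rather than an official API of this model: `chunked` and `classify_names` are hypothetical helpers, and the batch size of 16 is an arbitrary assumption. It also disables gradient tracking, which is not needed for inference.

```python
from typing import Iterator, List, Tuple


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def classify_names(
    names: List[str],
    batch_size: int = 16,  # arbitrary choice, tune for your hardware
    model_id: str = "xluobd/chemberta-iupac-classifier",
) -> List[Tuple[int, float]]:
    """Return a (label, confidence) pair for each IUPAC name."""
    # Imported lazily so that `chunked` is usable without torch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    results: List[Tuple[int, float]] = []
    for batch in chunked(names, batch_size):
        # Padding lets names of different lengths share one tensor.
        inputs = tokenizer(batch, return_tensors="pt", padding=True,
                           truncation=True, max_length=256)
        with torch.no_grad():
            probs = model(**inputs).logits.softmax(dim=-1)
        confidences, labels = probs.max(dim=-1)
        results.extend(zip(labels.tolist(), confidences.tolist()))
    return results
```

Calling `classify_names(["ethanol", "2-hydroxy-N,N,N-trimethylethan-1-aminium"])` would download the model on first use and return one pair per input name, in the same order as the inputs.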