# ChemBERTa IUPAC Classifier

This model is a fine-tuned version of [seyonec/ChemBERTa-zinc-base-v1](https://huggingface.co/seyonec/ChemBERTa-zinc-base-v1) for binary classification of chemical compounds based on their IUPAC names.

## Model description

ChemBERTa is a RoBERTa-based model pre-trained on SMILES representations of molecules. Here it is fine-tuned on a custom dataset of IUPAC names with binary labels, so it takes an IUPAC name as text input and predicts one of two classes.

**Developed by:** xluobd

**Model type:** RobertaForSequenceClassification

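A sequence-classification head of this kind returns one logit per class, so a binary classifier would presumably return two logits per input. As a rough sketch of how such logits become a predicted label and a confidence score, the plain-Python example below applies a softmax and takes the most probable index. The function names and the logit values are made up for illustration and are not taken from the model; which class each index denotes depends on the label mapping used in training.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return (index of the most probable class, probability of that class)."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


# Hypothetical logits for a single input (assumption: two classes, 0 and 1).
example_logits = [-1.2, 2.3]
label, confidence = predict_label(example_logits)

print(f"Prediction: {label}")
print(f"Confidence: {confidence:.4f}")
```

The confidence here is only the softmax probability of the chosen class; it is not necessarily a calibrated estimate of correctness.
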
### How to use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("xluobd/chemberta-iupac-classifier")
model = AutoModelForSequenceClassification.from_pretrained("xluobd/chemberta-iupac-classifier")
model.eval()  # make sure dropout is disabled for inference

# Example IUPAC name
iupac_name = "2-hydroxy-N,N,N-trimethylethan-1-aminium"

# Tokenize, truncating inputs longer than 256 tokens
inputs = tokenizer(iupac_name, return_tensors="pt", padding=True, truncation=True, max_length=256)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to class probabilities and pick the most likely class
probabilities = outputs.logits.softmax(dim=-1)
prediction = probabilities.argmax(dim=-1).item()

print(f"Prediction: {prediction}")
print(f"Confidence: {probabilities[0][prediction].item():.4f}")