xluobd/chemberta-iupac-classifier

xluobd committed on Apr 25, 2025

Commit 993c739 · verified · 1 Parent(s): c7dca71

Create README.md

Files changed (1)

README.md ADDED +34 -0

	@@ -0,0 +1,34 @@

+# ChemBERTa IUPAC Classifier
+This model is a fine-tuned version of [seyonec/ChemBERTa-zinc-base-v1](https://huggingface.co/seyonec/ChemBERTa-zinc-base-v1) for binary classification of chemical compounds based on their IUPAC names.
+## Model description
+The base model, ChemBERTa, is a RoBERTa-style transformer pre-trained on SMILES strings from the ZINC database. Here it was fine-tuned on a custom dataset of IUPAC names with binary labels, so it takes an IUPAC name as input and predicts one of two classes.
+**Developed by:** xluobd
+**Model type:** RobertaForSequenceClassification
+**Language:** Chemical IUPAC nomenclature
+### How to use
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+# Load the fine-tuned model and its tokenizer from the Hub
+tokenizer = AutoTokenizer.from_pretrained("xluobd/chemberta-iupac-classifier")
+model = AutoModelForSequenceClassification.from_pretrained("xluobd/chemberta-iupac-classifier")
+model.eval()  # make sure dropout is disabled for inference
+
+# Example IUPAC name
+iupac_name = "2-hydroxy-N,N,N-trimethylethan-1-aminium"
+
+# Tokenize, then run a forward pass without tracking gradients
+inputs = tokenizer(iupac_name, return_tensors="pt", padding=True, truncation=True, max_length=256)
+with torch.no_grad():
+    outputs = model(**inputs)
+
+# Convert logits to class probabilities and pick the most likely class
+probabilities = outputs.logits.softmax(dim=-1)
+prediction = probabilities.argmax(dim=-1).item()
+print(f"Prediction: {prediction}")
+print(f"Confidence: {probabilities[0][prediction].item():.4f}")