xluobd committed on
Commit 993c739 · verified · 1 Parent(s): c7dca71

Create README.md

Files changed (1)
  1. README.md +34 -0
README.md ADDED
@@ -0,0 +1,34 @@
+ # ChemBERTa IUPAC Classifier
+
+ This model is a fine-tuned version of [seyonec/ChemBERTa-zinc-base-v1](https://huggingface.co/seyonec/ChemBERTa-zinc-base-v1) for binary classification of chemical compounds based on their IUPAC names.
+
+ ## Model description
+
+ This model uses ChemBERTa, a RoBERTa-based model pre-trained on SMILES strings of molecules from the ZINC database, to classify molecules based on their IUPAC names. It was fine-tuned on a custom dataset of IUPAC names paired with binary labels.
+
+ **Developed by:** xluobd
+
+ **Model type:** RobertaForSequenceClassification
+
+ **Language:** Chemical IUPAC nomenclature
+
+ ### How to use
16
+
17
+ ```python
18
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
19
+
20
+ # Load model and tokenizer
21
+ tokenizer = AutoTokenizer.from_pretrained("xluobd/chemberta-iupac-classifier")
22
+ model = AutoModelForSequenceClassification.from_pretrained("xluobd/chemberta-iupac-classifier")
23
+
24
+ # Example IUPAC name
25
+ iupac_name = "2-hydroxy-N,N,N-trimethylethan-1-aminium"
26
+
27
+ # Tokenize and predict
28
+ inputs = tokenizer(iupac_name, return_tensors="pt", padding=True, truncation=True, max_length=256)
29
+ outputs = model(**inputs)
30
+ probabilities = outputs.logits.softmax(dim=-1)
31
+ prediction = probabilities.argmax().item()
32
+
33
+ print(f"Prediction: {prediction}")
34
+ print(f"Confidence: {probabilities[0][prediction].item():.4f}")