# ChemBERTa IUPAC Classifier

This model is a fine-tuned version of [seyonec/ChemBERTa-zinc-base-v1](https://huggingface.co/seyonec/ChemBERTa-zinc-base-v1) for binary classification of chemical compounds from their IUPAC names.

## Model description

The base model, ChemBERTa, is a RoBERTa-style transformer pre-trained on SMILES strings (text representations of chemical structures). This checkpoint was fine-tuned on a custom dataset of IUPAC names, each paired with a binary label, so it takes an IUPAC name as input and predicts one of two classes.

Developed by: xluobd

Model type: RobertaForSequenceClassification
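As a rough illustration of what a two-class sequence-classification head returns, the sketch below turns a pair of logits into a predicted class and a confidence score using a plain-Python softmax. The logit values and the `softmax` helper are hypothetical and chosen only for the example; they are not real output from this model.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtract the largest logit before exponentiating for numerical stability.
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a single IUPAC name (made-up values, not model output).
example_logits = [-1.2, 0.8]

probabilities = softmax(example_logits)

# The predicted class is the index of the largest probability.
prediction = max(range(len(probabilities)), key=probabilities.__getitem__)
confidence = probabilities[prediction]

print(f"Prediction: {prediction}")
print(f"Confidence: {confidence:.4f}")
```

With two classes, the confidence is always at least 0.5, so a value close to 0.5 indicates that the model barely prefers one class over the other.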

### How to use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model_id = "xluobd/chemberta-iupac-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Example IUPAC name
iupac_name = "2-hydroxy-N,N,N-trimethylethan-1-aminium"

# Tokenize; inputs longer than 256 tokens are truncated
inputs = tokenizer(iupac_name, return_tensors="pt", padding=True, truncation=True, max_length=256)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to class probabilities and pick the most likely class
probabilities = outputs.logits.softmax(dim=-1)
prediction = probabilities.argmax(dim=-1).item()

print(f"Prediction: {prediction}")
print(f"Confidence: {probabilities[0][prediction].item():.4f}")
```