TDAMM Multi-Label Classification Model v2
The TDAMM (Time Domain Multi-Messenger Astronomy) model v2 is designed to classify NASA's time-domain and multi-messenger resources into one or more of 36 distinct categories identified by subject matter experts (SMEs).
This updated version is fine-tuned from INDUS-SDE, a domain-adapted language model for Scientific Content Curation & Discovery in noisy contexts.
Model Description
- Base Model: nasa-impact/indus-sde-v0.2, fine-tuned for multi-label classification
- Architecture: RobertaForSequenceClassification
- Task: Multi-label classification (36 categories)
- Training Data: NASA and non-NASA documents related to TDAMM topics identified by SMEs (same data split as v1)
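Because each document can belong to several categories at once, a multi-label training target is usually a 36-dimensional multi-hot vector rather than a single class index. The sketch below shows that encoding under stated assumptions: `to_multi_hot` is a hypothetical helper, and the `category_N` names are placeholders, not the model's real labels (those are stored in `model.config.label2id`).

```python
from typing import Dict, List

NUM_LABELS = 36  # number of TDAMM categories stated in this model card


def to_multi_hot(categories: List[str], label2id: Dict[str, int],
                 num_labels: int = NUM_LABELS) -> List[float]:
    """Encode a document's category names as a multi-hot float vector.

    Hypothetical helper for illustration. Floats are used because
    binary cross-entropy style losses expect float targets.
    """
    target = [0.0] * num_labels
    for name in categories:
        target[label2id[name]] = 1.0  # mark every category assigned to the document
    return target


# Placeholder mapping; the real one lives in model.config.label2id
label2id = {f"category_{i}": i for i in range(NUM_LABELS)}
vec = to_multi_hot(["category_0", "category_5"], label2id)
print(sum(vec), vec[0], vec[5])  # 2.0 1.0 1.0
```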
Changes from v1
- New Base Model: Fine-tuned from INDUS-SDE v0.2 (previously astroBERT in v1)
- Leverages INDUS-SDE's domain-adapted embeddings, intended to better capture entities in scientific documents
Performance Metrics
| Metric | Value |
|---|---|
| Eval Accuracy | 0.657 |
| Weighted Precision (threshold=0.5) | 0.854 |
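Weighted precision averages the per-label precision at the 0.5 threshold, weighting each label by its number of true instances (support), so frequent categories count more than rare ones. The NumPy sketch below assumes the metric follows the usual scikit-learn definition (`average="weighted"`); the card does not state the exact evaluation code, so treat `weighted_precision` as an illustrative, hypothetical helper.

```python
import numpy as np


def weighted_precision(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5) -> float:
    """Support-weighted mean of per-label precision for multi-label predictions.

    y_true: (n_docs, n_labels) 0/1 matrix; y_prob: matching sigmoid probabilities.
    """
    y_pred = (y_prob > threshold).astype(int)            # same thresholding rule as the usage example
    tp = ((y_pred == 1) & (y_true == 1)).sum(axis=0)     # true positives per label
    fp = ((y_pred == 1) & (y_true == 0)).sum(axis=0)     # false positives per label
    predicted = tp + fp
    # Labels with no positive predictions get precision 0 (like scikit-learn with zero_division=0)
    precision = np.divide(tp, predicted, out=np.zeros(len(tp), dtype=float), where=predicted > 0)
    support = y_true.sum(axis=0)                         # true instances per label
    return float((precision * support).sum() / support.sum())


# Toy data: 3 documents, 3 labels
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_prob = np.array([[0.9, 0.2, 0.4], [0.1, 0.8, 0.6], [0.7, 0.3, 0.1]])
print(weighted_precision(y_true, y_prob))  # 0.8
```

In this toy case, labels 0 and 1 have precision 1.0 (support 2 each) and label 2 has precision 0.0 (support 1), giving (2 + 2 + 0) / 5 = 0.8.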
Model Comparison
| Model | Weighted Precision (%) |
|---|---|
| ModernBERT-SDE | 45.2 |
| ModernBERT | 72.5 |
| INDUS | 73.4 |
| AstroBERT | 85.5 |
| INDUS-SDE | 85.3 |
TDAMM classification performance (weighted precision, %). All models were fine-tuned with focal loss. INDUS-SDE (85.3) performs on par with the domain-specific AstroBERT (85.5) despite having no astrophysics-specific pretraining.
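Focal loss scales each label's binary cross-entropy by (1 - p_t)^gamma, so confidently correct labels contribute little and training concentrates on hard, often rare, categories. The PyTorch sketch below is a per-label (sigmoid) focal loss; `multilabel_focal_loss` is a hypothetical helper, and `gamma=2.0`, `alpha=0.25` are the common defaults from the original focal loss formulation, not settings confirmed for this model. torchvision offers a comparable `torchvision.ops.sigmoid_focal_loss`.

```python
import torch
import torch.nn.functional as F


def multilabel_focal_loss(logits: torch.Tensor, targets: torch.Tensor,
                          gamma: float = 2.0, alpha=0.25) -> torch.Tensor:
    """Sigmoid focal loss averaged over all (document, label) pairs.

    gamma and alpha are assumed defaults, not this model's training settings.
    Pass alpha=None to disable the positive/negative balancing term.
    """
    # Per-element binary cross-entropy, equal to -log(p_t)
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    probs = torch.sigmoid(logits)
    # p_t: predicted probability of the correct outcome for each label
    p_t = probs * targets + (1 - probs) * (1 - targets)
    loss = (1 - p_t) ** gamma * bce  # modulating factor shrinks the loss on easy labels
    if alpha is not None:
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        loss = alpha_t * loss
    return loss.mean()


logits = torch.tensor([[2.0, -1.0, 0.5]])
targets = torch.tensor([[1.0, 0.0, 1.0]])
print(multilabel_focal_loss(logits, targets))
```

With `gamma=0` and `alpha=None` the function reduces to plain mean binary cross-entropy, which is a quick sanity check when adapting it.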
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
tokenizer = AutoTokenizer.from_pretrained("nasa-impact/tdamm-classification-v2")
model = AutoModelForSequenceClassification.from_pretrained("nasa-impact/tdamm-classification-v2")
# Prepare input
text = "Your astronomical text here"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
# Multi-label: apply an independent sigmoid to each logit (not a softmax across labels)
predictions = torch.sigmoid(outputs.logits)
# Convert to binary predictions (threshold = 0.5)
binary_predictions = (predictions > 0.5).int()
# Get predicted label indices
predicted_indices = torch.where(binary_predictions[0] == 1)[0].tolist()
print(f"Predicted indices: {predicted_indices}")
Label Mapping During Inference
After obtaining predictions from the model, you can map the predicted label indices to their actual names using the model.config.id2label dictionary:
# Example usage with illustrative indices; in practice, use predicted_indices from the snippet above
predicted_indices = [0, 2, 5]
predicted_labels = [model.config.id2label[idx] for idx in predicted_indices]
print(predicted_labels)
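To process a batch of documents at once and keep each label's probability next to its name, the thresholding and `id2label` lookup can be combined into one helper. This is a sketch: `decode_predictions` and the toy labels are hypothetical and not part of the model's API; it uses the same `> 0.5` rule as the usage example.

```python
from typing import Dict, List, Tuple

import torch


def decode_predictions(logits: torch.Tensor, id2label: Dict[int, str],
                       threshold: float = 0.5) -> List[List[Tuple[str, float]]]:
    """Return (label, probability) pairs above threshold for each document in a batch."""
    probs = torch.sigmoid(logits)  # independent per-label probabilities
    results = []
    for row in probs:
        # Indices of labels whose probability exceeds the threshold
        idx = torch.where(row > threshold)[0].tolist()
        pairs = [(id2label[i], float(row[i])) for i in idx]
        pairs.sort(key=lambda p: p[1], reverse=True)  # most confident label first
        results.append(pairs)
    return results


# Toy logits for two documents and three hypothetical labels
toy_id2label = {0: "label_a", 1: "label_b", 2: "label_c"}
toy_logits = torch.tensor([[2.0, -3.0, 0.5], [-1.0, -2.0, -0.5]])
print(decode_predictions(toy_logits, toy_id2label))
```

Here the first document yields `label_a` and `label_c`, while the second yields an empty list because no label passes the threshold. With the real model, pass `outputs.logits` and `model.config.id2label`.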
Related Models
- TDAMM Classification v1 - Previous version based on astroBERT
- INDUS-SDE v0.2 - Base model for this fine-tuned version
Citation
If you use this model, please cite:
@misc{tdamm-classification-v2,
author = {NASA IMPACT},
title = {TDAMM Multi-Label Classification Model v2},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/nasa-impact/tdamm-classification-v2}
}
License
Apache 2.0