TDAMM Multi-Label Classification Model v2

The TDAMM (Time Domain Multi-Messenger Astronomy) model v2 classifies NASA's time-domain and multi-messenger resources into one or more of 36 categories defined by subject matter experts (SMEs).

This version is fine-tuned from INDUS-SDE, a language model domain-adapted for Scientific Content Curation & Discovery in noisy contexts.

Model Description

  • Base Model: nasa-impact/indus-sde-v0.2, fine-tuned for multi-label classification
  • Architecture: RobertaForSequenceClassification
  • Task: Multi-label classification (36 categories)
  • Training Data: NASA and non-NASA documents related to TDAMM topics identified by SMEs (same data split as v1)

Changes from v1

  • New Base Model: Fine-tuned from INDUS-SDE v0.2 (previously astroBERT in v1)
  • Leverages domain-adapted embeddings from INDUS-SDE for improved understanding of scientific document entities

Performance Metrics

Metric                                  Value
Eval Accuracy                           0.657
Weighted Precision (threshold = 0.5)    0.854
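
Weighted precision computes precision separately for each category and averages the results, weighting each category by its support (the number of documents that truly carry that label). The framework-free sketch below shows the calculation at the 0.5 threshold. It assumes the same definition as scikit-learn's `precision_score(..., average="weighted")`, where a category with no predicted positives counts as precision 0; the card does not state this explicitly, and the function name `weighted_precision` is hypothetical.

```python
def weighted_precision(y_true, y_prob, threshold=0.5):
    """Support-weighted precision over categories for multi-label data.

    y_true: list of rows, one per document, each a list of 0/1 gold labels.
    y_prob: list of rows of sigmoid probabilities with the same shape.
    Assumed convention (as in scikit-learn): a category with no predicted
    positives gets precision 0.
    """
    n_labels = len(y_true[0])
    total_support = 0
    weighted_sum = 0.0
    for j in range(n_labels):
        tp = fp = support = 0
        for gold_row, prob_row in zip(y_true, y_prob):
            pred = prob_row[j] > threshold  # same strict ">" as the usage code
            gold = gold_row[j] == 1
            tp += pred and gold
            fp += pred and not gold
            support += gold
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        weighted_sum += precision * support
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


# Toy data for illustration only, not from the TDAMM evaluation set.
y_true = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]
y_prob = [[0.9, 0.2, 0.7], [0.6, 0.8, 0.3], [0.4, 0.1, 0.55]]
print(weighted_precision(y_true, y_prob))  # → 0.6
```

Because categories are weighted by support, frequent categories dominate the score, so a high weighted precision can coexist with weak precision on rare categories.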

Model Comparison

Model             Weighted Precision (%)
ModernBERT-SDE    45.2
ModernBERT        72.5
INDUS             73.4
AstroBERT         85.5
INDUS-SDE         85.3

TDAMM classification performance (weighted precision, %). All models were fine-tuned with focal loss. INDUS-SDE performs on par with the astrophysics-specific AstroBERT (85.3 vs. 85.5) despite having no astrophysics-specific pretraining.
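
Focal loss scales each binary cross-entropy term by (1 - p_t)^gamma, where p_t is the predicted probability of the true outcome, so documents the model already labels confidently contribute less and training concentrates on hard examples and rare categories. The sketch below is a framework-free version of the sigmoid focal loss from Lin et al. (2017). The values alpha = 0.25 and gamma = 2 are assumptions (they are the common defaults, also used by `torchvision.ops.sigmoid_focal_loss`); the card does not report the hyperparameters used for training.

```python
import math


def _softplus(x):
    # log(1 + e^x), computed without overflow for large |x|
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Mean binary focal loss over all (document, category) pairs.

    logits, targets: equally shaped lists of rows (one row per document,
    one column per category). alpha and gamma are assumed values.
    """
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits, targets):
        for z, y in zip(logit_row, target_row):
            # log of the probability assigned to the true outcome:
            # log(sigmoid(z)) if y == 1, else log(1 - sigmoid(z))
            log_p_t = -_softplus(-z) if y == 1 else -_softplus(z)
            p_t = math.exp(log_p_t)
            alpha_t = alpha if y == 1 else 1.0 - alpha
            # (1 - p_t)^gamma down-weights easy, well-classified pairs
            total += -alpha_t * (1.0 - p_t) ** gamma * log_p_t
            count += 1
    return total / count
```

Setting `gamma=0.0` and `alpha=0.5` reduces this to half of the ordinary binary cross-entropy, which makes a convenient sanity check; larger `gamma` values shrink the contribution of pairs the model already classifies confidently.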

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("nasa-impact/tdamm-classification-v2")
model = AutoModelForSequenceClassification.from_pretrained("nasa-impact/tdamm-classification-v2")

# Prepare input
text = "Your astronomical text here"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.sigmoid(outputs.logits)

# Convert to binary predictions (threshold = 0.5)
binary_predictions = (predictions > 0.5).int()

# Get predicted label indices
predicted_indices = torch.where(binary_predictions[0] == 1)[0].tolist()
print(f"Predicted indices: {predicted_indices}")

Label Mapping During Inference

After obtaining predictions from the model, you can map the predicted label indices to their actual names using the model.config.id2label dictionary:

# Example indices; in practice, use predicted_indices from the snippet above
predicted_indices = [0, 2, 5]
predicted_labels = [model.config.id2label[idx] for idx in predicted_indices]
print(predicted_labels)
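
Because this is a multi-label model, it is often useful to keep each category's probability next to its name. The helper below is a small sketch that assumes `probs` is the per-category probability list for one document (for example `predictions[0].tolist()` from the usage code) and `id2label` is `model.config.id2label`. The function name `labels_above_threshold` is hypothetical, and the example uses placeholder label names rather than the real TDAMM categories.

```python
def labels_above_threshold(probs, id2label, threshold=0.5):
    """Return (label, probability) pairs above `threshold`, highest first.

    probs: sigmoid probabilities for ONE document, one per category.
    id2label: mapping from integer index to category name.
    """
    selected = [(id2label[i], p) for i, p in enumerate(probs) if p > threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Placeholder names; the real mapping comes from model.config.id2label.
id2label = {0: "label_0", 1: "label_1", 2: "label_2"}
print(labels_above_threshold([0.91, 0.12, 0.67], id2label))
# → [('label_0', 0.91), ('label_2', 0.67)]
```

Sorting by probability puts the most confident categories first, which helps when reviewing predictions for documents that receive many labels.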

Citation

If you use this model, please cite:

@misc{tdamm-classification-v2,
  author = {NASA IMPACT},
  title = {TDAMM Multi-Label Classification Model v2},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/nasa-impact/tdamm-classification-v2}
}

License

Apache 2.0

Model Files

Weights are distributed in Safetensors format (0.1B parameters, F32 tensors).