TDAMM Multi-Label Classification Model v2

The TDAMM (Time Domain Multi-Messenger Astronomy) model v2 classifies NASA's time domain multi-messenger resources into one or more of 36 categories identified by subject matter experts (SMEs).

This version is fine-tuned from INDUS-SDE, a domain-adapted language model for Scientific Content Curation & Discovery in noisy contexts.

Model Description

Base Model: nasa-impact/indus-sde-v0.2, fine-tuned for multi-label classification
Architecture: RobertaForSequenceClassification
Task: Multi-label classification (36 categories)
Training Data: NASA and non-NASA documents related to TDAMM topics identified by SMEs (same data split as v1)

Changes from v1

New Base Model: Fine-tuned from INDUS-SDE v0.2 (previously astroBERT in v1)
Leverages domain-adapted embeddings from INDUS-SDE for improved understanding of scientific document entities

Performance Metrics

Metric	Value
Eval Accuracy	0.657
Weighted Precision (threshold=0.5)	0.854
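
The card does not spell out how weighted precision is computed. A minimal sketch, assuming it follows the common convention of averaging per-label precision with weights proportional to each label's support (as in scikit-learn's `average="weighted"`), could look like the following. The function name and toy data are illustrative and not taken from the evaluation code.

```python
def weighted_precision(y_true, y_prob, threshold=0.5):
    """Support-weighted mean of per-label precision for multi-label data.

    y_true: list of rows of 0/1 ground-truth labels.
    y_prob: list of rows of predicted probabilities (same shape).
    """
    n_labels = len(y_true[0])
    total_support = 0
    weighted_sum = 0.0
    for j in range(n_labels):
        true_col = [row[j] for row in y_true]
        # Binarize probabilities with the same rule as the usage snippet.
        pred_col = [1 if row[j] > threshold else 0 for row in y_prob]
        tp = sum(1 for t, p in zip(true_col, pred_col) if t == 1 and p == 1)
        predicted_pos = sum(pred_col)
        support = sum(true_col)
        # Assumption: a label that is never predicted gets precision 0.
        precision = tp / predicted_pos if predicted_pos else 0.0
        weighted_sum += support * precision
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


# Toy example: 3 documents, 2 labels (values are made up).
y_true = [[1, 0], [1, 1], [0, 1]]
y_prob = [[0.9, 0.2], [0.8, 0.4], [0.7, 0.6]]
score = weighted_precision(y_true, y_prob)
```

Because labels with more true examples carry more weight, rare categories have less influence on this metric than they would on a macro average.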

Model Comparison

Model	Weighted Precision (%)
ModernBERT-SDE	45.2
ModernBERT	72.5
INDUS	73.4
AstroBERT	85.5
INDUS-SDE	85.3

TDAMM classification performance (Weighted Precision, in percent). All models were fine-tuned with focal loss. INDUS-SDE comes within 0.2 points of the domain-specific AstroBERT (85.3 vs. 85.5) despite no astrophysics-specific pretraining.
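
For readers unfamiliar with focal loss, here is a minimal pure-Python sketch of its binary (per-label) form, which down-weights examples the model already classifies confidently. The `gamma` and `alpha` values are assumptions for illustration; the card does not state the hyperparameters or the exact implementation used in training.

```python
import math


def binary_focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Mean focal loss over independent binary labels.

    logits: list of raw scores, one per label.
    targets: list of 0/1 ground-truth values.
    gamma: focusing parameter; gamma=0 reduces to binary cross-entropy.
    alpha: optional weight for positives (negatives get 1 - alpha).
    """
    losses = []
    for z, t in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))   # sigmoid probability
        p_t = p if t == 1 else 1.0 - p   # probability assigned to the true class
        loss = -((1.0 - p_t) ** gamma) * math.log(p_t)
        if alpha is not None:
            loss *= alpha if t == 1 else 1.0 - alpha
        losses.append(loss)
    return sum(losses) / len(losses)


# A confidently correct label contributes far less than an uncertain one.
easy = binary_focal_loss([4.0], [1])
hard = binary_focal_loss([0.0], [1])
```

With 36 labels and few positives per document, a loss of this shape keeps the many easy negatives from dominating the gradient.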

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("nasa-impact/tdamm-classification-v2")
model = AutoModelForSequenceClassification.from_pretrained("nasa-impact/tdamm-classification-v2")

# Prepare input
text = "Your astronomical text here"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.sigmoid(outputs.logits)

# Convert to binary predictions (threshold = 0.5)
binary_predictions = (predictions > 0.5).int()

# Get predicted label indices
predicted_indices = torch.where(binary_predictions[0] == 1)[0].tolist()
print(f"Predicted indices: {predicted_indices}")

Label Mapping During Inference

After obtaining predictions from the model, you can map the predicted label indices to their actual names using the model.config.id2label dictionary:

# Example with hard-coded indices; in practice, reuse predicted_indices from the inference step
predicted_indices = [0, 2, 5]
predicted_labels = [model.config.id2label[idx] for idx in predicted_indices]
print(predicted_labels)
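
It is often useful to keep the probability alongside each predicted label. The helper below is a small sketch of that idea; the function name, label names, and probabilities are hypothetical and only illustrate the shape of the data.

```python
def labels_with_scores(probabilities, id2label, threshold=0.5):
    """Return (label, probability) pairs above threshold, highest first."""
    selected = [
        (id2label[idx], prob)
        for idx, prob in enumerate(probabilities)
        if prob > threshold
    ]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Hypothetical label names and probabilities, for illustration only.
example_id2label = {0: "label_a", 1: "label_b", 2: "label_c"}
example_probs = [0.91, 0.12, 0.67]
result = labels_with_scores(example_probs, example_id2label)
print(result)
```

With the real model, the same function would be called as `labels_with_scores(predictions[0].tolist(), model.config.id2label)`, using the `predictions` tensor from the inference step.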

Related Models

TDAMM Classification v1 - Previous version based on astroBERT
INDUS-SDE v0.2 - Base model for this fine-tuned version

Citation

If you use this model, please cite:

@misc{tdamm-classification-v2,
  author = {NASA IMPACT},
  title = {TDAMM Multi-Label Classification Model v2},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/nasa-impact/tdamm-classification-v2}
}

License

Apache 2.0
