astroNLPy-ner

Named entity recognition for astronomical observation reports (ATels, GCN Circulars, TNS reports). Fine-tuned from adsabs/astroBERT on the Time-Domain Astronomy Corpus (TDAC), which is annotated with 27 astrophysical entity types.

This model is the NER component of the astroNLPy package, which also provides LLM-based coreference resolution and relation extraction for celestial objects.

Usage

With the astroNLPy package (recommended):

from astroNLPy.ner import NERModel

# Load the fine-tuned NER model by its Hugging Face Hub ID
ner = NERModel.from_pretrained("atillaalkan/astroNLPy-ner")
# Run NER on a raw sentence
tags = ner.predict_text("Swift observed GRS 1747-312 in the X-ray band.")

Or directly with transformers:

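from transformers import pipeline

# aggregation_strategy="simple" merges sub-word tokens into whole entity spans
nlp = pipeline("token-classification", model="atillaalkan/astroNLPy-ner",
               aggregation_strategy="simple")
# Prints a list of dicts, one per detected entity span
print(nlp("We report a nova in M31 at R = 19.7 mag."))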
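With an aggregation strategy set, the transformers pipeline returns a list of dicts with the keys entity_group, score, word, start and end. A minimal sketch of post-processing that output follows. The helper name group_entities, the min_score threshold and the sample records are assumptions made for illustration; the sample is hand-written and is not actual output of this model.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.5):
    """Group aggregated pipeline output by entity type, dropping low-confidence spans."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hand-written stand-in for pipeline output on
# "We report a nova in M31 at R = 19.7 mag."
# Labels and scores are hypothetical, not real model predictions.
sample = [
    {"entity_group": "CelestialObject", "score": 0.98, "word": "M31", "start": 20, "end": 23},
    {"entity_group": "Wavelength", "score": 0.41, "word": "R", "start": 27, "end": 28},
]

print(group_entities(sample))  # {'CelestialObject': ['M31']}
```

In real use, pass the list returned by nlp(...) instead of sample. The start and end offsets index into the input string, so they can be used to highlight spans in the original report.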

Entity types

CelestialObject, CelestialRegion, CelestialObjectRegion, Telescope, Observatory, Instrument, Survey, Wavelength, Formula, ObservationalTechniques, Citation, Dataset, Database, Archive, Software, URL, Person, Organization, Collaboration, Location, Grant, Proposal, Event, Model, Identifier, Tag, TextGarbage.
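Under IOB2 tagging, each entity type gets a B- (begin) and an I- (inside) label, plus one shared O label for tokens outside any entity. The sketch below builds such a label set from the 27 types listed above. The ordering of the IDs here is an assumption; the mapping actually used by the published checkpoint should be read from its config (model.config.id2label) rather than rebuilt.

```python
ENTITY_TYPES = [
    "CelestialObject", "CelestialRegion", "CelestialObjectRegion", "Telescope",
    "Observatory", "Instrument", "Survey", "Wavelength", "Formula",
    "ObservationalTechniques", "Citation", "Dataset", "Database", "Archive",
    "Software", "URL", "Person", "Organization", "Collaboration", "Location",
    "Grant", "Proposal", "Event", "Model", "Identifier", "Tag", "TextGarbage",
]


def build_iob2_labels(entity_types):
    """Return ['O', 'B-X', 'I-X', ...] for the given entity types."""
    labels = ["O"]
    for etype in entity_types:
        labels += [f"B-{etype}", f"I-{etype}"]
    return labels


labels = build_iob2_labels(ENTITY_TYPES)
# Illustrative ID assignment only; not necessarily the checkpoint's own mapping.
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

print(len(ENTITY_TYPES), len(labels))  # 27 55
```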

Results (v0.1.0)

Single 80/10/10 holdout split (8 test documents), seqeval / IOB2:

Metric               Value
Micro F1             0.52
CelestialObject F1   0.96
Person F1            0.92
Wavelength F1        0.68

Single-split result; the micro-average is depressed by entity types absent from the small test set.
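To make the relationship between per-type F1 and micro F1 concrete, here is a simplified, standard-library sketch of entity-level scoring over IOB2 tags. It is a stand-in for what seqeval computes, not seqeval itself: a span counts as correct only if type and boundaries match exactly, and a stray I- tag with no preceding B- is ignored. The function names and the toy tag sequences are illustrative assumptions, not data from TDAC.

```python
from collections import defaultdict


def extract_spans(tags):
    """Return the set of (type, start, end) spans in one IOB2 tag sequence."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def micro_and_per_type_f1(gold_seqs, pred_seqs):
    """Pool TP/FP/FN over all types for micro F1; also report F1 per type."""
    counts = defaultdict(lambda: [0, 0, 0])  # type -> [tp, fp, fn]
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        for etype, _, _ in g & p:
            counts[etype][0] += 1
        for etype, _, _ in p - g:
            counts[etype][1] += 1
        for etype, _, _ in g - p:
            counts[etype][2] += 1
    per_type = {etype: f1(*c) for etype, c in counts.items()}
    totals = [sum(c[i] for c in counts.values()) for i in range(3)]
    return f1(*totals), per_type


# Toy example: one entity found, one missed.
gold = [["B-CelestialObject", "I-CelestialObject", "O", "B-Wavelength"]]
pred = [["B-CelestialObject", "I-CelestialObject", "O", "O"]]
micro, per_type = micro_and_per_type_f1(gold, pred)
print(round(micro, 3), per_type)  # 0.667 {'CelestialObject': 1.0, 'Wavelength': 0.0}
```

Because the micro-average pools counts across all types, a handful of missed or spurious spans in weaker types can pull it well below the F1 of the strongest types, especially when the test set is small.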

Training

  • Base model: adsabs/astroBERT
  • 10 epochs, batch size 8, learning rate 2e-5, IOB2 token classification
  • Corpus: TDAC (74 documents, ~19k tokens)
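For a sense of scale, the sketch below shows one way an 80/10/10 document-level split of 74 documents could be produced. Truncating the train and validation sizes and giving the remainder to the test set is one rounding convention that happens to yield the 8 test documents reported in the results. The function name, the seed and this convention are assumptions; the procedure actually used for this model is not stated here.

```python
import random


def split_80_10_10(doc_ids, seed=0):
    """Shuffle document IDs and split them into train/validation/test lists."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility; seed is arbitrary
    n_train = int(0.8 * len(ids))
    n_val = int(0.1 * len(ids))
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train, val, test = split_80_10_10(range(74))
print(len(train), len(val), len(test))  # 59 7 8
```

Splitting by document rather than by sentence keeps all sentences of one report in the same partition, which avoids leaking near-duplicate context between training and test data.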

Citation

A publication is forthcoming.

License

MIT
