Attention-GRU for Cross-Disciplinary Abstract Classification 🌿

This study has been accepted for publication in Scientific Reports and is currently in press.

A resource-efficient Attention-based Bidirectional GRU with frozen GloVe-300d embeddings, trained on the WOS-46985 benchmark to classify scientific abstracts into 134 fine-grained sub-disciplines (Web of Science Level-2).

On WOS-46985 the model achieves a Macro-F1 of 0.920, outperforming both general-purpose and domain-specific Transformer baselines (BERT, BioBERT, SciBERT) while training in ~10 minutes instead of hours and consuming a fraction of the energy.


🧠 Model Description

| Component | Configuration |
| --- | --- |
| Architecture | Bidirectional GRU + Soft Attention |
| Embeddings | Frozen GloVe-300d (Stanford, 6B tokens) |
| Vocabulary size | 14,541 |
| GRU hidden dim | 256 |
| GRU layers | 2 (bidirectional) |
| Classifier | Linear (dropout 0.5) |
| Output classes | 134 (WOS-46985 Level-2 sub-disciplines) |
| Trainable parameters | ~1.06 M |
| Max sequence length | 250 tokens |

The architecture leverages the semantic stability of scientific terminology, sidestepping the quadratic cost of full Transformer attention while preserving long-range dependency modeling through soft-attention over GRU hidden states.
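For reference, the soft-attention pooling can be written out explicitly. The notation here is ours rather than the paper's: $h_t \in \mathbb{R}^{512}$ is the concatenated forward and backward GRU state at position $t$ (2 × 256), $w$ is the bias-free scoring vector of the `Attention` module in the usage code on this card, and $T$ is the padded sequence length (250).

```latex
e_t = w^{\top} h_t, \qquad
\alpha_t = \frac{\exp(e_t)}{\sum_{s=1}^{T} \exp(e_s)}, \qquad
c = \sum_{t=1}^{T} \alpha_t \, h_t
```

The pooled vector $c$ is what the linear classifier receives. Scoring costs $O(T \cdot d)$ per abstract, whereas self-attention compares every pair of positions and costs $O(T^2 \cdot d)$. In the usage code as written, the softmax runs over all $T$ positions, padding included.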


📊 Datasets

| Dataset | Abstracts | Classes | Used for |
| --- | --- | --- | --- |
| arXiv | ~ | 3 (AI, Economics, Psychology) | Coarse-grained interdisciplinary baseline |
| WOS-11967 | 11,967 | 35 | Mid-grained sub-disciplines (L2) |
| WOS-46985 (this checkpoint) | 46,985 | 134 | Fine-grained sub-disciplines (L2) |

🚀 Performance

State-of-the-Art Comparison on Web of Science

| Model | WOS-11967 (35 classes) F1 | WOS-46985 (134 classes) F1 | Training Time |
| --- | --- | --- | --- |
| BERT-Base | 0.903 | 0.850 | ~ Hours |
| BioBERT | 0.903 | 0.856 | ~ Hours |
| SciBERT | 0.921 | 0.867 | ~ Hours |
| Attention-GRU (this model) | 0.953 | 0.920 | ~10 min |

Efficiency Metrics (arXiv benchmark)

| Model | Val. Accuracy | Parameters (M) | Inference (ms) | Energy (kWh) |
| --- | --- | --- | --- | --- |
| Attention-GRU | 96.8% | 1.06 | 0.36 | 0.15 |
| BERT (Base) | 94.4% | 109.5 | 7.22 | 0.50 |
| RoBERTa | 93.4% | 125.0 | 7.80 | 0.52 |

The Attention-GRU is ~14× faster to train and uses ~3× less energy than Transformer baselines while achieving higher accuracy on fine-grained taxonomies.
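The ratios implied by the efficiency table can be recomputed directly. The sketch below only restates the table's figures; the dictionary layout and the `ratios_vs` helper are illustrative names, not part of the released code.

```python
# Figures copied from the efficiency table (arXiv benchmark).
metrics = {
    "Attention-GRU": {"params_m": 1.06, "inference_ms": 0.36, "energy_kwh": 0.15},
    "BERT (Base)": {"params_m": 109.5, "inference_ms": 7.22, "energy_kwh": 0.50},
    "RoBERTa": {"params_m": 125.0, "inference_ms": 7.80, "energy_kwh": 0.52},
}


def ratios_vs(baseline, reference="Attention-GRU"):
    """Return how many times larger each baseline figure is than the reference."""
    ref = metrics[reference]
    base = metrics[baseline]
    return {key: base[key] / ref[key] for key in ref}


for name in ("BERT (Base)", "RoBERTa"):
    r = ratios_vs(name)
    print(f"{name:<12s} params x{r['params_m']:.0f}  "
          f"latency x{r['inference_ms']:.1f}  energy x{r['energy_kwh']:.1f}")
```

Against BERT (Base), the table's figures work out to roughly 20× lower inference latency and about 3.3× lower energy. The training-time ratio cannot be derived from this table, because it reports no training times.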


📦 Files

| File | Description |
| --- | --- |
| attention_gru_wos.pth | PyTorch checkpoint with encoder_state_dict, classifier_state_dict, and hyperparameters |
| word2idx.json | Vocabulary mapping (14,541 tokens) |
| labels.json | Class-id → discipline-name mapping (134 entries; replace placeholders with real names if needed) |
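Replacing the placeholders in labels.json can be scripted. The following is a minimal sketch that assumes the file is a JSON object keyed by class-id strings, which is how the usage code on this card reads it. The helper name `fill_label_names` is hypothetical, and any discipline names you pass in must come from the official WOS list; none are supplied here.

```python
import json
from pathlib import Path


def fill_label_names(labels_path, real_names):
    """Overwrite placeholder names in a labels.json-style file.

    labels_path: JSON file mapping class-id strings to names.
    real_names:  dict of {class_id (int): discipline name} to apply.
    Returns the updated mapping.
    """
    path = Path(labels_path)
    labels = json.loads(path.read_text(encoding="utf-8"))
    for class_id, name in real_names.items():
        key = str(class_id)  # JSON object keys are always strings
        if key not in labels:
            raise KeyError(f"class id {class_id} not present in {path.name}")
        labels[key] = name
    path.write_text(json.dumps(labels, indent=2, ensure_ascii=False),
                    encoding="utf-8")
    return labels
```

A call such as `fill_label_names("labels.json", {0: "<official name for class 0>"})` updates a local copy in place. Ids missing from the file raise an error rather than being silently added, so a mismatch with the training-time ordering is caught early.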

🛠️ How to Use

import json, re, torch, torch.nn as nn
from huggingface_hub import hf_hub_download

# --- Model definitions (must match training) ---
class Attention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.attention = nn.Linear(hidden_dim, 1, bias=False)
    def forward(self, rnn_outputs):
        w = torch.softmax(self.attention(rnn_outputs).squeeze(-1), dim=1)
        return torch.bmm(w.unsqueeze(1), rnn_outputs).squeeze(1)

class GRUAttentionEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_layers, bidirectional):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden_dim, num_layers=num_layers,
                          batch_first=True, bidirectional=bidirectional)
        self.attention = Attention(hidden_dim * (2 if bidirectional else 1))
    def forward(self, x):
        out, _ = self.gru(self.embedding(x))
        return self.attention(out)

class Classifier(nn.Module):
    def __init__(self, input_dim, num_classes, dropout=0.5):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(input_dim, num_classes)
    def forward(self, x):
        return self.fc(self.dropout(x))

# --- Load files from this repo ---
REPO = "MAE07/attention-gru-model"
ckpt = torch.load(hf_hub_download(REPO, "attention_gru_wos.pth"), map_location="cpu")
with open(hf_hub_download(REPO, "word2idx.json"), encoding="utf-8") as f:
    vocab = json.load(f)  # token -> integer id
with open(hf_hub_download(REPO, "labels.json"), encoding="utf-8") as f:
    labels = {int(k): v for k, v in json.load(f).items()}  # class id -> name

hp = ckpt["hyperparameters"]
encoder = GRUAttentionEncoder(hp["vocab_size"], hp["embed_dim"], hp["hidden_dim"],
                              hp["num_layers"], hp["bidirectional"])
clf = Classifier(hp["hidden_dim"] * (2 if hp["bidirectional"] else 1),
                 hp["num_classes"], hp["fc_dropout"])
encoder.load_state_dict(ckpt["encoder_state_dict"])
clf.load_state_dict(ckpt["classifier_state_dict"])
encoder.eval(); clf.eval()

# --- Inference ---
def predict(text, max_len=250, top_k=5):
    ids = [vocab.get(t, vocab["<UNK>"]) for t in re.findall(r"\b\w+\b", text.lower())]
    ids = (ids + [0] * max_len)[:max_len]
    x = torch.tensor([ids], dtype=torch.long)
    with torch.no_grad():
        probs = torch.softmax(clf(encoder(x)), dim=1)[0]
    conf, idx = torch.topk(probs, k=top_k)
    return [(int(i), labels.get(int(i), f"Class {int(i)}"), float(c))
            for c, i in zip(conf, idx)]

abstract = "The exponential growth of scholarly literature necessitates automated systems."
for cid, name, conf in predict(abstract):
    print(f"{cid:3d}  {name:<25s}  {conf:.4f}")

A ready-to-use Gradio demo is available at: 👉 https://huggingface.co/spaces/MAE07/abstract-submission


🧪 Training Details

  • Optimizer: Adam, lr 8e-4, weight decay 1e-4
  • Scheduler: ReduceLROnPlateau (factor 0.5, patience 2) on validation accuracy
  • Loss: Class-weighted cross-entropy
  • Batch size: 64
  • Epochs: 20 (with best-model checkpointing on val accuracy)
  • Augmentation: WordNet-synonym replacement (2 substitutions per sample), 2× data expansion
  • Split: 70 / 15 / 15 stratified train/val/test
  • Hardware: Single GPU (training completed in ~10 minutes)
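The optimizer, scheduler, and loss settings listed above could be wired up in PyTorch roughly as follows. This is a sketch, not the authors' training script. The inverse-frequency class weighting is an assumption, since the card only states that the cross-entropy is class-weighted, and `build_training_objects`, the toy model, and the toy class counts are illustrative.

```python
import torch
import torch.nn as nn


def build_training_objects(model, class_counts):
    """Create loss, optimizer and scheduler matching the listed settings.

    class_counts: number of training samples per class.
    The inverse-frequency weighting below is an assumption; the card only
    says that the cross-entropy is class-weighted.
    """
    counts = torch.tensor(class_counts, dtype=torch.float)
    weights = counts.sum() / (len(counts) * counts)  # rarer class -> larger weight
    criterion = nn.CrossEntropyLoss(weight=weights)

    # Frozen parameters (requires_grad=False) are not passed to the optimizer.
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=8e-4, weight_decay=1e-4)

    # mode="max" because the monitored quantity is validation accuracy.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="max", factor=0.5, patience=2)
    return criterion, optimizer, scheduler


# Toy stand-in model and class counts, only to show the call pattern.
model = nn.Linear(512, 4)
criterion, optimizer, scheduler = build_training_objects(model, [50, 100, 25, 25])

# Simulated validation accuracies with no improvement after the first epoch.
for val_acc in [0.80, 0.79, 0.79, 0.79]:
    scheduler.step(val_acc)
print(optimizer.param_groups[0]["lr"])
```

With patience 2, the learning rate is halved once the third consecutive non-improving epoch is seen. In a real run, `scheduler.step(val_acc)` would be called once per epoch after validation, alongside saving the checkpoint whenever validation accuracy improves.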

⚠️ Limitations

  • Vocabulary is frozen at 14,541 GloVe-covered tokens; out-of-vocabulary scientific terms map to <UNK> and may degrade performance on highly specialized abstracts (e.g., niche biochemistry, novel CS subfields).
  • Trained on English abstracts only; non-English text is not supported.
  • The 134 class IDs follow the WOS-46985 Level-2 ordering used during training; if you need human-readable discipline names you must replace the placeholder values in labels.json with the official WOS sub-discipline names.
  • Domain bias: Web of Science indexes lean toward STEM and Anglophone publication venues. Abstracts from underrepresented humanities or non-Anglophone disciplines may be misclassified.
  • Soft-attention over GRU outputs is not a direct substitute for self-attention in tasks requiring deep token-token interaction (e.g., NLI, QA).
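Before trusting a prediction on a specialized abstract, it can help to check how much of the text the vocabulary actually covers. The sketch below reuses the tokenization pattern from the `predict` helper on this card. The function name `oov_report` and the toy vocabulary are hypothetical; in practice the mapping would be the contents of word2idx.json.

```python
import re

TOKEN_RE = re.compile(r"\b\w+\b")  # same pattern as the predict() helper


def oov_report(text, vocab, max_len=250):
    """Return (oov_rate, oov_tokens, truncated) for one abstract.

    vocab:   token -> id mapping, e.g. the contents of word2idx.json
    max_len: tokens beyond this position are dropped at inference time
    """
    tokens = TOKEN_RE.findall(text.lower())
    kept = tokens[:max_len]
    oov = [t for t in kept if t not in vocab]
    rate = len(oov) / len(kept) if kept else 0.0
    return rate, sorted(set(oov)), len(tokens) > max_len


# Toy vocabulary for illustration only; load word2idx.json in practice.
toy_vocab = {"<UNK>": 1, "protein": 2, "folding": 3, "is": 4, "studied": 5}
rate, missing, truncated = oov_report(
    "Protein folding is studied via cryoEM.", toy_vocab)
print(f"OOV rate: {rate:.0%}, missing: {missing}, truncated: {truncated}")
```

A high OOV rate or a truncated abstract suggests the model sees only part of the signal, so the top-k confidences should be read with more caution.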

🌱 Environmental Impact

This model was specifically designed under the Green AI paradigm. Compared to Transformer baselines on the same task, it consumes ~3× less energy during training and ~20× less during inference, while achieving higher accuracy on fine-grained taxonomies. Training a single full run requires only minutes on commodity hardware and produces a checkpoint of just ~26 MB.

