# scibert-citation-model
This model is a fine-tuned version of SciBERT specifically optimized for generating embeddings from scientific papers.
## Model Details
- Base Model: SciBERT (Scientific BERT)
- Fine-tuning Task: Scientific paper understanding and embedding generation
- Language: English (Scientific/Academic)
- Vocabulary: SciBERT's in-domain scientific vocabulary (scivocab)
## Usage
```python
from transformers import AutoTokenizer, AutoModel
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/scibert-citation-model")
model = AutoModel.from_pretrained("your-username/scibert-citation-model")

# Generate embeddings
text = "Your scientific text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state[:, 0, :]  # [CLS] token embedding
print(f"Embeddings shape: {embeddings.shape}")
```
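Since the model targets citation-style use of scientific papers, a common next step is comparing embeddings with cosine similarity. The sketch below reuses the `tokenizer` and `model` loaded above; the two example abstracts and the batched `embed` helper are illustrative, not part of the released model.

```python
import torch
import torch.nn.functional as F

def embed(texts):
    # Batch-tokenize and take the [CLS] embedding for each text
    inputs = tokenizer(texts, return_tensors="pt", padding=True,
                       truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :]

papers = [
    "Attention-based architectures for sequence transduction.",
    "A convolutional approach to protein structure prediction.",
]
emb = embed(papers)

# Cosine similarity between the two papers' [CLS] embeddings
similarity = F.cosine_similarity(emb[0], emb[1], dim=0)
print(f"Cosine similarity: {similarity.item():.4f}")
```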
## Performance
Fine-tuned SciBERT.
## Training Details
- Training Framework: PyTorch/Transformers
- Fine-tuning Objective: Scientific text understanding
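The exact fine-tuning recipe is not documented in this card. The sketch below shows one plausible setup, assuming an in-batch contrastive (InfoNCE-style) objective on citing/cited abstract pairs; the base checkpoint, dataset, loss, and hyperparameters here are assumptions for illustration, not the released training code.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Hypothetical setup: contrastive fine-tuning on (citing, cited) abstract pairs.
tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def cls_embed(texts):
    inputs = tokenizer(texts, return_tensors="pt", padding=True,
                       truncation=True, max_length=512)
    return model(**inputs).last_hidden_state[:, 0, :]

# Illustrative batch of citing/cited abstract pairs
citing = ["Abstract of a paper that cites another paper ..."]
cited = ["Abstract of the paper it cites ..."]

model.train()
anchors = cls_embed(citing)             # (batch, hidden)
positives = cls_embed(cited)            # (batch, hidden)

# Each citing paper should be closest to the paper it actually cites,
# with other papers in the batch serving as negatives.
logits = anchors @ positives.T / 0.05   # temperature 0.05 is an assumption
labels = torch.arange(len(citing))
loss = F.cross_entropy(logits, labels)

loss.backward()
optimizer.step()
```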
## Citation
If you use this model in your research, please cite appropriately.