# scibert-citation-model
This model is a fine-tuned version of SciBERT specifically optimized for generating embeddings from scientific papers.
## Model Details
- Base Model: SciBERT (Scientific BERT)
- Fine-tuning Task: Scientific paper understanding and embedding generation
- Language: English (Scientific/Academic)
- Vocabulary: SciBERT's in-domain scientific vocabulary (scivocab)
## Usage
```python
from transformers import AutoTokenizer, AutoModel
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/scibert-citation-model")
model = AutoModel.from_pretrained("your-username/scibert-citation-model")

# Generate embeddings
text = "Your scientific text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state[:, 0, :]  # [CLS] token embedding
print(f"Embeddings shape: {embeddings.shape}")
```
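Since the model targets citation-style use of scientific papers, a common next step is comparing embeddings with cosine similarity. The sketch below reuses the `tokenizer` and `model` loaded above; the two example abstracts and the batched `embed` helper are illustrative, not part of the released model.

```python
import torch
import torch.nn.functional as F

def embed(texts):
    # Batch-tokenize and take the [CLS] embedding for each text
    inputs = tokenizer(texts, return_tensors="pt", padding=True,
                       truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :]

papers = [
    "Attention-based architectures for sequence transduction.",
    "A convolutional approach to protein structure prediction.",
]
emb = embed(papers)

# Cosine similarity between the two papers' [CLS] embeddings
similarity = F.cosine_similarity(emb[0], emb[1], dim=0)
print(f"Cosine similarity: {similarity.item():.4f}")
```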
## Performance
Fine-tuned SciBERT.
## Training Details
- Training Framework: PyTorch/Transformers
- Fine-tuning Objective: Scientific text understanding
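The exact fine-tuning recipe is not documented in this card. The sketch below shows one plausible setup, assuming an in-batch contrastive (InfoNCE-style) objective on citing/cited abstract pairs; the base checkpoint, dataset, loss, and hyperparameters here are assumptions for illustration, not the released training code.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Hypothetical setup: contrastive fine-tuning on (citing, cited) abstract pairs.
tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def cls_embed(texts):
    inputs = tokenizer(texts, return_tensors="pt", padding=True,
                       truncation=True, max_length=512)
    return model(**inputs).last_hidden_state[:, 0, :]

# Illustrative batch of citing/cited abstract pairs
citing = ["Abstract of a paper that cites another paper ..."]
cited = ["Abstract of the paper it cites ..."]

model.train()
anchors = cls_embed(citing)             # (batch, hidden)
positives = cls_embed(cited)            # (batch, hidden)

# Each citing paper should be closest to the paper it actually cites,
# with other papers in the batch serving as negatives.
logits = anchors @ positives.T / 0.05   # temperature 0.05 is an assumption
labels = torch.arange(len(citing))
loss = F.cross_entropy(logits, labels)

loss.backward()
optimizer.step()
```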
## Citation
If you use this model in your research, please cite appropriately.