---
language: en
license: mit
library_name: transformers
tags:
- bert
- scientific-text
- embeddings
- fine-tuned
pipeline_tag: feature-extraction
---
# scibert-citation-model
This model is a fine-tuned version of SciBERT specifically optimized for generating embeddings from scientific papers.
## Model Details
- **Base Model**: SciBERT (Scientific BERT)
- **Fine-tuning Task**: Scientific paper understanding and embedding generation
- **Language**: English (Scientific/Academic)
- **Vocabulary**: Scientific vocabulary
## Usage
```python
from transformers import AutoTokenizer, AutoModel
import torch
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/scibert-citation-model")
model = AutoModel.from_pretrained("your-username/scibert-citation-model")
# Generate embeddings
text = "Your scientific text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

embeddings = outputs.last_hidden_state[:, 0, :]  # [CLS] token embedding
print(f"Embeddings shape: {embeddings.shape}")
```
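The snippet above uses the `[CLS]` token as the sentence embedding. A common alternative is mean pooling over all non-padding tokens, and the resulting vectors can be compared with cosine similarity. Below is a minimal sketch of both steps; the random tensors are placeholders standing in for the tokenizer's `attention_mask` and the model's `last_hidden_state`, so the shapes match but the values are illustrative only.

```python
import torch
import torch.nn.functional as F

# Placeholders standing in for a tokenized batch and model output:
# last_hidden_state has shape (batch, seq_len, hidden_size) and
# attention_mask has shape (batch, seq_len), as produced by the tokenizer.
last_hidden_state = torch.randn(2, 16, 768)
attention_mask = torch.ones(2, 16)

# Mean pooling over real tokens (padding masked out), an alternative
# to taking the [CLS] embedding.
mask = attention_mask.unsqueeze(-1)             # (batch, seq_len, 1)
summed = (last_hidden_state * mask).sum(dim=1)  # (batch, hidden_size)
counts = mask.sum(dim=1).clamp(min=1e-9)        # (batch, 1)
embeddings = summed / counts                    # (batch, hidden_size)

# Cosine similarity between the two pooled embeddings.
similarity = F.cosine_similarity(embeddings[0:1], embeddings[1:2]).item()
print(f"Cosine similarity: {similarity:.4f}")
```

In practice, replace the placeholder tensors with `outputs.last_hidden_state` and `inputs["attention_mask"]` from the usage example above.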
## Performance
No formal benchmark results are reported for this fine-tuned SciBERT checkpoint.
## Training Details
- **Training Framework**: PyTorch/Transformers
- **Fine-tuning Objective**: Scientific text understanding
## Citation
If you use this model in your research, please cite this repository along with the original SciBERT paper (Beltagy et al., 2019).