---
language: en
license: mit
library_name: transformers
tags:
- bert
- scientific-text
- embeddings
- fine-tuned
pipeline_tag: feature-extraction
---

# scibert-citation-model

This model is a fine-tuned version of SciBERT specifically optimized for generating embeddings from scientific papers.

## Model Details

- **Base Model**: SciBERT (BERT pretrained on scientific text)
- **Fine-tuning Task**: Scientific paper understanding and embedding generation
- **Language**: English (scientific/academic text)
- **Vocabulary**: SciBERT's scientific vocabulary (scivocab)

## Usage

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/scibert-citation-model")
model = AutoModel.from_pretrained("your-username/scibert-citation-model")
model.eval()  # disable dropout for deterministic embeddings

# Generate embeddings
text = "Your scientific text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state[:, 0, :]  # [CLS] token embedding

print(f"Embeddings shape: {embeddings.shape}")
```
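A common use of these embeddings is comparing papers by cosine similarity. The sketch below shows the comparison step in isolation, using random tensors of SciBERT's hidden size (768) as stand-ins for the `[CLS]` embeddings produced by the snippet above; in practice you would pass the real `embeddings` tensors instead.

```python
import torch
import torch.nn.functional as F

def embedding_similarity(emb_a: torch.Tensor, emb_b: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between two batches of [CLS] embeddings."""
    return F.cosine_similarity(emb_a, emb_b, dim=-1)

# Stand-in embeddings (shape [1, 768], matching BERT-base hidden size).
torch.manual_seed(0)
paper_a = torch.randn(1, 768)
paper_b = torch.randn(1, 768)

sim = embedding_similarity(paper_a, paper_b)
print(f"Cosine similarity: {sim.item():.4f}")
```

Values close to 1.0 indicate semantically similar papers; values near 0 indicate unrelated content.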

## Performance

This checkpoint is a fine-tuned SciBERT; quantitative benchmark results have not been reported here.

## Training Details

- **Training Framework**: PyTorch/Transformers
- **Fine-tuning Objective**: Scientific text understanding
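The card does not specify the exact training objective, so the following is only an illustrative sketch of one plausible setup: a contrastive-style fine-tuning step that pulls embeddings of related texts together using `CosineEmbeddingLoss`. A small linear layer stands in for the SciBERT encoder so the sketch stays self-contained; the batch tensors and pairing scheme are assumptions, not the model's actual training recipe.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in encoder; in a real run this would be the SciBERT model,
# and the inputs would be pooled token representations.
encoder = nn.Linear(32, 16)
optimizer = torch.optim.AdamW(encoder.parameters(), lr=2e-5)
loss_fn = nn.CosineEmbeddingLoss()

# Dummy anchor/positive feature batches (8 pairs of 32-dim features).
anchor = torch.randn(8, 32)
positive = torch.randn(8, 32)
target = torch.ones(8)  # 1 = each pair should be embedded similarly

# One optimization step: encode both sides, minimize 1 - cos(a, b).
optimizer.zero_grad()
loss = loss_fn(encoder(anchor), encoder(positive), target)
loss.backward()
optimizer.step()
print(f"step loss: {loss.item():.4f}")
```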

## Citation

If you use this model in your research, please cite appropriately.