---
language: en
license: apache-2.0
library_name: transformers
tags:
- scibert
- concept-annotation
- nlp
- sequence-classification
metrics:
- accuracy
- accuracy
pipeline_tag: text-classification
---

# SciBERT Concept Annotation

This model is a fine-tuned version of SciBERT for **Concept Annotation**. Given a document text and a specific concept/term as a sentence pair, it performs binary sequence classification to decide how the concept relates to the text.

## Model Description
- **Model type:** SciBERT (BERT-based)
- **Language(s):** English
- **License:** Apache 2.0
- **Fine-tuned from model:** `allenai/scibert_scivocab_uncased`

## Usage

You can use this model directly with a short inference script. Note that while the model weights are hosted in this repository, the tokenizer should be loaded from the base checkpoint, `allenai/scibert_scivocab_uncased`.

### Example Code

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Model weights come from this repository; the tokenizer comes from the base SciBERT checkpoint
model_id = "linh101201/scibert-concept-annotation"
tokenizer_id = "allenai/scibert_scivocab_uncased"

# Fall back to CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2).to(device)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)

# Example inputs: the document text and the concept to annotate, encoded as a sentence pair
text = "Large Language Model in Law Documents Hub"
concept = "natural language processing"

inputs = tokenizer(text, concept, return_tensors="pt", truncation=True).to(device)

with torch.no_grad():
    logits = model(**inputs).logits
    # Softmax turns the raw logits into class probabilities
    probs = torch.nn.functional.softmax(logits, dim=-1)
    predicted_class = probs.argmax(dim=-1).item()

print(f"Logits: {logits}")
print(f"Probabilities: {probs}")
print(f"Predicted class: {predicted_class}")
```
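The card does not document the label semantics; a common convention for binary relevance models is class 1 = "concept applies" and class 0 = "does not apply", but treat that mapping as an assumption and verify it against examples with known labels. To score several candidate concepts against one document, the same pair encoding can be batched. A minimal sketch, reusing `model`, `tokenizer`, `text`, and `device` from the example above, and the assumed class-1 convention:

```python
# Hypothetical candidate concepts to rank against the document text
concepts = [
    "natural language processing",
    "computer vision",
    "legal informatics",
]

# Tokenize all (text, concept) pairs as one padded batch
batch = tokenizer(
    [text] * len(concepts),
    concepts,
    return_tensors="pt",
    padding=True,
    truncation=True,
).to(device)

with torch.no_grad():
    batch_probs = torch.nn.functional.softmax(model(**batch).logits, dim=-1)

# Report the probability of class 1 (assumed: "concept applies") per concept
for concept_name, p in zip(concepts, batch_probs[:, 1].tolist()):
    print(f"{concept_name}: {p:.3f}")
```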