---
language: en
license: apache-2.0
library_name: transformers
tags:
- scibert
- concept-annotation
- nlp
- sequence-classification
metrics:
- accuracy
pipeline_tag: text-classification
---
|
|
|
|
|
# SciBERT Concept Annotation |
|
|
|
|
|
This model is a fine-tuned version of SciBERT for **Concept Annotation**. It takes a document text and a specific concept/term as a sentence pair and classifies the relationship between them (sequence-pair classification).
|
|
|
|
|
## Model Description |
|
|
- **Model type:** SciBERT (BERT-based) |
|
|
- **Language(s):** English |
|
|
- **License:** Apache 2.0 |
|
|
- **Fine-tuned from model:** `allenai/scibert_scivocab_uncased` |
|
|
|
|
|
## Usage |
|
|
|
|
|
You can use this model directly with a short custom inference script. Note that while the model weights are hosted in this repository, the tokenizer should be loaded from `allenai/scibert_scivocab_uncased`.
|
|
|
|
|
### Example Code |
|
|
|
|
|
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and the SciBERT tokenizer
model_id = "linh101201/scibert-concept-annotation"
tokenizer_id = "allenai/scibert_scivocab_uncased"

device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2).to(device)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)

# Example inputs: the document text and the concept to annotate
text = "Large Language Model in Law Documents Hub"
concept = "natural language processing"

# Encode the (text, concept) pair as a single sequence-pair input
inputs = tokenizer(text, concept, return_tensors="pt", truncation=True).to(device)

with torch.no_grad():
    logits = model(**inputs).logits

# Apply softmax to convert logits into class probabilities
probs = torch.nn.functional.softmax(logits, dim=-1)

print(f"Logits: {logits}")
print(f"Probabilities: {probs}")
```