|
|
--- |
|
|
library_name: transformers |
|
|
license: mit
tags:
- summarization
- legal
|
|
--- |
|
|
|
|
|
# Model Card for bart-multi-lexsum
|
|
|
|
|
Source code: [Google Colab](https://colab.research.google.com/drive/1qnocYiNrF3udkxx1YRwyxTSaeN7F35DK) |
|
|
|
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
This model performs abstractive summarization of legal and contractual documents. It was fine-tuned from BART-large-CNN.
|
|
|
|
|
- **Developed by:** [Siddhesh Kulthe](https://huggingface.co/siddheshtv) |
|
|
- **License:** MIT |
|
|
- **Finetuned from model:** [Facebook/BART-LARGE-CNN](https://huggingface.co/facebook/bart-large-cnn) |
|
|
|
|
|
## Uses |
|
|
|
|
|
- Abstractive summarization for legal docs (Banking, Legal, Contractual, etc.) |
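
BART-large-CNN's encoder truncates inputs at 1024 tokens, so long filings and contracts may need to be split before summarization. A minimal word-based chunker with overlap — a hypothetical helper sketch, not part of this model's code — could look like:

```python
def chunk_words(text, chunk_size=700, overlap=50):
    """Split text into overlapping word windows so each chunk fits the encoder."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary visible in both chunks; 700 words is a rough heuristic for staying under 1024 subword tokens, not a guarantee.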
|
|
|
|
|
## Sample Usage |
|
|
|
|
|
Load the fine-tuned model and tokenizer:
|
|
```python |
|
|
from transformers import BartForConditionalGeneration, BartTokenizer |
|
|
import torch |
|
|
|
|
|
|
|
|
model_name = "siddheshtv/bart-multi-lexsum" |
|
|
|
|
|
# Download the fine-tuned checkpoint and its tokenizer from the Hub
model = BartForConditionalGeneration.from_pretrained(model_name)
tokenizer = BartTokenizer.from_pretrained(model_name)
|
|
|
|
|
# Run on GPU when available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
|
|
``` |
|
|
|
|
|
Generate summary function:
|
|
```python |
|
|
def generate_summary(model, tokenizer, text, max_length=512): |
|
|
device = next(model.parameters()).device |
|
|
inputs = tokenizer.encode("summarize: " + text, return_tensors="pt", max_length=1024, truncation=True) |
|
|
inputs = inputs.to(device) |
|
|
summary_ids = model.generate( |
|
|
inputs, |
|
|
max_length=max_length, |
|
|
min_length=40, |
|
|
length_penalty=2.0, |
|
|
num_beams=4, |
|
|
early_stopping=True, |
|
|
no_repeat_ngram_size=3, |
|
|
forced_bos_token_id=0, |
|
|
forced_eos_token_id=2 |
|
|
) |
|
|
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True) |
|
|
return summary |
|
|
``` |
|
|
|
|
|
Generate a summary:
|
|
```python |
|
|
example_text = "..."  # replace with the legal/contractual document to summarize
generated_summary = generate_summary(model, tokenizer, example_text)
|
|
print("Generated Summary:") |
|
|
print(generated_summary) |
|
|
``` |
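
For documents longer than the encoder limit, one common pattern (an assumption here, not something this model card prescribes) is to summarize overlapping chunks and join the partial summaries. The sketch below takes any `summarize_fn` callable — for example `lambda t: generate_summary(model, tokenizer, t)` with the function defined above — so the chunking logic stays independent of the model:

```python
def summarize_long_document(text, summarize_fn, chunk_size=700, overlap=50):
    """Summarize an over-length document chunk by chunk and join the results.

    summarize_fn: any callable mapping a text chunk to its summary string.
    """
    words = text.split()
    step = chunk_size - overlap
    partial_summaries = []
    start = 0
    while start < len(words):
        chunk = " ".join(words[start:start + chunk_size])
        partial_summaries.append(summarize_fn(chunk))
        start += step
    return " ".join(partial_summaries)
```

Concatenating partial summaries is a simple baseline; for a more coherent result, the joined text can be passed through the summarizer once more.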
|
|
|
|
|
## Training Data |
|
|
|
|
|
- **Dataset URL:** [Multi-Lexsum](https://multilexsum.github.io/) |