---
language:
- vi
license: mit
tags:
- summarization
- vietnamese
- bartpho
- seq2seq
datasets:
- news-dataset-vietnameses
metrics:
- rouge
model-index:
- name: bartpho-vietnamese-summarization
  results:
  - task:
      type: summarization
    dataset:
      name: Vietnamese News Dataset
      type: news-dataset-vietnameses
    metrics:
    - type: rouge
      value: TBD
---

# BARTpho Vietnamese Summarization Model

This model is a fine-tuned version of [vinai/bartpho-syllable](https://huggingface.co/vinai/bartpho-syllable) for Vietnamese text summarization.

## Model Details

- **Base Model**: vinai/bartpho-syllable
- **Task**: Text Summarization
- **Language**: Vietnamese
- **Training Dataset**: Vietnamese News Dataset

## Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "YOUR_USERNAME/bartpho-vietnamese-summarization"

# Use AutoTokenizer for BARTpho (automatically loads BartphoTokenizer)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# AutoModelForSeq2SeqLM picks the architecture recorded in the checkpoint config
# (the BARTpho base checkpoints are built on the mBART classes)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Example usage
text = "Your Vietnamese news article text here..."

# Inputs longer than 1024 tokens are truncated
inputs = tokenizer(text, return_tensors="pt", max_length=1024, truncation=True)

# Beam search with 4 beams; the summary is capped at 128 tokens
summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_length=128,
    num_beams=4,
    early_stopping=True,
)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```
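
To summarize several articles at once, the same calls can be wrapped in a small helper. This is a minimal sketch rather than part of the released code: `summarize_batch` and its parameter names are hypothetical, and it assumes the tokenizer and model expose the standard Transformers `__call__`, `generate`, and `batch_decode` methods used in the example above.

```python
def summarize_batch(texts, tokenizer, model,
                    max_input_length=1024, max_summary_length=128):
    """Summarize a list of articles in one padded batch (hypothetical helper)."""
    inputs = tokenizer(
        texts,
        return_tensors="pt",
        padding=True,        # pad shorter articles to the longest one in the batch
        truncation=True,     # cut articles longer than max_input_length tokens
        max_length=max_input_length,
    )
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],  # mask out the padding positions
        max_length=max_summary_length,
        num_beams=4,
        early_stopping=True,
    )
    # One decoded summary per input article, in the same order
    return tokenizer.batch_decode(summary_ids, skip_special_tokens=True)
```

With `tokenizer` and `model` loaded as in the usage example, `summarize_batch([article_1, article_2], tokenizer, model)` returns a list of two summary strings. Passing the attention mask matters here because padded batches would otherwise let the model attend to padding tokens.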

## Training Details

- **Training Framework**: Hugging Face Transformers
- **GPU**: NVIDIA P100 16GB
- **Batch Size**: 8 per device
- **Gradient Accumulation**: 2 steps
- **Learning Rate**: 2e-5
- **Epochs**: 3
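
The hyperparameters listed above can be collected into a plain dictionary, which also makes the effective batch size explicit. This is a sketch, not the original training script: the key names follow the parameter names of `Seq2SeqTrainingArguments` in Transformers, and `num_gpus = 1` is an assumption, since the card names the GPU model but not how many were used.

```python
# Hyperparameters from the list above; keys match Seq2SeqTrainingArguments parameters.
hparams = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
}

num_gpus = 1  # assumption: a single P100; the card does not state the GPU count

# Gradients are accumulated over 2 steps before each optimizer update,
# so each update sees per-device batch * accumulation steps * number of GPUs examples.
effective_batch_size = (
    hparams["per_device_train_batch_size"]
    * hparams["gradient_accumulation_steps"]
    * num_gpus
)
print(effective_batch_size)  # 16
```

Under these assumptions, the dictionary could be unpacked into `Seq2SeqTrainingArguments(output_dir=..., **hparams)`, where the output directory is whatever path suits the training environment.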