---
datasets:
- d0p3/ukr-pravda-news-summary
- d0p3/ukr-pravda-news-summary-v1.1
- shamotskyi/ukr_pravda_2y
language:
- uk
- en
pipeline_tag: summarization
license: cc-by-nc-4.0
---
# O3ap-sm: Ukrainian News Summarizer
This repository contains the O3ap-sm model, a Ukrainian news summarization model fine-tuned from the T5-small architecture. The model was trained on the Ukrainian CCMatrix corpus and fine-tuned for text summarization.
## Model Overview
* **Base Model:** T5-small
* **Training Dataset:** Ukrainian Corpus CCMatrix
* **Fine-tuning Task:** News article summarization
* **Fine-tuning Dataset:**
* [shamotskyi/ukr_pravda_2y](https://huggingface.co/datasets/shamotskyi/ukr_pravda_2y)
* [d0p3/ukr-pravda-news-summary](https://huggingface.co/datasets/d0p3/ukr-pravda-news-summary)
  * [d0p3/ukr-pravda-news-summary-v1.1](https://huggingface.co/datasets/d0p3/ukr-pravda-news-summary-v1.1)
* **Language:** Ukrainian, English
## Usage
**Installation**
```bash
pip install transformers
```
**Loading the Model**
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("d0p3/O3ap-sm")
model = AutoModelForSeq2SeqLM.from_pretrained("d0p3/O3ap-sm")
```
**Generating Summaries**
```python
news_article = "**YOUR NEWS ARTICLE TEXT IN UKRAINIAN**"

# Tokenize the article, truncating to the model's maximum input length.
input_ids = tokenizer(news_article, return_tensors="pt", truncation=True).input_ids

# Generate the summary (tune max_length / num_beams for your use case).
output_ids = model.generate(input_ids, max_length=128, num_beams=4)
summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(summary)
```
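T5-small encoders are typically limited to 512 input tokens, so long news articles are silently truncated in the snippet above. One workaround is to split an article into sentence-aligned chunks, summarize each chunk, and join the results. The sketch below shows only the chunking step; the `max_chars` budget and sentence-splitting regex are illustrative assumptions, not part of this model card:

```python
import re

def chunk_text(text, max_chars=1500):
    """Split text into chunks of at most ~max_chars, breaking at sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed through the tokenizer/`generate` loop above and the per-chunk summaries concatenated.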
## Limitations
* The model may not perform optimally on informal or highly colloquial Ukrainian text.
* As with any language model, there's a possibility of generating factually incorrect summaries or summaries that reflect biases present in the training data.
## Ethical Considerations
* **Transparency:** The model is intended for summarizing Ukrainian news articles; its limitations are stated above.
* **Bias:** Summaries may reflect biases introduced by the training data selection or the fine-tuning process; apply mitigation strategies where possible.
* **Misuse:** The model could be misused to generate misleading summaries; treat its outputs with caution and critical evaluation.
## Contributing
We welcome contributions and feedback!
## License
This model is released under the [CC-BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.