| | --- |
| | datasets: |
| | - d0p3/ukr-pravda-news-summary |
| | - d0p3/ukr-pravda-news-summary-v1.1 |
| | - shamotskyi/ukr_pravda_2y |
| | language: |
| | - uk |
| | - en |
| | pipeline_tag: summarization |
| | license: cc-by-nc-4.0 |
| | --- |
| | |
| | # O3ap-sm: Ukrainian News Summarizer |
| |
|
| | This repository contains the O3ap-sm model, a Ukrainian news summarization model fine-tuned from T5-small. It was trained on the Ukrainian Corpus CCMatrix and fine-tuned for news article summarization on the datasets listed below.
| |
|
| | ## Model Overview |
| |
|
| | * **Base Model:** T5-small |
| | * **Training Dataset:** Ukrainian Corpus CCMatrix |
| | * **Fine-tuning Task:** News article summarization |
| | * **Fine-tuning Dataset:** |
| |   * [shamotskyi/ukr_pravda_2y](https://huggingface.co/datasets/shamotskyi/ukr_pravda_2y)
| |   * [d0p3/ukr-pravda-news-summary](https://huggingface.co/datasets/d0p3/ukr-pravda-news-summary)
| |   * [d0p3/ukr-pravda-news-summary-v1.1](https://huggingface.co/datasets/d0p3/ukr-pravda-news-summary-v1.1)
| | * **Language:** Ukrainian, English |
| |
|
| | ## Usage |
| |
|
| | **Installation** |
| |
|
| | ```bash |
| | pip install transformers sentencepiece torch
| | ``` |
| |
|
| | **Loading the Model** |
| |
|
| | ```python |
| | from transformers import AutoTokenizer, AutoModelForSeq2SeqLM |
| | |
| | tokenizer = AutoTokenizer.from_pretrained("d0p3/O3ap-sm") |
| | model = AutoModelForSeq2SeqLM.from_pretrained("d0p3/O3ap-sm") |
| | ``` |
| |
|
| | **Generating Summaries** |
| |
|
| | ```python |
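| | news_article = "**YOUR NEWS ARTICLE TEXT IN UKRAINIAN**"
| | 
| | # Tokenize the article, truncating inputs longer than 512 tokens (an assumed limit; adjust as needed)
| | inputs = tokenizer(news_article, return_tensors="pt", truncation=True, max_length=512)
| | # Generate the summary; max_new_tokens=128 is an illustrative cap, not a tuned value
| | output_ids = model.generate(**inputs, max_new_tokens=128)
| | 
| | # Decode the generated token IDs back into text
| | summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
| | 
| | print(summary)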
| | ``` |
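
**Summarizing Several Articles (sketch)**

When several articles need summarizing, they can be tokenized and passed to `generate` together. The helper below is a minimal sketch, not part of this repository: the name `summarize_batch` and the default values for `max_input_tokens`, `max_new_tokens`, and `num_beams` are illustrative assumptions and have not been tuned for this model.

```python
def summarize_batch(articles, tokenizer, model, max_input_tokens=512,
                    max_new_tokens=128, num_beams=4):
    """Summarize a list of article strings with a single generate() call.

    Hypothetical helper: the default values are illustrative, not tuned.
    """
    if not articles:
        return []
    # Pad to the longest article in the batch and truncate overly long inputs.
    inputs = tokenizer(
        articles,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_input_tokens,
    )
    # Beam search with an explicit cap on the generated summary length.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=num_beams,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


def main():
    # Imported here so the helper above can be reused without loading the model.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("d0p3/O3ap-sm")
    model = AutoModelForSeq2SeqLM.from_pretrained("d0p3/O3ap-sm")
    articles = [
        "**FIRST NEWS ARTICLE TEXT IN UKRAINIAN**",
        "**SECOND NEWS ARTICLE TEXT IN UKRAINIAN**",
    ]
    for summary in summarize_batch(articles, tokenizer, model):
        print(summary)
```

Calling `main()` downloads the model and prints one summary per article. For large collections, splitting the list into smaller batches keeps memory use predictable.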
| |
|
| | ## Limitations |
| |
|
| | * The model may not perform optimally on informal or highly colloquial Ukrainian text. |
| | * As with any language model, there's a possibility of generating factually incorrect summaries or summaries that reflect biases present in the training data. |
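
Because summaries can contain factual errors, a cheap sanity check can help decide which outputs to review by hand. The sketch below flags numbers that appear in a summary but not in the source article. It is a crude heuristic with a hypothetical name (`unsupported_numbers`), not a factuality metric and not part of this model.

```python
import re

# Matches integers and simple decimals such as "15", "3,5" or "2023.01".
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def unsupported_numbers(article, summary):
    """Return the numbers found in `summary` that never occur in `article`."""
    article_numbers = set(_NUMBER.findall(article))
    summary_numbers = set(_NUMBER.findall(summary))
    return summary_numbers - article_numbers


article = "У 2023 році постраждали 15 людей."
summary = "У 2023 році постраждали 17 людей."
# A non-empty result suggests the summary should be checked manually.
print(unsupported_numbers(article, summary))
```

An empty result does not mean a summary is correct; the check only catches one narrow kind of error.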
| |
|
| | ## Ethical Considerations |
| |
|
| | * **Transparency:** The model is intended for summarizing news articles and is subject to the limitations listed above.
| | * **Bias:** Training data selection and the fine-tuning process may have introduced biases. Be aware of them and apply mitigation strategies where possible.
| | * **Misuse:** The model could be misused, for example to generate misleading summaries. Treat its outputs with caution and evaluate them critically.
| |
|
| | ## Contributing |
| |
|
| | We welcome contributions and feedback! |
| |
|
| | ## License |
| |
|
| | This model is released under the [CC-BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.