Update README.md
README.md
CHANGED
@@ -17,15 +17,17 @@ widget:
 [dbmdz/bert-base-turkish-cased](https://huggingface.co/dbmdz/bert-base-turkish-cased) (BERT Checkpoint)

 ## Dataset
+**MLSUM** is the first large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five different languages -- namely, French, German, Spanish, Russian, **Turkish**. Together with English newspapers from the popular CNN/Daily mail dataset, the collected data form a large scale multilingual dataset which can enable new research directions for the text summarization community. We report cross-lingual comparative analyses based on state-of-the-art systems. These highlight existing biases which motivate the use of a multi-lingual dataset.
+
 [MLSUM tu/tr](https://huggingface.co/datasets/viewer/?dataset=mlsum)

 ## Results

 |Set|Metric| Value|
 |----|------|------|
-| Test |Rouge2 - mid -precision | 32.41
-| Test | Rouge2 - mid - recall | 28.65
-| Test | Rouge2 - mid - fmeasure | 29.48
+| Test |Rouge2 - mid -precision | **32.41**|
+| Test | Rouge2 - mid - recall | **28.65**|
+| Test | Rouge2 - mid - fmeasure | **29.48**|

 ## Usage

@@ -45,7 +47,10 @@ def generate_summary(text):
     output = model.generate(input_ids, attention_mask=attention_mask)
     return tokenizer.decode(output[0], skip_special_tokens=True)

-
 text = "Your text here..."
 generate_summary(text)
+```
+
+> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) with the support of [Narrativa](https://www.narrativa.com/)
+> Made with <span style="color: #e25555;">♥</span> in Spain

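Note on the Results rows in this diff: they report ROUGE-2 precision, recall, and F-measure. As a rough illustration of what those three numbers measure, here is a minimal sketch that scores a single reference/candidate pair by bigram overlap. It assumes lowercased whitespace tokenization and returns fractions in the range 0 to 1, whereas the table appears to use a 0 to 100 scale. The `rouge2` and `bigrams` helpers are hypothetical names, not the evaluation code behind the README. The "mid" label presumably refers to the middle value of an aggregated score over the test set, which this sketch does not attempt to reproduce.

```python
from collections import Counter


def bigrams(tokens):
    """Return a multiset (Counter) of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2(reference, candidate):
    """Compute ROUGE-2 precision, recall, and F-measure for one summary pair.

    Assumption: lowercased whitespace tokenization, a simplification of
    what a real ROUGE implementation does.
    """
    ref = bigrams(reference.lower().split())
    cand = bigrams(candidate.lower().split())
    # Clipped overlap: a bigram counts only as often as it occurs in both texts.
    overlap = sum((ref & cand).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        fmeasure = 0.0
    else:
        fmeasure = 2 * precision * recall / (precision + recall)
    return precision, recall, fmeasure


# Illustrative inputs only; not taken from MLSUM.
reference = "the cat sat on the mat"
candidate = "the cat sat on a mat"
precision, recall, fmeasure = rouge2(reference, candidate)
print(f"precision={precision:.2f} recall={recall:.2f} fmeasure={fmeasure:.2f}")
```

Precision divides the overlap by the number of candidate bigrams and recall divides it by the number of reference bigrams, so a summary shorter than its reference tends to score higher on precision than on recall.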