Update README.md
README.md
sdk: static
pinned: false
---

**Dataset Source**:
- Original Source: The English sentences were taken from Project Gutenberg (https://www.gutenberg.org/).
- Translation Tool: The sentences were translated from English to Yoruba with Google Translate.

**Dataset Format**:
- english: The original English sentence.
- yoruba: The Yoruba translation of the sentence.
- source: The URL of the Project Gutenberg text from which the English sentence was taken.
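
The card does not say how the files are stored. Assuming, hypothetically, one JSON object per line (JSON Lines) with the three fields above, a record can be modelled and validated as follows. `TranslationPair` and `parse_jsonl` are illustrative names, not part of any published loader.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = {"english", "yoruba", "source"}


@dataclass(frozen=True)
class TranslationPair:
    english: str  # original English sentence
    yoruba: str   # Google Translate output in Yoruba
    source: str   # URL of the Project Gutenberg text


def parse_jsonl(text: str) -> list[TranslationPair]:
    """Parse JSON Lines text, raising ValueError on rows missing a field."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        missing = REQUIRED_FIELDS - set(row)
        if missing:
            raise ValueError(f"line {lineno}: missing fields {sorted(missing)}")
        records.append(TranslationPair(row["english"], row["yoruba"], row["source"]))
    return records
```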

## Example

| english | yoruba | source |
|---------|--------|--------|
| The subconscious offensiveness of their attitude has constituted old Jolyon's 'home' the psychological moment of the family history, made it the prelude of their drama. | Iwa ibinu èrońgbà ti iṣesi wọn ti jẹ “ile” atijọ ti Jolyon ni akoko imọ-jinlẹ ti itan-akọọlẹ ẹbi, jẹ ki o jẹ iṣaaju ti eré wọn. | https://www.gutenberg.org/ebooks/2559.txt.utf-8 |
| The Forsytes were resentful of something, not individually, but as a family; this resentment expressed itself in an added perfection of raiment, an exuberance of family cordiality, an exaggeration of family importance, and--the sniff. | Awọn Forsytes binu si nkan kan, kii ṣe olukuluku, ṣugbọn gẹgẹbi idile; ibinu yii ṣe afihan ararẹ ni pipe ti aṣọ ti a fi kun, igbadun ti ifarabalẹ idile, iṣaju ti pataki idile, ati --ifun. | https://www.gutenberg.org/ebooks/2559.txt.utf-8 |
| Danger--so indispensable in bringing out the fundamental quality of any society, group, or individual--was what the Forsytes scented; the premonition of danger put a burnish on their armour. | Ewu - nitorinaa ko ṣe pataki lati mu didara ipilẹ ti awujọ, ẹgbẹ, tabi ẹni kọọkan jade - jẹ ohun ti awọn Forsytes rùn; premonition ti ewu fi kan iná lori wọn ihamọra. | https://www.gutenberg.org/ebooks/2559.txt.utf-8 |

**Dataset Size**:
- Number of Entries:
- File Size:
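
The two size fields above are still blank. Under the same hypothetical JSON Lines layout (one record per non-blank line), both values could be computed with a short helper like this; `dataset_size` is an illustrative name and the path you pass it is a placeholder for the dataset's actual file.

```python
import os


def dataset_size(path: str) -> tuple[int, int]:
    """Return (number of non-blank lines, file size in bytes) for a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        entries = sum(1 for line in f if line.strip())
    return entries, os.path.getsize(path)
```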

**Usage**:
This dataset can be used for:
- Training machine translation models for Yoruba.
- Analyzing translation quality and limitations in automated tools.
- Supporting linguistic research and NLP projects for low-resource languages.
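
When training a translation model on this data, sentences from the same book are likely to share vocabulary and style, so a random sentence-level split can leak that style into the validation set. One option, sketched here as an assumption rather than a prescribed split, is to assign splits by hashing the source URL so that every sentence from a given Gutenberg text lands in the same split. `split_for` is a hypothetical helper.

```python
import hashlib


def split_for(source: str, valid_fraction: float = 0.1) -> str:
    """Deterministically map a source URL to "train" or "validation"."""
    digest = hashlib.sha256(source.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    threshold = int(valid_fraction * 2**64)     # integer compare avoids float rounding
    return "validation" if bucket < threshold else "train"
```

With only a handful of distinct source URLs, a per-source split can be very uneven, so check the resulting split sizes.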

**Limitations and Considerations**:
- Quality of Translations: Because the translations were generated with Google Translate, some of them may be inaccurate. Manual validation is recommended for critical applications.
- Cultural and Contextual Nuances: Machine translation may miss idiomatic expressions or cultural nuances present in the English source text.
- Biases: Any biases inherent in Google Translate's model may propagate into this dataset.
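
Some translation errors are easy to detect automatically. In the third example row above, for instance, the English word "premonition" is left untranslated in the Yoruba column. The heuristic below is a hypothetical helper, not a validation step used for this dataset: it flags lowercase English words of four or more letters that reappear verbatim in the Yoruba text. Capitalised words are skipped because names such as "Forsytes" are expected to carry over.

```python
import re

# Runs of letters, including accented Yoruba letters such as ẹ, ọ and ṣ.
WORD = re.compile(r"[^\W\d_]+")


def copied_words(english: str, yoruba: str, min_len: int = 4) -> set[str]:
    """Return lowercase English words that reappear unchanged in the Yoruba text."""
    english_words = {
        w for w in WORD.findall(english) if w.islower() and len(w) >= min_len
    }
    yoruba_words = {w.lower() for w in WORD.findall(yoruba)}
    return english_words & yoruba_words
```

Flagged rows are candidates for manual review, not confirmed errors; genuine English loanwords in Yoruba would also be flagged.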

**Licensing**:
- Source Material License: [Specify the license of the original English sentences, if applicable.]
- Translated Dataset License: [Specify the license for your dataset, e.g., Creative Commons Attribution 4.0 (CC BY 4.0).]

## tags:

- machine-translation
- speech-to-text
- yoruba-language
- african-languages

## task_categories:

- text-classification
- machine-translation

---

# Dataset Card for [Dataset Name]

## Dataset Summary

[Brief description of the dataset, including its purpose and key features. For example:

"This dataset contains bilingual pairs of Yoruba and English sentences for tasks such as machine translation, text classification, and language modeling. The dataset is designed to address the lack of resources for African languages in NLP."]

## Supported Tasks and Applications

### Tasks

[List supported tasks and examples, e.g.:]

- **Machine Translation:** Translating Yoruba to English and vice versa.
- **Language Modeling:** Building and evaluating language models for Yoruba.
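
For the machine translation task, each record can feed both directions. A minimal sketch, assuming records are dicts with the english and yoruba fields described in the dataset format; `bidirectional_pairs` and the "en-yo"/"yo-en" direction labels are arbitrary choices for illustration.

```python
from typing import Iterable, Iterator


def bidirectional_pairs(records: Iterable[dict]) -> Iterator[tuple[str, str, str]]:
    """Yield (direction, input_text, target_text) for English->Yoruba and Yoruba->English."""
    for rec in records:
        yield ("en-yo", rec["english"], rec["yoruba"])
        yield ("yo-en", rec["yoruba"], rec["english"])
```

Because the Yoruba side is machine-generated, the yo-en direction trains on Google Translate output as input rather than on naturally written Yoruba.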

### Applications

[List practical applications of the dataset, e.g., translation tools, chatbots, etc.]

## Languages

This dataset includes data in:

- **Yoruba**: A tonal language spoken by over 45 million people in Nigeria and West Africa. The Yoruba sentences are machine translations produced with Google Translate.
- **English**: The original source sentences, taken from Project Gutenberg texts.
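
Because written Yoruba marks tone and vowel quality with diacritics, the same word can be encoded either as precomposed characters or as a base letter plus combining marks. A sketch using Python's standard `unicodedata` module, assuming NFC as the canonical form: `normalize_yoruba` makes the two encodings compare equal, and `strip_tones` (a hypothetical helper for diacritic-insensitive matching) removes the grave and acute tone marks, plus the occasionally used macron, while keeping the underdot that distinguishes ẹ, ọ and ṣ.

```python
import unicodedata

# Combining tone marks: grave (low tone), acute (high tone), macron (mid tone, rare).
TONE_MARKS = {"\u0300", "\u0301", "\u0304"}


def normalize_yoruba(text: str) -> str:
    """Compose characters to NFC so precomposed and decomposed forms compare equal."""
    return unicodedata.normalize("NFC", text)


def strip_tones(text: str) -> str:
    """Remove tone marks but keep the underdot (U+0323) on ẹ, ọ and ṣ."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)
```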

## Dataset Structure

### Data Instances

Each instance pairs an English sentence with its Yoruba translation. Instances also carry a source field, which the simplified example below omits:

```json
{
  "yoruba": "Ẹ kaaro.",
  "english": "Good morning."
}