IbukunSanni committed on
Commit c827ced · verified · 1 Parent(s): 8c1ab64

Update README.md

Files changed (1)
  1. README.md +123 -2
README.md CHANGED
@@ -6,6 +6,127 @@ colorTo: green
  sdk: static
  pinned: false
  ---
- Testing Readme change
 
- Edit this `README.md` markdown file to author your organization card.
+ **Dataset Source**:
+ - Original Source: The English sentences were sourced from Project Gutenberg (https://www.gutenberg.org/).
+ - Translation Tool: Google Translate was used to translate the sentences from English to Yoruba.
 
+ **Dataset Format**:
+ - english: The original English sentence.
+ - yoruba: The Yoruba translation of the sentence.
+ - source: The source of the English sentence.
+ ## Example:
+ | en | yo | source |
+ |----|----|--------|
+ | The subconscious offensiveness of their attitude has constituted old Jolyon's 'home' the psychological moment of the family history, made it the prelude of their drama. | Iwa ibinu èrońgbà ti iṣesi wọn ti jẹ “ile” atijọ ti Jolyon ni akoko imọ-jinlẹ ti itan-akọọlẹ ẹbi, jẹ ki o jẹ iṣaaju ti eré wọn. | https://www.gutenberg.org/ebooks/2559.txt.utf-8 |
+ | The Forsytes were resentful of something, not individually, but as a family; this resentment expressed itself in an added perfection of raiment, an exuberance of family cordiality, an exaggeration of family importance, and--the sniff. | Awọn Forsytes binu si nkan kan, kii ṣe olukuluku, ṣugbọn gẹgẹbi idile; ibinu yii ṣe afihan ararẹ ni pipe ti aṣọ ti a fi kun, igbadun ti ifarabalẹ idile, iṣaju ti pataki idile, ati --ifun. | https://www.gutenberg.org/ebooks/2559.txt.utf-8 |
+ | Danger--so indispensable in bringing out the fundamental quality of any society, group, or individual--was what the Forsytes scented; the premonition of danger put a burnish on their armour. | Ewu - nitorinaa ko ṣe pataki lati mu didara ipilẹ ti awujọ, ẹgbẹ, tabi ẹni kọọkan jade - jẹ ohun ti awọn Forsytes rùn; premonition ti ewu fi kan iná lori wọn ihamọra. | https://www.gutenberg.org/ebooks/2559.txt.utf-8 |
+
+ **Dataset Size**:
+ - Number of Entries:
+ - File Size:
+ **Usage**:
+ This dataset can be used for:
+ - Training machine translation models for Yoruba.
+ - Analyzing the translation quality and limitations of automated tools.
+ - Supporting linguistic research and NLP projects for low-resource languages.
+
+ **Limitations and Considerations**:
+ - Quality of Translations: Because the translations were generated with Google Translate, some sentences may be inaccurate. Manual validation is recommended for critical applications.
+ - Cultural and Contextual Nuances: Machine translation may miss idiomatic expressions or cultural nuances present in the source language.
+ - Biases: Any biases inherent in Google Translate's model may propagate into this dataset.
+
+ **Licensing**:
+ - Source Material License: [Specify the license of the original English sentences, if applicable.]
+ - Translated Dataset License: [Specify the license for your dataset, e.g., Creative Commons Attribution 4.0 (CC BY 4.0).]
+
+ ## tags:
+
+ - machine-translation
+
+ - speech-to-text
+
+ - yoruba-language
+
+ - african-languages
+
+ ## task_categories:
+
+ - text-classification
+
+ - machine-translation
+
+ ---
+
+
+
+ # Dataset Card for [Dataset Name]
+
+
+
+ ## Dataset Summary
+
+
+
+ [Brief description of the dataset, including its purpose and key features. For example:
+
+ "This dataset contains bilingual pairs of Yoruba and English sentences for tasks such as machine translation, text classification, and language modeling. The dataset is designed to address the lack of resources for African languages in NLP."]
+
+
+
+ ## Supported Tasks and Applications
+
+
+
+ ### Tasks
+
+
+
+ [List supported tasks and examples, e.g.:]
+
+ - **Machine Translation:** Translating Yoruba to English and vice versa.
+
+ - **Language Modeling:** Building and evaluating language models for Yoruba.
+
+
+
+ ### Applications
+
+
+
+ [List practical applications of the dataset, e.g., translation tools, chatbots, etc.]
+
+
+
+ ## Languages
+
+
+
+ This dataset includes data in:
+
+ - **Yoruba**: A tonal language spoken by over 45 million people in Nigeria and elsewhere in West Africa.
+
+ - **English**: The original English source sentences from which the Yoruba translations were produced.
+
+
+
+ ## Dataset Structure
+
+
+
+ ### Data Instances
+
+
+
+ Each instance in the dataset is represented as a pair of Yoruba and English sentences. An example instance is as follows:
+
+
+
+ ```json
+
+ {
+
+ "yoruba": "Ẹ kaaro.",
+
+ "english": "Good morning."
+
+ }