Update README.md
README.md CHANGED
@@ -7,4 +7,51 @@ sdk: static
 pinned: false
 ---

-
+SALT UNIMAS - SARAWAK LANGUAGE TECHNOLOGY
+=====================================
+Faculty of Computer Science and Information Technology (FCSIT)
+Universiti Malaysia Sarawak (UNIMAS)
+
+ABOUT
+This organization hosts speech and language datasets developed by the
+Sarawak Language Technology (SaLT) research group at UNIMAS FCSIT.
+Our focus is on low-resource languages and dialects spoken in Sarawak,
+Malaysia — particularly Sarawak Malay and Iban.
+
+DATASETS
+
+1. sarawak-malay-asr
+Language : Sarawak Malay (ms)
+Utterances : 1,164
+Duration : ~1.9 hours
+Speakers : 42
+Task : Automatic Speech Recognition
+Link : https://huggingface.co/datasets/SaLTUNIMAS/sarawak-malay-asr
+
+2. iban-speech
+Language : Iban (iba)
+Utterances : ~2,977
+Duration : ~8 hours
+Task : Automatic Speech Recognition
+Link : https://huggingface.co/datasets/SaLTUNIMAS/iban-speech
+Source : https://github.com/sarahjuan/iban
+
+LANGUAGES COVERED
+- Sarawak Malay (low-resource dialect, Sarawak, Malaysia)
+- Iban (low-resource language, Sarawak, Malaysia)
+
+QUICK START
+pip install datasets
+
+# Load Sarawak Malay dataset
+from datasets import load_dataset
+ds = load_dataset("SaLTUNIMAS/sarawak-malay-asr")
+
+# Load Iban dataset
+ds = load_dataset("SaLTUNIMAS/iban-speech")
+
+CONTACT
+Faculty of Computer Science and Information Technology (FCSIT)
+Universiti Malaysia Sarawak (UNIMAS)
+94300 Kota Samarahan, Sarawak, Malaysia
+https://www.fcsit.unimas.my/
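
Note on the Quick Start added in this diff: it assigns both datasets to the same variable `ds`, so running the snippet top to bottom leaves only the Iban dataset bound. The sketch below is one possible way to keep both available at once. It is not part of the committed README. The names `DATASET_REPOS` and `load_salt_datasets` and the short keys are hypothetical; only the two repository IDs come from the README. It assumes the `datasets` package is installed and the Hugging Face Hub is reachable.

```python
# Repository IDs as listed in the README; the short keys are hypothetical labels.
DATASET_REPOS = {
    "sarawak_malay": "SaLTUNIMAS/sarawak-malay-asr",
    "iban": "SaLTUNIMAS/iban-speech",
}


def load_salt_datasets(names=None):
    """Load the selected datasets and return them keyed by short name.

    `names` is an optional iterable of keys from DATASET_REPOS; when it is
    None, every listed dataset is loaded. Downloads happen on first use.
    """
    # Imported lazily so this module can be imported without `datasets`
    # installed; the import only runs when loading is actually requested.
    from datasets import load_dataset

    selected = list(DATASET_REPOS) if names is None else list(names)
    unknown = [name for name in selected if name not in DATASET_REPOS]
    if unknown:
        raise KeyError(f"Unknown dataset name(s): {unknown}")
    return {name: load_dataset(DATASET_REPOS[name]) for name in selected}
```

Calling `load_salt_datasets(["iban"])` would fetch only the Iban dataset, while `load_salt_datasets()` would fetch both. The available splits and column names are not stated in the README, so printing the returned objects is a reasonable first step before indexing into them.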