| title: README | |
| emoji: 🐨 | |
| colorFrom: blue | |
| colorTo: blue | |
| sdk: static | |
| pinned: false | |
| SaLT UNIMAS - SARAWAK LANGUAGE TECHNOLOGY | |
| ===================================== | |
| Faculty of Computer Science and Information Technology (FCSIT) | |
| Universiti Malaysia Sarawak (UNIMAS) | |
| ABOUT | |
| This organization hosts speech and language datasets developed by the | |
| Sarawak Language Technology (SaLT) research group at UNIMAS FCSIT. | |
| Our focus is on low-resource languages and dialects spoken in Sarawak, | |
| Malaysia — particularly Sarawak Malay and Iban. | |
| DATASETS | |
| 1. sarawak-malay-asr | |
| Language : Sarawak Malay (ms) | |
| Utterances : 1,164 | |
| Duration : ~1.9 hours | |
| Speakers : 42 | |
| Task : Automatic Speech Recognition | |
| Link : https://huggingface.co/datasets/SaLTUNIMAS/sarawak-malay-asr | |
| 2. iban-speech | |
| Language : Iban (iba) | |
| Utterances : ~2,977 | |
| Duration : ~8 hours | |
| Task : Automatic Speech Recognition | |
| Link : https://huggingface.co/datasets/SaLTUNIMAS/iban-speech | |
| Source : https://github.com/sarahjuan/iban | |
| LANGUAGES COVERED | |
| - Sarawak Malay (low-resource dialect, Sarawak, Malaysia) | |
| - Iban (low-resource language, Sarawak, Malaysia) | |
| QUICK START | |
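| # Install the Hugging Face datasets library (run in a shell) | |
| pip install datasets | |
| # Load the Sarawak Malay dataset (Python) | |
| from datasets import load_dataset | |
| sarawak_malay = load_dataset("SaLTUNIMAS/sarawak-malay-asr") | |
| # Load the Iban dataset into its own variable so both stay available | |
| iban = load_dataset("SaLTUNIMAS/iban-speech") | |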
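As a quick sanity check after loading, the sketch below estimates the average utterance length from the figures listed under DATASETS and counts the rows in each split. The helper names `mean_utterance_seconds` and `summarize_splits` are hypothetical and not part of the `datasets` library. The split names in each repository are not stated here, so the code reads them from whatever object is passed in rather than assuming any.

```python
from typing import Dict, Mapping, Sized

# Figures copied from the DATASETS section; durations and the Iban
# utterance count are approximate, so the results are estimates only.
DATASET_STATS: Dict[str, Dict[str, float]] = {
    "SaLTUNIMAS/sarawak-malay-asr": {"utterances": 1164, "hours": 1.9},
    "SaLTUNIMAS/iban-speech": {"utterances": 2977, "hours": 8.0},
}


def mean_utterance_seconds(utterances: int, hours: float) -> float:
    """Average utterance length in seconds from a total duration and a count."""
    if utterances <= 0:
        raise ValueError("utterances must be positive")
    return hours * 3600.0 / utterances


def summarize_splits(dataset_dict: Mapping[str, Sized]) -> Dict[str, int]:
    """Map each split name to its number of rows.

    Accepts any mapping of split name to a sized collection, which includes
    the DatasetDict returned by load_dataset() when no split is requested.
    """
    return {name: len(split) for name, split in dataset_dict.items()}


if __name__ == "__main__":
    for repo_id, stats in DATASET_STATS.items():
        seconds = mean_utterance_seconds(int(stats["utterances"]), stats["hours"])
        print(f"{repo_id}: about {seconds:.1f} s per utterance")
```

With the real data, `summarize_splits(load_dataset("SaLTUNIMAS/iban-speech"))` would report the row count per split. This call needs network access and is left out of the block for that reason.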
| CONTACT | |
| Faculty of Computer Science and Information Technology (FCSIT) | |
| Universiti Malaysia Sarawak (UNIMAS) | |
| 94300 Kota Samarahan, Sarawak, Malaysia | |
| https://www.fcsit.unimas.my/ | |