---
title: README
emoji: 🐨
colorFrom: blue
colorTo: blue
sdk: static
pinned: false
---
SaLT UNIMAS - SARAWAK LANGUAGE TECHNOLOGY
=========================================
Faculty of Computer Science and Information Technology (FCSIT)
Universiti Malaysia Sarawak (UNIMAS)
ABOUT
This organization hosts speech and language datasets developed by the
Sarawak Language Technology (SaLT) research group at UNIMAS FCSIT.
Our focus is on low-resource languages and dialects spoken in Sarawak,
Malaysia, particularly Sarawak Malay and Iban.
DATASETS
1. sarawak-malay-asr
Language : Sarawak Malay (ms)
Utterances : 1,164
Duration : ~1.9 hours
Speakers : 42
Task : Automatic Speech Recognition
Link : https://huggingface.co/datasets/SaLTUNIMAS/sarawak-malay-asr
2. iban-speech
Language : Iban (iba)
Utterances : ~2,977
Duration : ~8 hours
Task : Automatic Speech Recognition
Link : https://huggingface.co/datasets/SaLTUNIMAS/iban-speech
Source : https://github.com/sarahjuan/iban
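As a rough guide to clip length, the listed totals can be turned into an average duration per utterance. The sketch below only uses the figures quoted above. The durations are approximate, so the results are estimates, and the helper name mean_utterance_seconds is made up for illustration.

```python
# Figures copied from the dataset listing above; the durations are approximate.
DATASETS = {
    "sarawak-malay-asr": {"utterances": 1164, "hours": 1.9},
    "iban-speech": {"utterances": 2977, "hours": 8.0},
}


def mean_utterance_seconds(utterances: int, hours: float) -> float:
    """Average utterance length in seconds, from total hours and utterance count."""
    return hours * 3600 / utterances


for name, stats in DATASETS.items():
    print(f"{name}: ~{mean_utterance_seconds(**stats):.1f} s per utterance")
# sarawak-malay-asr: ~5.9 s per utterance
# iban-speech: ~9.7 s per utterance
```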
LANGUAGES COVERED
- Sarawak Malay (low-resource dialect, Sarawak, Malaysia)
- Iban (low-resource language, Sarawak, Malaysia)
QUICK START
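# Install the Hugging Face datasets library (run in a shell)
pip install datasets
# Load each dataset under its own name so the second call does not overwrite the first
from datasets import load_dataset
# Sarawak Malay
sarawak_malay = load_dataset("SaLTUNIMAS/sarawak-malay-asr")
# Iban
iban = load_dataset("SaLTUNIMAS/iban-speech")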
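When load_dataset is called without a split argument it normally returns a dictionary-like object keyed by split name. A minimal sketch for checking how many examples each split holds is shown below. The helper name split_sizes is hypothetical, and the split names in the demo are stand-ins, since the actual splits of these datasets are not listed here.

```python
from typing import Mapping, Sized


def split_sizes(splits: Mapping[str, Sized]) -> dict[str, int]:
    """Map each split name to its number of examples."""
    return {name: len(rows) for name, rows in splits.items()}


# Stand-in data so the sketch runs offline; the split names are assumptions.
demo = {"train": ["utt1", "utt2", "utt3"], "test": ["utt4"]}
print(split_sizes(demo))  # {'train': 3, 'test': 1}

# With network access, the same helper should accept a loaded dataset:
#   from datasets import load_dataset
#   print(split_sizes(load_dataset("SaLTUNIMAS/sarawak-malay-asr")))
```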
CONTACT
Faculty of Computer Science and Information Technology (FCSIT)
Universiti Malaysia Sarawak (UNIMAS)
94300 Kota Samarahan, Sarawak, Malaysia
https://www.fcsit.unimas.my/