---
title: README
emoji: 🐨
colorFrom: blue
colorTo: blue
sdk: static
pinned: false
---

SaLT UNIMAS - SARAWAK LANGUAGE TECHNOLOGY
=========================================
Faculty of Computer Science and Information Technology (FCSIT)
Universiti Malaysia Sarawak (UNIMAS)

ABOUT
This organization hosts speech and language datasets developed by the
Sarawak Language Technology (SaLT) research group at UNIMAS FCSIT.
Our focus is on low-resource languages and dialects spoken in Sarawak,
Malaysia, particularly Sarawak Malay and Iban.

DATASETS

1. sarawak-malay-asr
   Language    : Sarawak Malay (ms)
   Utterances  : 1,164
   Duration    : ~1.9 hours
   Speakers    : 42
   Task        : Automatic Speech Recognition
   Link        : https://huggingface.co/datasets/SaLTUNIMAS/sarawak-malay-asr

2. iban-speech
   Language    : Iban (iba)
   Utterances  : ~2,977
   Duration    : ~8 hours
   Task        : Automatic Speech Recognition
   Link        : https://huggingface.co/datasets/SaLTUNIMAS/iban-speech
   Source      : https://github.com/sarahjuan/iban
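The counts above give a rough sense of how long a typical clip is in each
dataset. The sketch below simply divides total duration by utterance count,
using the figures exactly as listed. Because the listed durations are
approximate, the results are rough estimates, not measured statistics.
The names DATASETS and mean_utterance_seconds are illustrative only.

```python
# Figures copied from the DATASETS list; durations there are approximate,
# so the averages below are rough estimates only.
DATASETS = {
    "sarawak-malay-asr": {"utterances": 1164, "hours": 1.9},
    "iban-speech": {"utterances": 2977, "hours": 8.0},
}


def mean_utterance_seconds(utterances: int, hours: float) -> float:
    """Return the average utterance length in seconds."""
    return hours * 3600 / utterances


for name, stats in DATASETS.items():
    avg = mean_utterance_seconds(stats["utterances"], stats["hours"])
    print(f"{name}: ~{avg:.1f} s per utterance")
# sarawak-malay-asr: ~5.9 s per utterance
# iban-speech: ~9.7 s per utterance
```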

LANGUAGES COVERED
  - Sarawak Malay  (low-resource dialect, Sarawak, Malaysia)
  - Iban           (low-resource language, Sarawak, Malaysia)

QUICK START
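  # Install the Hugging Face datasets library (shell). The [audio] extra
  # adds the dependencies needed to decode audio.
  pip install "datasets[audio]"

  # Then, in Python:
  from datasets import load_dataset

  # Load the Sarawak Malay dataset
  malay_ds = load_dataset("SaLTUNIMAS/sarawak-malay-asr")

  # Load the Iban dataset
  iban_ds = load_dataset("SaLTUNIMAS/iban-speech")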

CONTACT
Faculty of Computer Science and Information Technology (FCSIT)
Universiti Malaysia Sarawak (UNIMAS)
94300 Kota Samarahan, Sarawak, Malaysia
https://www.fcsit.unimas.my/