hunterschep committed on
Commit
07eb7de
·
verified ·
1 Parent(s): c7c4807

Update README.md

  1. README.md +27 -26
README.md CHANGED
@@ -1,35 +1,36 @@
  ---
- title: FormosanBank
- emoji: 🌖
- colorFrom: pink
- colorTo: green
- sdk: static
- pinned: false
- license: other
  ---

- # FormosanBank

- FormosanBank is a centralized repository of corpora and tooling for the 16 extant Formosan (Taiwan Indigenous) languages, designed for research, education, and community use. The documentation covers data structure, access methods, and project background.

- - **Scale (at a glance):** >8M tokens and ~730 hours of audio across the languages (see GitBook “Welcome” for breakdowns).
- - **What’s inside:** texts & transcriptions, dictionaries, reference grammars, and curated corpora entries (e.g., NTU Paiwan ASR, ePark, Wikipedias).
- - **Data format:** a standardized XML schema (inspired by the Pangloss Collection) to ensure consistent metadata and downstream processing.

- ## Documentation

- - **GitBook (primary docs):** https://ai4commsci.gitbook.io/formosanbank
- Start with *Background Why Formosan*, *Developers*, *Corpora*, and the *XML Format* pages.

- ## Repository
-
- - **GitHub (code & corpora structure):** https://github.com/FormosanBank/FormosanBank
- Includes quality-control utilities (XML validation, orthography comparison, cleaning scripts) and corpus organization.
-
- ## Terms of Use (Summary)
-
- You’re free to use and redistribute FormosanBank materials **with restrictions**, including **No Commercial AI Use** (see full Terms for details). If in doubt, consult the GitBook Terms page.
-
- ## Cite / Related Work

- If you use FormosanBank in publications or demos, please reference the project documentation and any linked corpora. Recent research and benchmarks drawing on FormosanBank’s preparation pipelines include work on low-resource evaluation for Formosan languages.

  ---
+ title: README
+ emoji: 🌖
+ colorFrom: pink
+ colorTo: green
+ sdk: static
+ pinned: false
  ---

+ ## FormosanBank

+ **What is FormosanBank?**
+ FormosanBank is an open-source repository of corpora and quality-control tools supporting the documentation, processing, and machine-learning use of Taiwan’s indigenous Formosan languages.

+ **Key Features:**
+ - A large collection of corpora across multiple Formosan languages (e.g., Amis, Paiwan, Atayal) with scripts for cleaning, orthography extraction, validation etc.
+ - Quality control modules: punctuation checks, non-ASCII filtering, XML-template verification, orthography extraction.
+ - Designed to support downstream NLP tasks (translation, ASR, summarization) for low-resource languages.
+ - License: *[you should insert your specific license here]*
+ - Maintained in a GitHub repository: [https://github.com/FormosanBank/FormosanBank](https://github.com/FormosanBank/FormosanBank)
+ - Linked documentation: [https://ai4commsci.gitbook.io/formosanbank](https://ai4commsci.gitbook.io/formosanbank)

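The quality-control checks named in the feature list (non-ASCII filtering, XML verification) can be illustrated with a small standard-library sketch. This is not FormosanBank's actual tooling: the function names `find_non_ascii` and `is_well_formed_xml` and the sample element names are hypothetical, and the XML check here only tests well-formedness, not conformance to the project's template.

```python
import xml.etree.ElementTree as ET


def find_non_ascii(text):
    """Return (index, character) pairs for every non-ASCII character in text."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


def is_well_formed_xml(xml_string):
    """Return True if xml_string parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_string)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample: the element names are illustrative assumptions only.
sample = "<TEXT><S><FORM>sample ʔ text</FORM></S></TEXT>"

flagged = find_non_ascii(sample)          # positions of characters outside ASCII
well_formed = is_well_formed_xml(sample)  # the sample parses cleanly
broken = is_well_formed_xml("<TEXT><S>unclosed</TEXT>")  # mismatched tags
```

Reporting flagged characters, rather than deleting them, is a deliberate choice in this sketch: many orthographies legitimately use characters outside ASCII, so a reviewer should decide what counts as noise.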
+ **Usage / SDK:**
+ Since the card says `sdk: static` this suggests you are using static hosting of docs or a simple web UI. You can embed links to the repo, the docs, usage instructions etc.

+ **Short description:**
+ Building open-source infrastructure and corpora for Taiwan’s indigenous Formosan languages, enabling machine translation, ASR and summarization efforts in extremely low-resource settings.

+ ---

+ ### Getting started
+ 1. Clone the repository:
+ ```bash
+ git clone https://github.com/FormosanBank/FormosanBank
+ cd FormosanBank
+ pip install -r requirements.txt