Commit 3f8dd1a (verified) by hunterschep · 1 Parent(s): 6da94db

Update README.md

Files changed (1): README.md (+82 -1)
@@ -7,4 +7,85 @@ sdk: static
  pinned: false
  ---
 
- Edit this `README.md` markdown file to author your organization card.
+ # FormosanBank
+
+ FormosanBank is a large-scale, machine-readable corpus and tooling ecosystem for Taiwan’s Indigenous Formosan languages.
+
+ We build and share open resources that support:
+
+ - language documentation
+ - linguistic research
+ - education
+ - language revitalization
+ - speech and language technology development
+
+ This Hugging Face organization is where we publish FormosanBank datasets and related resources for easier access, download, and reuse.
+
+ ## What you’ll find here
+
+ - **Datasets** containing text, annotations, and audio-linked resources
+ - **Corpus releases** organized for practical use on the Hugging Face Hub
+ - **Resources for computational work**, including materials useful for ASR, MT, and other NLP workflows
+ - **Documentation and usage guidance** connected to the broader FormosanBank project
+
+ ## About the project
+
+ FormosanBank is designed as a centralized repository for data across the extant Formosan languages, with an emphasis on accessibility for researchers, educators, students, and community collaborators.
+
+ The broader project includes:
+
+ - digitized texts and transcriptions
+ - dictionaries and reference materials
+ - audio recordings
+ - annotated corpora
+ - structured metadata for search, retrieval, and downstream analysis
+
+ ## Start here
+
+ - **Documentation / GitBook:** [FormosanBank GitBook](https://ai4commsci.gitbook.io/formosanbank)
+ - **GitHub:** [FormosanBank GitHub repository](https://github.com/FormosanBank/FormosanBank)
+ - **Hugging Face organization:** [FormosanBank on Hugging Face](https://huggingface.co/FormosanBank)
+
+ ## Using these resources
+
+ Some corpora are distributed on Hugging Face in ways that make large audio collections easier to host and retrieve. The FormosanBank documentation includes guidance for downloading data by corpus or language, including workflows for larger audio collections.
+
+ For technical usage details, see the Hugging Face section of the documentation:
+ - [Hugging Face usage guide](https://ai4commsci.gitbook.io/formosanbank/the-bank-architecture/developers/huggingface)
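As a rough illustration of downloading data by corpus or language, the sketch below uses `snapshot_download` from the `huggingface_hub` library with `allow_patterns` to fetch only part of a dataset repository. This is not the project's documented workflow. The repository id `FormosanBank/example-corpus`, the one-folder-per-language layout, and the `xml`/`wav` file extensions are all hypothetical placeholders; check the usage guide above for the real names and structure.

```python
from fnmatch import fnmatch

# Hypothetical placeholder: replace with a real dataset id from the
# FormosanBank organization on the Hugging Face Hub.
REPO_ID = "FormosanBank/example-corpus"


def build_allow_patterns(language=None, extensions=("xml", "wav")):
    """Return glob patterns for one language folder, or for all files.

    Assumes (hypothetically) one top-level folder per language.
    """
    if language is None:
        return [f"*.{ext}" for ext in extensions]
    return [f"{language}/*.{ext}" for ext in extensions]


def preview_selection(paths, patterns):
    """Return the paths that at least one glob pattern would select."""
    return [p for p in paths if any(fnmatch(p, pat) for pat in patterns)]


def download_subset(repo_id=REPO_ID, language=None, local_dir="formosanbank_data"):
    """Download only the matching files from a dataset repository."""
    # Imported lazily so the helpers above work without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",  # dataset repositories, not model repositories
        allow_patterns=build_allow_patterns(language),
        local_dir=local_dir,
    )
```

Calling `preview_selection` on a list of file paths shows what a pattern set would select before any download starts, which can help avoid pulling a large audio collection by accident. `download_subset(language="Amis")` would then fetch only that folder, provided the repository actually follows the assumed layout.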
+
+ ## Licensing and responsible use
+
+ Licensing may vary by corpus or source material, so please check the license and citation requirements on each dataset and in the documentation before reuse.
+
+ Important notes from the project documentation include:
+
+ - some source materials may have their own citation or usage requirements
+ - FormosanBank corpora include restrictions on **commercial AI use**
+ - FormosanBank annotations and metadata are released under **CC-BY-4.0**
+
+ Please review the full terms here:
+ - [Terms of Use](https://ai4commsci.gitbook.io/formosanbank/additional-resources/terms-of-use)
+
+ ## Contributing
+
+ We welcome collaboration with researchers, educators, and community members.
+
+ If you would like to contribute data, discuss licensing, share corrections, or explore collaboration, please see:
+ - [Contributing to FormosanBank](https://ai4commsci.gitbook.io/formosanbank/additional-resources/contributing-to-formosanbank)
+
+ ## Publications
+
+ FormosanBank supports research on endangered and Indigenous language technology, including work in machine translation, ASR, OCR, and corpus development.
+
+ A list of related publications is available here:
+ - [Publications](https://ai4commsci.gitbook.io/formosanbank/additional-resources/publications)
+
+ ## Citation
+
+ If you use FormosanBank in academic work, please cite:
+
+ > Mohamed, W., Le Ferrand, É., Sung, L.-M., Prud'hommeaux, E., & Hartshorne, J. K. (2024). *FormosanBank*. Electronic Resource.
+
+ ## Acknowledgment
+
+ FormosanBank is made possible through collaboration among researchers, contributors, and community partners working to support the documentation and revitalization of Taiwan’s Indigenous Formosan languages.