Mais Alheraki committed · Commit e22146c · Parent(s): bf20bad
Update README.md

README.md CHANGED
      CALM: Collaborative Arabic Language Model
    </p>
    <p class="mb-2">
      The CALM project is a joint effort led by <u><a target="_blank" href="https://sdaia.gov.sa/ncai/?Lang=en">NCAI</a></u> in collaboration with
      <u><a target="_blank" href="https://yandex.com/">Yandex</a> and <a href="https://huggingface.co/">HuggingFace</a></u> to train an Arabic language model with
      volunteers from around the globe. The project is an adaptation of the framework proposed at the NeurIPS 2021 demonstration:
      <u><a target="_blank" href="https://huggingface.co/training-transformers-together">Training Transformers Together</a></u>.
    </p>
    <p class="mb-2">
      One of the main obstacles facing many researchers in the Arabic NLP community is the lack of the computing resources needed to train large models. Models with
      leading performance on Arabic NLP tasks, such as <u><a target="_blank" href="https://github.com/aub-mind/arabert">AraBERT</a></u>,
      <u><a href="https://github.com/CAMeL-Lab/CAMeLBERT" target="_blank">CamelBERT</a></u>,
      <u><a href="https://huggingface.co/aubmindlab/araelectra-base-generator" target="_blank">AraELECTRA</a></u>, and
      <u><a href="https://huggingface.co/qarib">QARiB</a></u>
      took days to train on TPUs. In the spirit of democratizing AI and enabling the community, core values at NCAI, CALM aims to demonstrate the effectiveness
      of collaborative training and to form a community of volunteers for Arabic NLP researchers with entry-level cloud GPUs who wish to train their own models collaboratively.
    </p>

…
      Each volunteer GPU trains the model locally, at its own pace, on a portion of the dataset while another portion is streamed in the background to reduce local
      memory consumption. Computing the gradients and aggregating them is performed in a distributed manner, based on the computing abilities of each participating
      volunteer. Details of the distributed training process are described further in the paper
      <u><a target="_blank" href="https://papers.nips.cc/paper/2021/hash/41a60377ba920919939d83326ebee5a1-Abstract.html">Deep Learning in Open Collaborations</a></u>.
    </p>
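The weighted aggregation idea described above can be sketched as follows. This is a minimal, framework-free illustration, not the project's actual code: the function name and the per-volunteer sample counts are invented for the example, and each volunteer's contribution is weighted by how many samples it processed in the round.

```python
# Illustrative sketch: each volunteer computes a gradient on its local
# data shard, and the round's update averages the contributions weighted
# by the number of samples each volunteer processed, so faster GPUs
# influence the update proportionally more.

def aggregate_gradients(contributions):
    """contributions: list of (num_samples, gradient_vector) pairs."""
    total = sum(n for n, _ in contributions)
    dim = len(contributions[0][1])
    avg = [0.0] * dim
    for n, grad in contributions:
        for i, g in enumerate(grad):
            avg[i] += (n / total) * g
    return avg

# Three volunteers with different throughput in this round:
volunteers = [
    (128, [0.2, -0.4]),  # fast GPU, large local batch
    (64,  [0.1, -0.2]),
    (32,  [0.4,  0.0]),  # slower GPU, fewer samples, smaller weight
]
print(aggregate_gradients(volunteers))
```

With uniform gradients the result reduces to a plain average; the sample-count weighting only matters when volunteers process different amounts of data, which is exactly the heterogeneous-hardware setting described above.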

    <p class="mb-2" style="font-size:20px;font-weight:bold">
…
    </p>

    <ul class="mb-2">
      <li>Create an account on <u><a target="_blank" href="https://huggingface.co">Huggingface</a></u>.</li>
      <li>Join the <u><a target="_blank" href="https://huggingface.co/CALM">NCAI-CALM Organization</a></u> on Huggingface through the invitation link shared with you by email.</li>
      <li>Get your Access Token; you will need it later in the notebook.</li>
    </ul>
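The notebook uses the Access Token to authenticate you against the Hugging Face Hub, which accepts the token as a bearer credential on API requests. A minimal sketch of that step — the `HF_TOKEN` environment-variable name and the placeholder value are illustrative, not something the notebook mandates:

```python
import os

# The Hub authenticates API requests via an "Authorization: Bearer <token>"
# header. The default below is a placeholder; real tokens start with "hf_".
def auth_header(token: str) -> dict:
    return {"Authorization": f"Bearer {token}"}

token = os.environ.get("HF_TOKEN", "hf_placeholder")
print(auth_header(token))
```

Keeping the token in an environment variable (rather than pasting it into a cell) avoids accidentally sharing it when the notebook is published.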

    <p class="h2 mb-2" style="font-size:18px;font-weight:bold">How to get my Huggingface Access Token</p>
    <ul class="mb-2">
      <li>Go to your <u><a href="https://huggingface.co">HF account</a></u>.</li>
      <li>Go to Settings → Access Tokens.</li>
      <li>Generate a new Access Token and enter any name for "what's this token for".</li>
      <li>Select <code>read</code> role.</li>