# RobIt
**RobIt** is a RoBERTa-base model for Italian. It was trained from scratch on the Italian portion of the OSCAR dataset using [Flax](https://github.com/google/flax); the training scripts are included in the repository.
This project is part of the [Flax/Jax Community Week](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104), organised by [HuggingFace](https://huggingface.co/), with TPU usage sponsored by Google.
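
Since RobIt is a masked language model, a quick way to try it is mask filling. The following is a minimal sketch, not an official usage recipe: it assumes the `transformers` library is installed, that the weights under the model repository ID listed in this README can be loaded by the installed backend (a Flax-only checkpoint may need Flax installed), and it uses a made-up Italian prompt. The names `TEMPLATE` and `predict_masked` are hypothetical helpers introduced for illustration.

```python
# Repository ID taken from the "Model Repository" link in this README.
MODEL_ID = "flax-community/robit-roberta-base-it"

# Hypothetical prompt; "{mask}" is replaced with the tokenizer's own mask token.
TEMPLATE = "Roma è la {mask} d'Italia."


def predict_masked(template: str = TEMPLATE, top_k: int = 5):
    """Return the top_k fill-mask candidates as (token, score) pairs."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    # Read the mask token from the tokenizer instead of hard-coding it.
    text = template.format(mask=fill_mask.tokenizer.mask_token)
    results = fill_mask(text, top_k=top_k)
    return [(r["token_str"], r["score"]) for r in results]
```

Calling `predict_masked()` downloads the model on first use and returns a list of candidate tokens with their scores.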
## Team members
- Prateek Agrawal (prateekagrawal)
- Tanay Mehta (yotanay)
- Shreya Gupta (Sheyz-max)
- Ruchi Bhatia (ruchi798)
## Dataset
[OSCAR](https://huggingface.co/datasets/oscar)
- Config: **unshuffled_deduplicated_it**
- Size of downloaded dataset files: **26637.62 MB**
- Size of the generated dataset: **70661.48 MB**
- Total amount of disk used: **97299.10 MB**
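
The figures above imply that a full download needs close to 100 GB of disk, so streaming may be a more practical way to inspect the data. The sketch below checks that the listed sizes add up and shows one possible way to stream the config with the `datasets` library. It is an illustration under assumptions: `stream_italian_oscar` and `mb_to_gb` are hypothetical helper names, the decimal MB-to-GB conversion is assumed, and depending on the installed `datasets` version this script-based dataset may need extra options or may not load at all.

```python
# Sizes in MB, copied from the list above.
DOWNLOAD_MB = 26637.62
GENERATED_MB = 70661.48
TOTAL_MB = 97299.10


def mb_to_gb(mb: float) -> float:
    """Convert MB to GB, assuming decimal units (1 GB = 1000 MB)."""
    return mb / 1000


def stream_italian_oscar():
    """Return a streaming iterator over the Italian OSCAR config."""
    # Imported lazily so the size check works without `datasets` installed.
    from datasets import load_dataset

    # streaming=True avoids writing the full dataset to local disk.
    return load_dataset(
        "oscar", "unshuffled_deduplicated_it", split="train", streaming=True
    )


# The downloaded and generated sizes should sum to the reported total.
sizes_consistent = abs(DOWNLOAD_MB + GENERATED_MB - TOTAL_MB) < 0.01
```

With streaming, something like `next(iter(stream_italian_oscar()))` would fetch a single record without downloading the whole corpus.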
## Useful links
- [Community Week timeline](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104#summary-timeline-calendar-6)
- [Community Week README](https://github.com/huggingface/transformers/blob/master/examples/research_projects/jax-projects/README.md)
- [Community Week thread](https://discuss.huggingface.co/t/robit-pretrain-roberta-base-from-scratch-in-italian/7564)
- [Community Week channel](https://discord.gg/NTyQNUNs)
- [Masked Language Modelling example scripts](https://github.com/huggingface/transformers/tree/master/examples/flax/language-modeling)
- [Model Repository](https://huggingface.co/flax-community/robit-roberta-base-it/)