| # RobIt | |
| **RobIt** is a RoBERTa-base model for Italian, trained from scratch on the Italian portion of the OSCAR dataset using [Flax](https://github.com/google/flax). The repository also includes the training scripts. | |
| This project is part of the [Flax/JAX Community Week](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104), organised by [Hugging Face](https://huggingface.co/), with TPU usage sponsored by Google. | |
| ## Team members | |
| - Prateek Agrawal (prateekagrawal) | |
| - Tanay Mehta (yotanay) | |
| - Shreya Gupta (Sheyz-max) | |
| - Ruchi Bhatia (ruchi798) | |
| ## Dataset | |
| [OSCAR](https://huggingface.co/datasets/oscar) | |
| - Config: **unshuffled_deduplicated_it** | |
| - Size of downloaded dataset files: **26637.62 MB** | |
| - Size of the generated dataset: **70661.48 MB** | |
| - Total amount of disk used: **97299.10 MB** | |
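The dataset configuration above could be loaded roughly as follows. This is a minimal sketch, not the project's actual training pipeline: it assumes the Hugging Face `datasets` library is installed, and the helper names `load_italian_oscar` and `preview` are hypothetical. Streaming is used so the roughly 97 GB of data need not be stored locally. Depending on the installed `datasets` version, script-based datasets such as OSCAR may need extra arguments or may not load at all.

```python
from itertools import islice

# Dataset name and config as listed in this README.
DATASET_NAME = "oscar"
DATASET_CONFIG = "unshuffled_deduplicated_it"


def load_italian_oscar(streaming=True):
    """Return the Italian OSCAR training split (hypothetical helper).

    The import is done lazily so this module can be imported even
    when `datasets` is not installed.
    """
    from datasets import load_dataset

    return load_dataset(
        DATASET_NAME,
        DATASET_CONFIG,
        split="train",
        streaming=streaming,  # iterate without downloading everything first
    )


def preview(examples, n=3, width=80):
    """Return the first `width` characters of the first `n` documents.

    Assumes each example is a dict with a "text" field.
    """
    return [example["text"][:width] for example in islice(examples, n)]
```

Calling `preview(load_italian_oscar())` would then give a few truncated documents for a quick sanity check before starting a long training run.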
| ## Useful links | |
| - [Community Week timeline](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104#summary-timeline-calendar-6) | |
| - [Community Week README](https://github.com/huggingface/transformers/blob/master/examples/research_projects/jax-projects/README.md) | |
| - [Community Week thread](https://discuss.huggingface.co/t/robit-pretrain-roberta-base-from-scratch-in-italian/7564) | |
| - [Community Week channel](https://discord.gg/NTyQNUNs) | |
| - [Masked Language Modelling example scripts](https://github.com/huggingface/transformers/tree/master/examples/flax/language-modeling) | |
| - [Model Repository](https://huggingface.co/flax-community/robit-roberta-base-it/) | |
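A trained checkpoint could be tried out on masked-word prediction along these lines. This is only a sketch under assumptions: the model ID is taken from the Model Repository link above, the `transformers` library is assumed to be installed, and the repository is assumed to host weights that the installed backend can load. The helper names `mask_word` and `load_fill_mask` are hypothetical, and `<mask>` is the mask token used by standard RoBERTa tokenizers.

```python
# Model ID taken from the Model Repository link in this README.
MODEL_ID = "flax-community/robit-roberta-base-it"


def mask_word(sentence, word, mask_token="<mask>"):
    """Replace the first occurrence of `word` with the mask token."""
    if word not in sentence:
        raise ValueError(f"{word!r} does not occur in the sentence")
    return sentence.replace(word, mask_token, 1)


def load_fill_mask(model_id=MODEL_ID):
    """Build a fill-mask pipeline for the model (hypothetical helper).

    Imported lazily so this module can be imported without
    `transformers`; the call downloads the model on first use.
    """
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)
```

For example, `load_fill_mask()(mask_word("Roma è la capitale d'Italia.", "capitale"))` would return the model's candidate fillers for the masked position, each with a score.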