royleibov/roberta-base-ZipNN-Compressed

@@ -6,8 +6,56 @@ license: mit
 datasets:
 - bookcorpus
 - wikipedia
 ---
 # RoBERTa base model
 Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in
@@ -50,7 +98,11 @@ You can use this model directly with a pipeline for masked language modeling:
 ```python
 >>> from transformers import pipeline
->>> unmasker = pipeline('fill-mask', model='roberta-base')
 >>> unmasker("Hello I'm a <mask> model.")
 [{'sequence': "<s>Hello I'm a male model.</s>",
@@ -79,8 +131,12 @@ Here is how to use this model to get the features of a given text in PyTorch:
 ```python
 from transformers import RobertaTokenizer, RobertaModel
-tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
-model = RobertaModel.from_pretrained('roberta-base')
 text = "Replace me by any text you'd like."
 encoded_input = tokenizer(text, return_tensors='pt')
 output = model(**encoded_input)
@@ -90,8 +146,12 @@ and in TensorFlow:
 ```python
 from transformers import RobertaTokenizer, TFRobertaModel
-tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
-model = TFRobertaModel.from_pretrained('roberta-base')
 text = "Replace me by any text you'd like."
 encoded_input = tokenizer(text, return_tensors='tf')
 output = model(encoded_input)
@@ -104,7 +164,11 @@ neutral. Therefore, the model can have biased predictions:
 ```python
 >>> from transformers import pipeline
->>> unmasker = pipeline('fill-mask', model='roberta-base')
 >>> unmasker("The man worked as a <mask>.")
 [{'sequence': '<s>The man worked as a mechanic.</s>',
@@ -231,4 +295,4 @@ Glue test results:
 <a href="https://huggingface.co/exbert/?model=roberta-base">
 	<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
-</a>

 datasets:
 - bookcorpus
 - wikipedia
+base_model:
+- FacebookAI/roberta-base
 ---
+# Disclaimer and Requirements
+This model is a clone of [**FacebookAI/roberta-base**](https://huggingface.co/FacebookAI/roberta-base) compressed using ZipNN. The compression is lossless and reduces the model to 54% of its original size, saving ~0.25GB in storage and potentially ~5PB in data transfer **monthly**.
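As a rough sanity check of the figures quoted above, the sizes they imply can be worked out directly. This is a back-of-the-envelope sketch, not part of ZipNN: both function names are hypothetical, the inputs are the rounded numbers from the paragraph above, and the download estimate assumes the monthly transfer saving is simply the per-download saving multiplied by the number of downloads (with 1 PB taken as 1,000,000 GB).

```python
def implied_sizes(ratio: float, saved_gb: float) -> tuple[float, float]:
    """Return (original_gb, compressed_gb) implied by a ratio and an absolute saving.

    ratio is compressed_size / original_size; saved_gb is original - compressed.
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be strictly between 0 and 1")
    original_gb = saved_gb / (1.0 - ratio)
    compressed_gb = original_gb * ratio
    return original_gb, compressed_gb


def implied_monthly_downloads(transfer_saved_pb: float, saved_gb_per_download: float) -> float:
    """Estimate downloads per month, assuming every download saves the same amount."""
    gb_per_pb = 1_000_000  # decimal units: 1 PB = 1,000,000 GB (assumption)
    return transfer_saved_pb * gb_per_pb / saved_gb_per_download


original_gb, compressed_gb = implied_sizes(ratio=0.54, saved_gb=0.25)
print(f"original ~ {original_gb:.2f} GB, compressed ~ {compressed_gb:.2f} GB")
# original ~ 0.54 GB, compressed ~ 0.29 GB

downloads = implied_monthly_downloads(transfer_saved_pb=5, saved_gb_per_download=0.25)
print(f"implied downloads per month ~ {downloads:,.0f}")
# implied downloads per month ~ 20,000,000
```

Because the inputs are rounded, these values are only indicative of the order of magnitude.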
+### Requirement
+ZipNN must be installed to use this model:
+```bash
+pip install zipnn
+```
+### Use This Model
+```python
+# Use a pipeline as a high-level helper
+from transformers import pipeline
+from zipnn import zipnn_hf
+zipnn_hf()  # call before loading so the compressed model is decompressed automatically
+pipe = pipeline("fill-mask", model="royleibov/roberta-base-ZipNN-Compressed")
+```
+```python
+# Load model directly
+import torch
+from transformers import AutoTokenizer, AutoModelForMaskedLM
+from zipnn import zipnn_hf
+zipnn_hf()  # call before loading so the compressed model is decompressed automatically
+tokenizer = AutoTokenizer.from_pretrained("royleibov/roberta-base-ZipNN-Compressed")
+model = AutoModelForMaskedLM.from_pretrained("royleibov/roberta-base-ZipNN-Compressed")
+```
+### ZipNN
+ZipNN also allows you to seamlessly save local disk space in your cache after the model is downloaded.
+To compress the cached model, simply run:
+```bash
+python zipnn_compress_path.py safetensors --model royleibov/roberta-base-ZipNN-Compressed --hf_cache
+```
+The model will be decompressed automatically and safely as long as `zipnn_hf()` is called at the top of the file, as in the [example above](#use-this-model).
+To decompress manually, simply run:
+```bash
+python zipnn_decompress_path.py --model royleibov/roberta-base-ZipNN-Compressed --hf_cache
+```
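To see how much disk space the cached files occupy before and after running the commands above, one option is to total file sizes per extension. The following is a minimal standard-library sketch, not part of ZipNN: the helper name `size_by_suffix` is hypothetical, and it assumes that compressed files can be told apart by their extension (for example `.znn`), which you should confirm against what the scripts actually produce.

```python
from collections import defaultdict
from pathlib import Path


def size_by_suffix(root: str) -> dict[str, int]:
    """Sum file sizes in bytes under `root`, grouped by file extension."""
    totals: dict[str, int] = defaultdict(int)
    for path in Path(root).rglob("*"):
        # is_file() and stat() follow symlinks, so linked files count at full size.
        if path.is_file():
            totals[path.suffix] += path.stat().st_size
    return dict(totals)
```

Calling `size_by_suffix(some_dir)` on the directory that holds the model files, once before and once after compression, gives two dictionaries to compare. The Hugging Face cache may expose the same file under more than one path through symlinks, so pointing the helper at a single snapshot directory rather than the whole cache avoids double counting.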
 # RoBERTa base model
 Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in
 ```python
 >>> from transformers import pipeline
+>>> from zipnn import zipnn_hf
+>>> zipnn_hf()
+>>> unmasker = pipeline('fill-mask', model='royleibov/roberta-base-ZipNN-Compressed')
 >>> unmasker("Hello I'm a <mask> model.")
 [{'sequence': "<s>Hello I'm a male model.</s>",
 ```python
 from transformers import RobertaTokenizer, RobertaModel
+from zipnn import zipnn_hf
+zipnn_hf()
+tokenizer = RobertaTokenizer.from_pretrained('royleibov/roberta-base-ZipNN-Compressed')
+model = RobertaModel.from_pretrained('royleibov/roberta-base-ZipNN-Compressed')
 text = "Replace me by any text you'd like."
 encoded_input = tokenizer(text, return_tensors='pt')
 output = model(**encoded_input)
 ```python
 from transformers import RobertaTokenizer, TFRobertaModel
+from zipnn import zipnn_hf
+zipnn_hf()
+tokenizer = RobertaTokenizer.from_pretrained('royleibov/roberta-base-ZipNN-Compressed')
+model = TFRobertaModel.from_pretrained('royleibov/roberta-base-ZipNN-Compressed')
 text = "Replace me by any text you'd like."
 encoded_input = tokenizer(text, return_tensors='tf')
 output = model(encoded_input)
 ```python
 >>> from transformers import pipeline
+>>> from zipnn import zipnn_hf
+>>> zipnn_hf()
+>>> unmasker = pipeline('fill-mask', model='royleibov/roberta-base-ZipNN-Compressed')
 >>> unmasker("The man worked as a <mask>.")
 [{'sequence': '<s>The man worked as a mechanic.</s>',
 <a href="https://huggingface.co/exbert/?model=roberta-base">
 	<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
+</a>