lannelin/bert-imdb-1hidden

Text Classification

James Bishop committed on Dec 8, 2020

Commit 7225122 · 1 Parent(s): f843c71

model card

Files changed (1)

README.md +63 -0

README.md ADDED

	@@ -0,0 +1,63 @@

+---
+language:
+- en
+datasets:
+- imdb
+metrics:
+- accuracy
+---
+# bert-imdb-1hidden
+## Model description
+A `bert-base-uncased` model was restricted to 1 hidden layer and
+fine-tuned for sequence classification on the
+imdb dataset loaded using the `datasets` library.
+## Intended uses & limitations
+#### How to use
+```python
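+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+pretrained = "lannelin/bert-imdb-1hidden"
+tokenizer = AutoTokenizer.from_pretrained(pretrained)
+model = AutoModelForSequenceClassification.from_pretrained(pretrained)
+# Index 0 is the negative class, index 1 the positive class
+LABELS = ["negative", "positive"]
+def get_sentiment(text: str) -> str:
+    # Truncate long reviews to the 512-token maximum used during fine-tuning
+    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+    with torch.no_grad():  # gradients are not needed for inference
+        logits = model(**inputs)[0].squeeze()
+    # Convert the 0-d argmax tensor to a plain int before indexing the list
+    return LABELS[logits.argmax().item()]
+print(get_sentiment("What a terrible film!"))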
+```
+#### Limitations and bias
+No special consideration given to limitations and bias.
+Any bias held by the imdb dataset may be reflected in the model's output.
+## Training data
+Initialised with [bert-base-uncased](https://huggingface.co/bert-base-uncased).
+Fine-tuned on [imdb](https://huggingface.co/datasets/imdb).
+## Training procedure
+The model was fine-tuned for 1 epoch with a batch size of 64,
+a learning rate of 5e-5, and a maximum sequence length of 512.
+## Eval results
+Accuracy on imdb test set: 0.87132
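
The model card does not say how the accuracy figure was computed. As a minimal sketch, assuming it is plain classification accuracy (correct predictions divided by total examples) over the standard 25,000-review imdb test split, it could be reproduced as follows. The `accuracy` helper and the toy labels are hypothetical and are not part of this commit.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], references: Sequence[int]) -> float:
    """Return the fraction of predictions that equal their reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy labels (0 = negative, 1 = positive); illustrative only, not model output.
toy_predictions = [1, 0, 1, 1]
toy_references = [1, 0, 0, 1]
toy_accuracy = accuracy(toy_predictions, toy_references)
print(toy_accuracy)

# Assumption: the test split holds 25,000 reviews, so the reported accuracy
# of 0.87132 would correspond to this many correctly classified reviews.
implied_correct = round(0.87132 * 25_000)
print(implied_correct)
```

Under that assumption, the reported 0.87132 corresponds to 21,783 of the 25,000 test reviews being classified correctly.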