---
language: en
license: mit
tags:
  - sentiment-analysis
  - text-classification
  - decoder
library_name: sentimentizer
task: text-classification
---
# Sentimentizer Decoder Sentiment Model
## Description

A Transformer encoder-decoder model for sentiment classification, built on pre-trained GloVe embeddings. The encoder processes the input token sequence, and the decoder attends to the encoder outputs to produce a sentiment prediction.

## Training Data

Trained on reviews from the [Yelp Open Dataset](https://www.yelp.com/dataset), using pre-trained GloVe Wiki-Gigaword-100 embeddings. Reviews are tokenized with a custom dictionary (20k-token vocabulary, minimum frequency 3) and padded or truncated to 200 tokens.
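The preprocessing described above can be illustrated with a minimal, dependency-free sketch. It is not the package's actual tokenizer (which uses a Gensim dictionary): the whitespace tokenization, the reserved `PAD_ID`/`UNK_ID` values, the helper names `build_vocab` and `encode`, and the reading of "min frequency" as a raw token count are all assumptions made for illustration.

```python
from collections import Counter

MAX_LEN = 200        # sequence length stated above
MIN_FREQ = 3         # minimum frequency stated above
MAX_VOCAB = 20_000   # vocabulary cap stated above
PAD_ID, UNK_ID = 0, 1  # assumed reserved ids (hypothetical)


def build_vocab(texts, min_freq=MIN_FREQ, max_vocab=MAX_VOCAB):
    """Map the most frequent tokens to integer ids, dropping rare ones."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    kept = [tok for tok, n in counts.most_common(max_vocab) if n >= min_freq]
    # Ids start at 2 because 0 and 1 are reserved for padding and unknowns.
    return {tok: i + 2 for i, tok in enumerate(kept)}


def encode(text, vocab, max_len=MAX_LEN):
    """Convert text to exactly max_len ids by truncating or right-padding."""
    ids = [vocab.get(tok, UNK_ID) for tok in text.lower().split()]
    ids = ids[:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


# Toy corpus: only "great" appears at least three times.
vocab = build_vocab(["great food", "great service", "great place"])
ids = encode("great food", vocab, max_len=5)
print(ids)
```

In this toy run, "food" falls below the frequency threshold and is mapped to the unknown id, and the sequence is right-padded to the requested length. Whether the real pipeline pads on the left or the right is not stated in this card.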

## Usage

```python
from sentimentizer.hf import download_weights
from sentimentizer.config import DriverConfig, weights_path_for

# Download weights + dictionary from Hugging Face Hub
weights_path = weights_path_for("decoder")
download_weights(
    "decoder",
    weights_path,
    repo_id="ryeyoo/sentimentizer-decoder",
    dict_path=DriverConfig.files.dictionary_file_path,
)

# Load and run inference
from sentimentizer.models.decoder import get_trained_model
from sentimentizer.tokenizer import get_trained_tokenizer

model = get_trained_model(device="cpu")
tokenizer = get_trained_tokenizer()

probs = model.predict_text('amazing food great service')
for label, prob in sorted(probs.items(), key=lambda x: -x[1]):
    print(f'{label}: {prob:.4f}')
# e.g. positive: 0.8300, neutral: 0.1200, negative: 0.0500
```

## Files

- `decoder_weights.pth` — Model state dictionary
- `yelp.dictionary` — Gensim dictionary for tokenization