---
language: en
license: mit
tags:
- sentiment-analysis
- text-classification
- decoder
library_name: sentimentizer
task: text-classification
---
# Sentimentizer DECODER Sentiment Model

## Description

A Transformer encoder-decoder for sentiment classification, built on pre-trained GloVe embeddings. The encoder processes the tokenized input sequence, and the decoder attends to the encoder outputs to produce a sentiment prediction.
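
To make "the decoder attends to the encoder outputs" concrete, here is a minimal pure-Python sketch of scaled dot-product cross-attention for a single decoder query. It is an illustration only, not this model's implementation: the function names are hypothetical, and it assumes a single head with no learned query/key/value projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, encoder_outputs):
    """Attend from one decoder query vector over all encoder output vectors.

    Simplification (assumption): the encoder outputs serve as both keys and
    values, and no learned projection matrices are applied.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, vec)) / scale
        for vec in encoder_outputs
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder outputs.
    context = [
        sum(w * vec[i] for w, vec in zip(weights, encoder_outputs))
        for i in range(len(query))
    ]
    return weights, context


# Toy 2-dimensional vectors, chosen only for illustration.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = cross_attention([1.0, 0.0], encoder_outputs)
print(weights, context)
```

In a classifier of this kind, a context vector like the one returned here would typically be passed through a final linear layer to obtain per-class scores.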

## Training Data

Trained on reviews from the [Yelp Open Dataset](https://www.yelp.com/dataset), using pre-trained GloVe Wiki-Gigaword-100 (100-dimensional) embeddings. Reviews are tokenized with a custom Gensim dictionary (20k-word vocabulary, minimum frequency 3) and padded or truncated to 200 tokens.
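
The preprocessing described above can be sketched in plain Python. This is a rough illustration under stated assumptions, not the library's tokenizer: `build_vocab`, `encode`, `PAD_ID`, and `UNK_ID` are hypothetical names, tokenization is simple lowercase whitespace splitting, "minimum frequency" is treated as a raw token count, and padding is applied on the right.

```python
from collections import Counter

PAD_ID = 0      # assumed padding index (hypothetical)
UNK_ID = 1      # assumed index for out-of-vocabulary tokens (hypothetical)
MAX_LEN = 200   # sequence length stated in this card


def build_vocab(texts, max_size=20_000, min_freq=3):
    """Map the most frequent tokens to integer ids, dropping rare ones."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    kept = [tok for tok, n in counts.most_common(max_size) if n >= min_freq]
    # Ids 0 and 1 are reserved for padding and unknown tokens.
    return {tok: i + 2 for i, tok in enumerate(kept)}


def encode(text, vocab, max_len=MAX_LEN):
    """Convert text to a fixed-length id list: truncate, then right-pad."""
    ids = [vocab.get(tok, UNK_ID) for tok in text.lower().split()][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


# Tiny toy corpus, for illustration only.
reviews = ["great food great service"] * 3 + ["slow service"]
vocab = build_vocab(reviews)
encoded = encode("great food terrible parking", vocab)
print(len(encoded), encoded[:5])
```

Tokens seen fewer than three times ("slow" in the toy corpus) fall out of the vocabulary and map to `UNK_ID`, as do tokens never seen at all.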

## Usage

```python
from sentimentizer.hf import download_weights
from sentimentizer.config import DriverConfig, weights_path_for

# Download weights + dictionary from Hugging Face Hub
weights_path = weights_path_for("decoder")
download_weights(
    "decoder",
    weights_path,
    repo_id="ryeyoo/sentimentizer-decoder",
    dict_path=DriverConfig.files.dictionary_file_path,
)

# Load and run inference
from sentimentizer.models.decoder import get_trained_model
from sentimentizer.tokenizer import get_trained_tokenizer

model = get_trained_model(device="cpu")
tokenizer = get_trained_tokenizer()

# predict_text returns a mapping of label -> probability
probs = model.predict_text('amazing food great service')
# Print labels from most to least likely
for label, prob in sorted(probs.items(), key=lambda x: -x[1]):
    print(f'{label}: {prob:.4f}')
# e.g. positive: 0.8300, neutral: 0.1200, negative: 0.0500
```
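
If only the single most likely label is needed, the returned mapping can be reduced with a small helper. The sketch below assumes `predict_text` returns a `dict` of label to probability, as the loop above suggests; `top_label` and its `min_confidence` threshold are hypothetical and not part of the library.

```python
def top_label(probs, min_confidence=0.5):
    """Return (label, prob) for the most likely class, or None if too uncertain."""
    label, prob = max(probs.items(), key=lambda item: item[1])
    return (label, prob) if prob >= min_confidence else None


# Illustrative values taken from the example comment above, not real model output.
probs = {"positive": 0.83, "neutral": 0.12, "negative": 0.05}
print(top_label(probs))
```

Returning `None` below the threshold lets the caller decide how to handle ambiguous reviews instead of forcing a label.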

## Files

- `decoder_weights.pth`: Model state dictionary
- `yelp.dictionary`: Gensim dictionary for tokenization