ryeyoo committed
Commit 3df0af0 · verified · 1 Parent(s): 7648118

Update encoder model card

Files changed (1): README.md +50 -0

README.md ADDED
---
language: en
license: mit
tags:
- sentiment-analysis
- text-classification
- encoder
library_name: sentimentizer
task: text-classification
---

# Sentimentizer ENCODER Sentiment Model

## Description

A Transformer Encoder for sentiment classification built on pre-trained GloVe embeddings. The model uses multi-head self-attention with positional encodings and a classification token (CLS) to produce a sentiment score.
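
The architecture described above maps onto standard PyTorch building blocks. The sketch below is a minimal, illustrative assumption of how such an encoder could be wired together; the class name `EncoderClassifier` and the hyperparameters (head count, layer count, embedding width) are hypothetical and not taken from `sentimentizer.models.encoder`.

```python
# Illustrative sketch only -- names and hyperparameters are assumptions,
# not the actual sentimentizer implementation.
import math
import torch
import torch.nn as nn


class EncoderClassifier(nn.Module):
    def __init__(self, vocab_size=20_000, emb_dim=100, n_heads=4, n_layers=2, max_len=200):
        super().__init__()
        # Token embeddings; in the real model these start from GloVe wiki-gigaword-100.
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Learned CLS token prepended to every sequence.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, emb_dim))
        # Fixed sinusoidal positional encodings (one extra slot for the CLS token).
        pe = torch.zeros(max_len + 1, emb_dim)
        pos = torch.arange(max_len + 1).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, emb_dim, 2).float() * (-math.log(10000.0) / emb_dim))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe.unsqueeze(0))
        # Multi-head self-attention stack.
        layer = nn.TransformerEncoderLayer(d_model=emb_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Classification head: CLS representation -> single sentiment score.
        self.head = nn.Linear(emb_dim, 1)

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)          # (batch, seq_len, emb_dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)         # prepend CLS token
        x = x + self.pe[:, : x.size(1)]        # add positional encodings
        x = self.encoder(x)                    # self-attention layers
        return torch.sigmoid(self.head(x[:, 0]))  # sentiment score from CLS position
```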

## Training Data

Trained on reviews from the [Yelp Open Dataset](https://www.yelp.com/dataset), with GloVe Wiki-Gigaword-100 pre-trained embeddings. Reviews are tokenized with a custom dictionary (20k vocabulary, minimum token frequency 3) and padded or truncated to 200 tokens.
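
The figures above (20k vocabulary, minimum frequency 3, 200-token sequences) correspond roughly to the Gensim sketch below. The helper names, padding id, and overall structure are illustrative assumptions, not the library's actual tokenizer code.

```python
# Illustrative preprocessing sketch (not the actual sentimentizer pipeline):
# build a 20k-token Gensim dictionary (min frequency 3) and pad/truncate to 200 ids.
from gensim.corpora import Dictionary

MAX_LEN = 200
PAD_ID = 0  # assumed padding id

def build_dictionary(tokenized_reviews):
    """tokenized_reviews: a list of token lists, e.g. [['amazing', 'food'], ...]."""
    dictionary = Dictionary(tokenized_reviews)
    # Drop tokens seen fewer than 3 times and cap the vocabulary at 20k entries.
    dictionary.filter_extremes(no_below=3, no_above=1.0, keep_n=20_000)
    return dictionary

def encode(dictionary, tokens):
    """Map tokens to ids, then truncate/pad the sequence to exactly MAX_LEN."""
    ids = [dictionary.token2id[t] for t in tokens if t in dictionary.token2id]
    ids = ids[:MAX_LEN]                           # truncate long reviews
    return ids + [PAD_ID] * (MAX_LEN - len(ids))  # pad short reviews
```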

## Usage

```python
from sentimentizer.hf import download_weights
from sentimentizer.config import DriverConfig, weights_path_for

# Download weights + dictionary from the Hugging Face Hub
weights_path = weights_path_for("encoder")
download_weights(
    "encoder",
    weights_path,
    dict_path=DriverConfig.files.dictionary_file_path,
)

# Load and run inference
from sentimentizer.models.encoder import get_trained_model
from sentimentizer.tokenizer import get_trained_tokenizer

model = get_trained_model(device="cpu")
tokenizer = get_trained_tokenizer()

token_ids = tokenizer.tokenize_text("amazing food great service")
score = model.predict(token_ids)
print(f'Sentiment score: {score.item():.4f}')  # >0.5 = positive, <0.5 = negative
```

## Files

- `encoder_weights.pth` – Model state dictionary
- `yelp.dictionary` – Gensim dictionary for tokenization
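
These files can also be inspected directly with standard PyTorch and Gensim loaders. The sketch below assumes both files sit in the working directory and only inspects them: the `.pth` file is a state dictionary, so restoring a runnable model still goes through `get_trained_model` as shown in Usage.

```python
# Sketch: inspect the distributed files directly (assumes they are in the
# current directory; this is not the library's documented loading path).
import torch
from gensim.corpora import Dictionary

# The .pth file holds a state dict (parameter name -> tensor), not a full model object.
state_dict = torch.load("encoder_weights.pth", map_location="cpu")
print(list(state_dict.keys())[:5])

# The Gensim dictionary maps review tokens to the integer ids the model was trained on.
dictionary = Dictionary.load("yelp.dictionary")
print(len(dictionary), "tokens in vocabulary")
```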