nhull committed on
Commit 3af7e89 · verified · 1 Parent(s): 48eba65

Update README.md

Files changed (1)
  1. README.md +5 -67
README.md CHANGED
@@ -35,9 +35,11 @@ This repository contains a fine-tuned **DistilBERT** model trained for sentiment
35
  - **Tokenizer**: `distilbert-base-uncased`
36
  - **Framework**: PyTorch + Hugging Face Transformers
37
 
38
- ---
39
 
40
- ## Training and Validation
41
 
42
  ### Dataset
43
 
@@ -49,34 +51,7 @@ The dataset used for training, validation, and testing is [nhull/tripadvisor-spl
49
 
50
  All splits are balanced across five sentiment labels.
51
 
52
- ### Validation Performance
53
-
54
- | Metric | Value |
55
- |------------|--------|
56
- | Accuracy | 0.6294 |
57
- | Precision | 0.6313 |
58
- | Recall | 0.6294 |
59
- | F1-Score | 0.6297 |
60
-
61
- #### Classification Report (Validation Set)
62
-
63
- | Label | Precision | Recall | F1-Score | Support |
64
- |-------|-----------|--------|----------|---------|
65
- | 1 | 0.7612 | 0.6875 | 0.7225 | 320 |
66
- | 2 | 0.5255 | 0.5469 | 0.5360 | 320 |
67
- | 3 | 0.5859 | 0.5969 | 0.5913 | 320 |
68
- | 4 | 0.5696 | 0.5500 | 0.5596 | 320 |
69
- | 5 | 0.7143 | 0.7656 | 0.7391 | 320 |
70
-
71
- ### Confusion Matrix (Validation Set)
72
-
73
- | Predicted\Actual | 1 | 2 | 3 | 4 | 5 |
74
- |-------------------|-----|-----|-----|-----|-----|
75
- | **1** | 220 | 87 | 10 | 1 | 2 |
76
- | **2** | 63 | 175 | 75 | 6 | 1 |
77
- | **3** | 4 | 61 | 191 | 60 | 4 |
78
- | **4** | 1 | 6 | 46 | 176 | 91 |
79
- | **5** | 1 | 4 | 4 | 66 | 245 |
80
 
81
  ### Test Performance
82
 
@@ -111,43 +86,6 @@ Model predicts too high on average by `0.3934`.
111
 
112
  ---
113
 
114
- ## How to Use
115
-
116
- ### Load the Model
117
-
118
- ```python
119
- from transformers import pipeline
120
-
121
- model_name = "models/distilbert/best_trained_model"
122
- classifier = pipeline("text-classification", model=model_name, tokenizer=model_name)
123
-
124
- text = "The hotel was great, but the staff was rude."
125
- result = classifier(text)
126
- print(result) # [{'label': '3', 'score': 0.82}]
127
- ```
128
-
129
- ### Evaluate Custom Text
130
- To evaluate custom text or datasets, load the tokenizer and model as follows:
131
- ```python
132
- from transformers import AutoTokenizer, AutoModelForSequenceClassification
133
- import torch
134
-
135
- # Load model and tokenizer
136
- model = AutoModelForSequenceClassification.from_pretrained("models/distilbert/best_trained_model")
137
- tokenizer = AutoTokenizer.from_pretrained("models/distilbert/best_trained_model")
138
-
139
- # Input text
140
- text = "The room was clean and spacious, but the food was disappointing."
141
- inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
142
-
143
- # Model prediction
144
- outputs = model(**inputs)
145
- predicted_label = torch.argmax(outputs.logits) + 1 # Convert back to 1-based indexing
146
- print(f"Predicted Sentiment: {predicted_label}")
147
- ```
148
-
149
- ---
150
-
151
  ## Files Included
152
 
153
  - **`validation_results_distilbert.csv`**: Contains correctly classified reviews with their real and predicted labels.
 
35
  - **Tokenizer**: `distilbert-base-uncased`
36
  - **Framework**: PyTorch + Hugging Face Transformers
37
 
38
+ ## Intended Use
39
+
40
+ This model is designed to classify hotel reviews based on their sentiment. It assigns a star rating between 1 and 5 to a review, indicating the sentiment expressed in the review.
41
 
42
+ ---
43
 
44
  ### Dataset
45
 
 
51
 
52
  All splits are balanced across five sentiment labels.
53
 
54
+ ---
55
 
56
  ### Test Performance
57
 
 
86
 
87
  ---
88
 
89
  ## Files Included
90
 
91
  - **`validation_results_distilbert.csv`**: Contains correctly classified reviews with their real and predicted labels.
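
The validation tables removed in this commit can be cross-checked against each other. The sketch below recomputes per-class precision, recall, and F1 from the removed confusion matrix. It assumes rows are actual labels and columns are predicted labels. That orientation is an inference, not something the README states: each row sums to the reported support of 320, which suggests the `Predicted\Actual` header in the removed table is transposed. The function names are illustrative only.

```python
# Confusion matrix copied from the removed README section.
# Assumption: rows are actual labels 1-5, columns are predicted labels 1-5.
confusion = [
    [220, 87, 10, 1, 2],
    [63, 175, 75, 6, 1],
    [4, 61, 191, 60, 4],
    [1, 6, 46, 176, 91],
    [1, 4, 4, 66, 245],
]


def per_class_metrics(matrix):
    """Return {label: (precision, recall, f1, support)} rounded to 4 places."""
    n = len(matrix)
    metrics = {}
    for i in range(n):
        true_positive = matrix[i][i]
        support = sum(matrix[i])                         # all reviews with actual label i+1
        predicted = sum(matrix[r][i] for r in range(n))  # all reviews predicted as i+1
        precision = true_positive / predicted
        recall = true_positive / support
        f1 = 2 * precision * recall / (precision + recall)
        metrics[i + 1] = (round(precision, 4), round(recall, 4), round(f1, 4), support)
    return metrics


def accuracy(matrix):
    """Share of reviews on the diagonal, rounded to 4 places."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return round(correct / total, 4)


report = per_class_metrics(confusion)
for label, (precision, recall, f1, support) in report.items():
    print(label, precision, recall, f1, support)
print("accuracy", accuracy(confusion))
```

Under that row/column assumption, the printed values agree with the removed classification report and with the removed validation accuracy of 0.6294.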