Update README.md

README.md

@@ -35,9 +35,11 @@ This repository contains a fine-tuned **DistilBERT** model trained for sentiment
 - **Tokenizer**: `distilbert-base-uncased`
 - **Framework**: PyTorch + Hugging Face Transformers
 
-
+## Intended Use
+
+This model is designed to classify hotel reviews based on their sentiment. It assigns a star rating between 1 and 5 to a review, indicating the sentiment expressed in the review.
 
-
+---
 
 ### Dataset
 

@@ -49,34 +51,7 @@ The dataset used for training, validation, and testing is [nhull/tripadvisor-spl
 
 All splits are balanced across five sentiment labels.
 
-
-
-| Metric     | Value  |
-|------------|--------|
-| Accuracy   | 0.6294 |
-| Precision  | 0.6313 |
-| Recall     | 0.6294 |
-| F1-Score   | 0.6297 |
-
-#### Classification Report (Validation Set)
-
-| Label | Precision | Recall | F1-Score | Support |
-|-------|-----------|--------|----------|---------|
-| 1     | 0.7612    | 0.6875 | 0.7225   | 320     |
-| 2     | 0.5255    | 0.5469 | 0.5360   | 320     |
-| 3     | 0.5859    | 0.5969 | 0.5913   | 320     |
-| 4     | 0.5696    | 0.5500 | 0.5596   | 320     |
-| 5     | 0.7143    | 0.7656 | 0.7391   | 320     |
-
-### Confusion Matrix (Validation Set)
-
-| Predicted\Actual | 1   | 2   | 3   | 4   | 5   |
-|-------------------|-----|-----|-----|-----|-----|
-| **1**             | 220 | 87  | 10  | 1   | 2   |
-| **2**             | 63  | 175 | 75  | 6   | 1   |
-| **3**             | 4   | 61  | 191 | 60  | 4   |
-| **4**             | 1   | 6   | 46  | 176 | 91  |
-| **5**             | 1   | 4   | 4   | 66  | 245 |
+---
 
 ### Test Performance
 

@@ -111,43 +86,6 @@ Model predicts too high on average by `0.3934`.
 
 ---
 
-## How to Use
-
-### Load the Model
-
-```python
-from transformers import pipeline
-
-model_name = "models/distilbert/best_trained_model"
-classifier = pipeline("text-classification", model=model_name, tokenizer=model_name)
-
-text = "The hotel was great, but the staff was rude."
-result = classifier(text)
-print(result)  # [{'label': '3', 'score': 0.82}]
-```
-
-### Evaluate Custom Text
-To evaluate custom text or datasets, load the tokenizer and model as follows:
-```python
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-import torch
-
-# Load model and tokenizer
-model = AutoModelForSequenceClassification.from_pretrained("models/distilbert/best_trained_model")
-tokenizer = AutoTokenizer.from_pretrained("models/distilbert/best_trained_model")
-
-# Input text
-text = "The room was clean and spacious, but the food was disappointing."
-inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
-
-# Model prediction
-outputs = model(**inputs)
-predicted_label = torch.argmax(outputs.logits) + 1  # Convert back to 1-based indexing
-print(f"Predicted Sentiment: {predicted_label}")
-```
-
----
-
 ## Files Included
 
 - **`validation_results_distilbert.csv`**: Contains correctly classified reviews with their real and predicted labels.