---
title: 🐦 TwittBERTO
emoji: 🚗
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.42.2
app_file: app.py
pinned: false
---


This project demonstrates a sentiment analysis pipeline built on **DistilBERT**, a lightweight transformer model developed by Hugging Face. The model was fine-tuned on a dataset of 16,000 tweets to classify each tweet as **Positive**, **Negative**, or **Neutral**, and it reached **90% accuracy** on the validation set.

---

## 🚀 Features

* Uses **DistilBERT** for strong NLP performance with lower resource consumption.
* Trains on cleaned and preprocessed Twitter data (16K rows).
* Fine-tuned with PyTorch and Hugging Face Transformers.
* Reaches **90% accuracy** on the validation set for sentiment classification.
* Includes training, validation, and evaluation pipelines.

---

## 📁 Dataset

* 16,000 manually labeled tweets with three sentiment classes:

  * `Positive`
  * `Negative`
  * `Neutral`
* The dataset was preprocessed to remove mentions, hashtags, links, and special characters.

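The cleaning step described above could look roughly like the following. This is a minimal sketch, not the project's actual preprocessing code: the function name `clean_tweet`, the regular expressions, and the choice to drop whole hashtags (rather than only the `#` symbol) and to keep basic punctuation are all assumptions.

```python
import re

# Assumed patterns; the project's real cleaning rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# Keep letters, digits, whitespace, and basic punctuation (an assumption).
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s.,!?']")
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip links, mentions, hashtags, and special characters from a tweet."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    # Collapse the gaps left behind by the removals.
    return SPACE_RE.sub(" ", text).strip()


print(clean_tweet("@bob I love this app! #happy https://t.co/xyz :)"))
# prints: I love this app!
```
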
---

## 🧠 Model

* **Base Model**: `distilbert-base-uncased`
* **Fine-tuning**: Trained for several epochs with a cross-entropy loss and the AdamW optimizer.
* **Tokenizer**: Hugging Face `DistilBertTokenizerFast`
* **Training Framework**: PyTorch + Hugging Face `Trainer` API

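To make the loss concrete, here is a small worked example of cross-entropy for a single tweet over three classes. It is an illustrative sketch in plain Python, not the training code: the logits are made up, and the class order (index 0 to 2) is an assumption. In practice PyTorch computes this loss inside the training loop.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one example: log-sum-exp of the logits minus the target logit."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


# Made-up logits for one tweet; index 0 has the highest score.
logits = [2.0, 0.5, -1.0]

print(cross_entropy(logits, 0))  # small loss: the model favors the true class
print(cross_entropy(logits, 2))  # larger loss: the model favors a different class
```
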
---

## 📊 Performance

| Metric    | Score |
| --------- | ----- |
| Accuracy  | 90%   |
| Precision | High  |
| Recall    | High  |
| F1-score  | High  |

> Note: Actual precision, recall, and F1-score values can be added if available.

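One way to obtain the missing precision, recall, and F1 values is to compute them per class from the validation predictions. The sketch below uses plain Python and toy labels that are for illustration only and are not the project's results; the helper name `per_class_metrics` is hypothetical. scikit-learn, already listed in the dependencies, provides `classification_report` for the same purpose.

```python
from collections import Counter

LABELS = ["Negative", "Neutral", "Positive"]


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return precision, recall, and F1 for each class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class
    report = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Toy data for illustration only; not the project's validation results.
y_true = ["Positive", "Negative", "Neutral", "Positive", "Negative", "Neutral"]
y_pred = ["Positive", "Negative", "Positive", "Positive", "Neutral", "Neutral"]

report = per_class_metrics(y_true, y_pred)
for label, scores in report.items():
    print(label, {name: round(value, 2) for name, value in scores.items()})
```
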
---

## 📦 Dependencies

```text
transformers==4.x.x
torch==1.x
scikit-learn
pandas
numpy
matplotlib
```

Install with:

```bash
pip install -r requirements.txt
```

---

## 🛠️ How to Run

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/twitter-sentiment-distilbert.git
   cd twitter-sentiment-distilbert
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Train the model:

   ```bash
   python train.py
   ```

4. Evaluate the model:

   ```bash
   python evaluate.py
   ```

5. Run prediction on new tweets:

   ```bash
   python predict.py --text "I love this app!"
   ```

---

## 📈 Example Output

```text
Input: "I love this app!"
Predicted Sentiment: Positive
```
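
A prediction like this is typically produced by turning the classifier's three output scores into probabilities and picking the most likely class. The following sketch shows that final step in plain Python. The logits are made up, and the `ID2LABEL` order is an assumption; the real mapping depends on how the labels were encoded during training.

```python
import math

# Assumed label order; the trained model's own mapping may differ.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}


def predict_label(logits):
    """Apply softmax to the logits and return the top label with its probability."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits standing in for the model output on "I love this app!".
label, confidence = predict_label([-1.2, 0.3, 2.9])
print(f"Predicted Sentiment: {label} (probability {confidence:.2f})")
```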

---

## 📚 Future Improvements

* Integrate with a live Twitter API for real-time sentiment tracking.
* Add a web dashboard using Streamlit or Flask.
* Extend to multilingual support using `xlm-roberta`.

---

## 📄 License

This project is open-source and available under the [MIT License](LICENSE).

---