Commit fc6d0bd (verified), committed by CesarWKR1
1 parent: b8d3d37

Add GPU suggestions

Files changed (1)
  1. README.md +58 -0
README.md CHANGED
@@ -2,6 +2,62 @@
  license: mit
  ---
 
+ ## 📌 Project Philosophy
+
+ This project intentionally preserves a controlled amount of real-world noise inside the final training dataset instead of aggressively sanitizing every sample.
+
+ The objective was to train the sentiment classifier under realistic social-media conditions, where user-generated content naturally includes:
+
+ - repetitive text
+ - malformed sentences
+ - slang and informal grammar
+ - emotionally chaotic writing
+ - duplicated phrases
+ - inconsistent punctuation
+ - low-quality Reddit comments
+ - partially incoherent text
+ - noisy conversational patterns
+
+ Examples of preserved noise include:
+
+ - repeated phrases such as: "Avoid being judgmental."
+ - incomplete or poorly structured sentences
+ - emotionally disorganized long-form Reddit posts
+ - imperfect GPT-generated synthetic samples
+ - informal internet writing styles
+
+ Rather than building a perfectly clean academic benchmark, the pipeline focuses on creating a model capable of handling imperfect real-world inputs commonly found on social media platforms.
+
+ The preprocessing pipeline still performs:
+
+ - invalid text filtering
+ - semantic validation
+ - augmentation quality control
+ - synthetic sample filtering
+
+ but intentionally avoids over-cleaning the dataset in order to preserve natural language variability.
+
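The balance described in this passage, dropping clearly unusable samples while leaving natural noise untouched, could look roughly like the sketch below. The function name `is_valid_text`, the `MIN_CHARS` threshold, and the individual rules are assumptions chosen for illustration; the README does not show the project's actual filtering code.

```python
import re

# Hypothetical threshold; the project's real value is not documented.
MIN_CHARS = 3

# Matches text that consists of nothing but one or more URLs.
URL_ONLY = re.compile(r"^\s*(https?://\S+\s*)+$")

# Placeholders that Reddit shows in place of deleted or removed comments.
REMOVED_MARKERS = {"[deleted]", "[removed]"}


def is_valid_text(text):
    """Reject only clearly unusable samples; keep noisy but real text."""
    if not isinstance(text, str):
        return False
    stripped = text.strip()
    if len(stripped) < MIN_CHARS:
        return False
    if stripped.lower() in REMOVED_MARKERS:
        return False
    if URL_ONLY.match(stripped):
        return False
    # Require at least one letter so pure punctuation or digits are dropped.
    if not any(ch.isalpha() for ch in stripped):
        return False
    return True


samples = [
    "Avoid being judgmental. Avoid being judgmental.",  # repetition: kept
    "idk man... this is kinda bad lol",                 # slang: kept
    "[deleted]",                                        # placeholder: dropped
    "https://example.com",                              # URL only: dropped
    "!!!",                                              # no letters: dropped
    "",                                                 # empty: dropped
]

kept = [s for s in samples if is_valid_text(s)]
```

Note that the filter deliberately has no rule for repetition, slang, or broken grammar, so those samples pass through unchanged.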
+ This strategy is intended to improve:
+
+ - robustness to noisy inputs
+ - real-world generalization
+ - inference stability
+ - tolerance to imperfect user text
+ - production-oriented behavior
+
+ The final model was designed to operate under realistic NLP conditions rather than idealized datasets.
+
+ ⚠️ Note:
+ Even though the model achieved strong performance under noisy conditions, cleaner datasets and more aggressive manual curation could likely produce even higher evaluation metrics and better class separation.
+
+ For optimal training performance, GPU acceleration is strongly recommended. A GPU with at least **8 GB of VRAM** is suggested for fine-tuning RoBERTa efficiently, especially when using:
+
+ - mixed precision training
+ - gradient accumulation
+ - stochastic weight averaging (SWA)
+ - larger batch sizes
+ - transformer-based augmentation pipelines
+
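Of the techniques listed, gradient accumulation is the one that most directly trades time for memory: several small micro-batches are processed before a single optimizer step, so the effective batch size grows without a matching increase in VRAM. The dependency-free sketch below illustrates the idea on a one-parameter least-squares model. It is not the project's training loop, and every name in it (`grad_mse`, `accumulated_grad`, the toy data) is hypothetical.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for y ≈ w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size):
    """Average gradient built up from equally sized micro-batches."""
    micro_batches = [
        data[i:i + micro_batch_size]
        for i in range(0, len(data), micro_batch_size)
    ]
    total = 0.0
    for mb in micro_batches:
        # Scale each micro-batch gradient by the number of accumulation
        # steps so that the sum equals the full-batch mean gradient.
        total += grad_mse(w, mb) / len(micro_batches)
    return total


# Toy data following y = 2 * x, with a deliberately wrong starting weight.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.5

full = grad_mse(w, data)              # one batch of 4 samples
accum = accumulated_grad(w, data, 2)  # two micro-batches of 2 samples
```

Here `full` and `accum` agree, which is the property that lets a memory-limited GPU emulate a larger batch. In a typical PyTorch loop the same scaling is usually done by dividing the loss by the number of accumulation steps and calling the optimizer once every N micro-batches.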
 
  ## 🚀 Project Overview
 
@@ -1475,6 +1531,8 @@ The model repository includes:
 
  making the model fully compatible with Hugging Face Transformers.
 
+ Hugging Face link: https://huggingface.co/SkyNet-DL/sentiment-roberta
+
 
  ## 🎥 YouTube API Demo
  A full video demonstration of the Sentiment Analysis API is also available on YouTube.