SkyNet-DL/sentiment-roberta

Text Classification

sentiment-analysis

Eval Results (legacy)

CesarWKR1 committed 25 days ago

Commit fc6d0bd · verified · 1 Parent(s): b8d3d37

Add GPU suggestions

Files changed (1)

README.md +58 -0

@@ -2,6 +2,62 @@
@@ -1475,6 +1531,8 @@ The model repository includes:

 license: mit
 ---
+## 📌 Project Philosophy
+This project intentionally preserves a controlled amount of real-world noise inside the final training dataset instead of aggressively sanitizing every sample.
+The objective was to train the sentiment classifier under realistic social-media conditions, where user-generated content naturally includes:
+- repetitive text
+- malformed sentences
+- slang and informal grammar
+- emotionally chaotic writing
+- duplicated phrases
+- inconsistent punctuation
+- low-quality Reddit comments
+- partially incoherent text
+- noisy conversational patterns
+Examples of preserved noise include:
+- repeated phrases such as "Avoid being judgmental."
+- incomplete or poorly structured sentences
+- emotionally disorganized long-form Reddit posts
+- imperfect GPT-generated synthetic samples
+- informal internet writing styles
+Rather than building a perfectly clean academic benchmark, the pipeline focuses on creating a model capable of handling imperfect real-world inputs commonly found on social media platforms.
+The preprocessing pipeline still performs:
+- invalid text filtering
+- semantic validation
+- augmentation quality control
+- synthetic sample filtering
+It intentionally avoids over-cleaning the dataset, however, so that natural language variability is preserved.
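The light-touch filtering described above could look roughly like the following minimal sketch. The function names `is_valid_text` and `light_clean`, the thresholds, and the specific checks are illustrative assumptions, not the project's actual pipeline. The sketch covers only the invalid-text filtering step and leaves out semantic validation and the augmentation checks.

```python
import re

# Hypothetical thresholds, chosen for illustration only.
MIN_CHARS = 3
MIN_ALPHA_RATIO = 0.3


def is_valid_text(text: str) -> bool:
    """Reject only clearly unusable samples; keep noisy but readable ones."""
    stripped = text.strip()
    if len(stripped) < MIN_CHARS:
        return False
    # Require some share of alphabetic characters so that strings made
    # only of punctuation or digits are dropped.
    alpha = sum(ch.isalpha() for ch in stripped)
    return alpha / len(stripped) >= MIN_ALPHA_RATIO


def light_clean(text: str) -> str:
    """Collapse whitespace runs; leave slang, repetition and punctuation untouched."""
    return re.sub(r"\s+", " ", text).strip()


samples = [
    "Avoid being judgmental. Avoid being judgmental.",  # repetition is kept
    "idk man... this is sooo bad!!!",                   # slang is kept
    "   ",                                              # empty: dropped
    "?!?!?! 12345",                                     # no letters: dropped
]
kept = [light_clean(s) for s in samples if is_valid_text(s)]
print(kept)
```

The repeated phrase and the slang-heavy comment both survive, while the empty and letter-free samples are discarded, which mirrors the goal of filtering invalid text without sanitizing natural noise.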
+This strategy is intended to improve:
+- robustness to noisy inputs
+- real-world generalization
+- inference stability
+- tolerance to imperfect user text
+- production-oriented behavior
+The final model was designed to operate under realistic NLP conditions rather than idealized datasets.
+⚠️ Note:
+Even though the model achieved strong performance under noisy conditions, cleaner datasets and more aggressive manual curation could likely produce even higher evaluation metrics and better class separation.
+For optimal training performance, GPU acceleration is strongly recommended. A GPU with at least **8 GB of VRAM** is suggested for fine-tuning RoBERTa efficiently, especially when using:
+- mixed precision training
+- gradient accumulation
+- stochastic weight averaging (SWA)
+- larger batch sizes
+- transformer-based augmentation pipelines
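As a rough sanity check on the 8 GB suggestion, the sketch below estimates the memory taken by weights, gradients, and optimizer state alone, and shows how gradient accumulation changes the effective batch size. It assumes a model of about 125 million parameters (the commonly cited size of `roberta-base`; the README does not state which RoBERTa variant is used), full-precision training with an Adam-style optimizer, and hypothetical helper names. Activations, which grow with batch size and sequence length, are not included.

```python
def static_training_memory_gb(num_params: int, bytes_per_param: int = 16) -> float:
    """Estimate memory for weights, gradients and Adam state, excluding activations.

    16 bytes per parameter assumes fp32 weights (4), fp32 gradients (4)
    and two fp32 Adam moment buffers (8).
    """
    return num_params * bytes_per_param / 1024**3


def effective_batch_size(per_step_batch: int, accumulation_steps: int) -> int:
    """Gradient accumulation trades extra steps for a larger effective batch."""
    return per_step_batch * accumulation_steps


# Assumption: ~125M parameters, roughly the size of roberta-base.
static_gb = static_training_memory_gb(125_000_000)
print(f"static memory: {static_gb:.2f} GB")
print("effective batch:", effective_batch_size(8, 4))
```

Under these assumptions the static footprint comes to a little under 2 GB, so most of an 8 GB budget remains for activations, which is where batch size, sequence length, and mixed precision have the largest effect.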
 ## 🚀 Project Overview
 making the model fully compatible with Hugging Face Transformers.
+Hugging Face link: https://huggingface.co/SkyNet-DL/sentiment-roberta
 ## 🎥 YouTube API Demo
 A full video demonstration of the Sentiment Analysis API is also available on YouTube.