sentiment-analysis-social-media
This model was trained using influence-guided dataset selection, a technique that uses influence scores to identify the most impactful training data for specific concepts.
Model Description
- Base Model: distilbert/distilgpt2
- Training Concepts: sentiment analysis, NLP, social media, opinion mining, text classification
- Training Method: Influence-guided data selection
- Compute Budget: 5 steps per condition
- Total Datasets: 5
Training Approach
This model was trained using three different data selection strategies to validate the effectiveness of influence-guided training:
- Positive Influence: Datasets with high positive influence scores (most aligned with target concepts)
- Random Baseline: Randomly sampled datasets
- Negative Influence: Datasets with the most negative influence scores (least aligned with target concepts)
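The three conditions can be illustrated with a minimal sketch. It assumes each condition simply takes `k` datasets from a ranked list of influence scores; the card does not describe the actual selection procedure, so `select_conditions`, `k`, and `seed` are hypothetical names used only for illustration. The scores are the ones listed under Training Datasets.

```python
import random

# Influence scores as listed in this card's Training Datasets section.
scores = {
    "OdiaGenAI/sentiment_analysis_hindi": -1.749,
    "yassiracharki/Amazon_Reviews_Binary_for_Sentiment_Analysis": 0.785,
    "princeton-nlp/SWE-bench": 0.770,
    "racro/sentiment-analysis-finetune": -0.761,
    "maydogan/Turkish_SentimentAnalysis_TRSAv1": -0.380,
}


def select_conditions(scores, k, seed=0):
    """Hypothetical helper: choose k datasets for each training condition."""
    # Rank dataset names from highest to lowest influence score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    rng = random.Random(seed)  # seeded so the random baseline is reproducible
    return {
        "positive": ranked[:k],                      # highest scores
        "random": rng.sample(sorted(scores), k),     # uniform sample, no replacement
        "negative": ranked[::-1][:k],                # lowest scores, most negative first
    }


conditions = select_conditions(scores, k=2)
for name, datasets in conditions.items():
    print(name, datasets)
```

With `k=2`, the positive condition picks the two datasets with scores 0.785 and 0.770, and the negative condition picks the two with scores -1.749 and -0.761.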
Benchmark Results
| Condition | Perplexity ↓ | Train Loss ↓ | Eval Loss ↓ |
|---|---|---|---|
| Positive | 6.97 | 2.2737 | 1.9416 |
| Random | 218.96 | 6.0215 | 5.3889 |
| Negative | 7.63 | 2.1997 | 2.0321 |
Lower is better for all metrics.
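The perplexity column appears to be the exponential of the eval loss, which is the usual relationship when the loss is a mean cross-entropy in nats. The card does not state this explicitly, so the check below is a sketch under that assumption, using only the figures from the table.

```python
import math

# (eval loss, reported perplexity) pairs copied from the table above.
reported = {
    "Positive": (1.9416, 6.97),
    "Random": (5.3889, 218.96),
    "Negative": (2.0321, 7.63),
}


def perplexity(eval_loss: float) -> float:
    """Perplexity as exp of the mean cross-entropy loss (assumed to be in nats)."""
    return math.exp(eval_loss)


for condition, (loss, ppl) in reported.items():
    # Tolerance allows for rounding in the reported figures.
    assert abs(perplexity(loss) - ppl) < 0.01, condition
    print(f"{condition}: exp({loss}) = {perplexity(loss):.2f}")
```

All three rows agree with the reported perplexity to within rounding, so the perplexity and eval loss columns carry the same information.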
Training Datasets
The model was trained on datasets selected through influence scoring:
- OdiaGenAI/sentiment_analysis_hindi (Influence: -1.749)
- yassiracharki/Amazon_Reviews_Binary_for_Sentiment_Analysis (Influence: 0.785)
- princeton-nlp/SWE-bench (Influence: 0.770)
- racro/sentiment-analysis-finetune (Influence: -0.761)
- maydogan/Turkish_SentimentAnalysis_TRSAv1 (Influence: -0.380)
Intended Use
This model is intended as a demonstration of influence-guided training for:
- Concept-specific language modeling
- Data-efficient training
- Dataset curation research
Limitations
- Trained on a limited compute budget for benchmarking purposes
- May not generalize well outside the target concepts: sentiment analysis, NLP, social media, opinion mining, text classification
- Performance depends on the quality of influence score estimation
Citation
If you use this model or the influence-guided training approach, please cite:
@software{influence_guided_training,
title = {Influence-Guided Dataset Selection for Language Models},
author = {Dowser by Durinn},
year = {2025},
url = {https://huggingface.co/vstrandmoe/sentiment-analysis-social-media}
}
Model Card Contact
For questions or feedback, visit Durinn
Generated by Dowser - Dataset discovery and training optimization