---
language: en
license: apache-2.0
tags:
- nlp
- text-classification
- political-analysis
- social-media-analysis
- transformers
- research
pipeline_tag: text-classification
---

# the_poli

**the_poli** is a transformer-based NLP classification model developed as part of the **s0m3m0** research project.
The model is designed to analyse political and socio-political text, primarily from online and social media sources, and generate structured predictions for analytical and experimental purposes.

This repository contains **only the trained model artifacts** (weights, configuration, and tokenizer files).
The full data pipeline and application code are maintained separately.

---

## Model Overview

- **Model type:** Transformer-based text classification
- **Framework:** Hugging Face Transformers
- **Primary language:** English
- **Domain:** Political and social media text
- **Use case:** Research, analysis, and experimentation

The model is intended to assist in identifying patterns and signals in text rather than making authoritative judgments.

---

## Intended Use

The model is suitable for:

- Academic and research-based NLP experiments
- Political and social discourse analysis
- Text classification pipeline prototyping
- Educational demonstrations of NLP systems

### Not Intended For

- Political persuasion or targeting
- Surveillance or profiling of individuals
- Automated decision-making in real-world political contexts
- High-stakes or safety-critical applications

---

## Example Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "d42kw01f/the_poli"

# Load the tokenizer and classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "Example political statement for analysis"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Run a forward pass without tracking gradients (inference only)
with torch.no_grad():
    outputs = model(**inputs)

# Raw, unnormalised scores with shape (1, num_labels)
logits = outputs.logits
```
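
The forward pass above returns raw logits rather than readable predictions. The following is a minimal sketch of one way to turn them into ranked label probabilities with a softmax. It is written in plain Python so it runs without downloading the model. The helper names `softmax` and `rank_labels`, the label names, and the scores are all hypothetical; in practice the label mapping would come from `model.config.id2label` and the scores from `outputs.logits[0].tolist()`.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(logits, id2label):
    """Return (label, probability) pairs sorted from most to least likely."""
    probs = softmax(logits)
    pairs = [(id2label[i], p) for i, p in enumerate(probs)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical values for illustration only: the real label names come from
# model.config.id2label and the scores from outputs.logits[0].tolist().
example_id2label = {0: "LABEL_0", 1: "LABEL_1", 2: "LABEL_2"}
example_logits = [2.0, 0.5, -1.0]

for label, prob in rank_labels(example_logits, example_id2label):
    print(f"{label}: {prob:.3f}")
```

This assumes a single-label setup in which exactly one class applies to each text. If the model was trained as a multi-label classifier, a per-class sigmoid would be the appropriate transform instead.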

## Training Data

- Trained on curated datasets derived from **publicly available sources**
- Data was preprocessed and filtered for research purposes
- No private, sensitive, or non-consensual data was intentionally included

> Dataset details are intentionally limited to reduce misuse risk.

---

## Limitations & Bias

- Model performance depends on the quality and balance of the training data
- May reflect biases present in source datasets
- Not robust to domain shifts, sarcasm, or adversarial input
- Outputs should be treated as **probabilistic signals**, not factual conclusions
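
One way to act on the last point is to let a pipeline abstain whenever the top probability is low, rather than always reporting a label. The sketch below illustrates that idea in plain Python. The function name `classify_with_threshold`, the `0.7` cut-off, and the label names are assumptions chosen for illustration, not values taken from this model; a suitable threshold would need to be calibrated on held-out data.

```python
def classify_with_threshold(probabilities, min_confidence=0.7):
    """Return (label, confidence), with label set to None when unsure.

    `probabilities` maps label names to probabilities. The default
    threshold of 0.7 is an arbitrary illustrative value.
    """
    if not probabilities:
        return None, 0.0
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    if confidence < min_confidence:
        # Below the cut-off: report the score but withhold the label
        return None, confidence
    return label, confidence


# Hypothetical label names and scores for illustration only
confident = {"LABEL_0": 0.91, "LABEL_1": 0.09}
uncertain = {"LABEL_0": 0.55, "LABEL_1": 0.45}

print(classify_with_threshold(confident))
print(classify_with_threshold(uncertain))
```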

---

## Ethical Considerations

This model is released **strictly for research and educational use**.
Users are responsible for:

- Ensuring ethical deployment
- Respecting platform terms of service
- Avoiding harmful, misleading, or manipulative applications

---

## Related Project

- **Code repository:** https://github.com/d42kw01f/s0m3m0
- **Project name:** s0m3m0

---

## Author

**Dakshitha Navodya Perera**

AI • Cybersecurity • Data Engineering

Sri Lanka