---
language: en
license: apache-2.0
tags:
- nlp
- text-classification
- political-analysis
- social-media-analysis
- transformers
- research
pipeline_tag: text-classification
---

# the_poli

**the_poli** is a transformer-based NLP classification model developed as part of the **s0m3m0** research project. The model is designed to analyse political and socio-political text, primarily from online and social media sources, and generate structured predictions for analytical and experimental purposes.

This repository contains **only the trained model artifacts** (weights, configuration, and tokenizer files). The full data pipeline and application code are maintained separately.

---

## Model Overview

- **Model type:** Transformer-based text classification
- **Framework:** Hugging Face Transformers
- **Primary language:** English
- **Domain:** Political and social media text
- **Use case:** Research, analysis, and experimentation

The model is intended to assist in identifying patterns and signals in text rather than making authoritative judgments.
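
Since the model's predictions are signals rather than verdicts, it can help to view them as a probability distribution over labels instead of a single hard class. The following is a minimal sketch of that step in plain Python, not the project's own post-processing. The helper `logits_to_probabilities`, the label names, and the logit values are all hypothetical; the real label mapping is defined in the model's configuration.

```python
import math


def logits_to_probabilities(logits, id2label):
    """Convert raw classifier logits into a label -> probability mapping."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return {id2label[index]: exp / total for index, exp in enumerate(exps)}


# Hypothetical label names and logits, for illustration only.
example_id2label = {0: "label_0", 1: "label_1", 2: "label_2"}
example_logits = [2.0, 0.5, -1.0]

probabilities = logits_to_probabilities(example_logits, example_id2label)
top_label = max(probabilities, key=probabilities.get)
print(top_label, round(probabilities[top_label], 3))
```

With a loaded Transformers model, the same softmax would typically be applied to the logits tensor returned by the model, for example with `torch.softmax`.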

---

## Intended Use

The model is suitable for:

- Academic and research-based NLP experiments
- Political and social discourse analysis
- Text classification pipeline prototyping
- Educational demonstrations of NLP systems

### Not Intended For

- Political persuasion or targeting
- Surveillance or profiling of individuals
- Automated decision-making in real-world political contexts
- High-stakes or safety-critical applications

---

## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "d42kw01f/the_poli"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "Example political statement for analysis"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
outputs = model(**inputs)
```

## Training Data

- Trained on curated datasets derived from **publicly available sources**
- Data was preprocessed and filtered for research purposes
- No private, sensitive, or non-consensual data was intentionally included

> Dataset details are intentionally limited to reduce misuse risk.

---

## Limitations & Bias

- Model performance depends on the quality and balance of the training data
- May reflect biases present in source datasets
- Not robust to domain shifts, sarcasm, or adversarial input
- Outputs should be treated as **probabilistic signals**, not factual conclusions

---

## Ethical Considerations

This model is released **strictly for research and educational use**.

Users are responsible for:

- Ensuring ethical deployment
- Respecting platform terms of service
- Avoiding harmful, misleading, or manipulative applications

---

## Related Project

- **Code repository:** https://github.com/d42kw01f/s0m3m0
- **Project name:** s0m3m0

---

## Author

**Dakshitha Navodya Perera**
AI • Cybersecurity • Data Engineering
Sri Lanka