---
language: as
tags:
- sentiment-analysis
- assamese
- transformers
- text-classification
license: apache-2.0
datasets:
- None
model-index:
- name: assamese-sentiment-analysis
  results: []
---
🌟 Assamese Sentiment Analysis with LSTM
Tags: #text-classification #sentiment-analysis #Assamese #LSTM
A deep learning-powered tool to classify Assamese text as Positive, Negative, or Neutral using an LSTM model tailored for the Assamese language.
🚀 Key Features
- 🔍 Sentiment Analysis for Assamese – Classifies Assamese text into three classes: Positive, Neutral, and Negative
- 🧠 Deep Learning Backbone – Powered by TensorFlow/Keras with a Long Short-Term Memory (LSTM) network
- ✨ Advanced Preprocessing – Includes tokenization, text cleaning, optional stemming, and stopword removal
- 🧰 Custom Tokenization – Uses AssameseTokenizer to split Assamese script into tokens
- 📈 Evaluation Metrics – F1-score, precision, recall, and accuracy
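The card lists these metrics but does not say how per-class scores are combined across the three sentiment classes. Below is a minimal pure-Python sketch of the computation. Macro averaging (an unweighted mean over classes) is an assumption, and `classification_metrics` is a hypothetical helper, not part of this repository.

```python
LABELS = ("Positive", "Neutral", "Negative")


def classification_metrics(y_true, y_pred, labels=LABELS):
    """Accuracy plus per-class and macro-averaged precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # correctly predicted as label
        fp = sum(t != label and p == label for t, p in pairs)  # wrongly predicted as label
        fn = sum(t == label and p != label for t, p in pairs)  # label present but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}

    # Macro average: unweighted mean over classes (assumed; the card does not specify).
    macro = {
        name: sum(scores[name] for scores in per_class.values()) / len(labels)
        for name in ("precision", "recall", "f1")
    }
    return {"accuracy": accuracy, "per_class": per_class, "macro": macro}


# Toy labels for illustration only, not results from this model.
y_true = ["Positive", "Negative", "Neutral", "Positive", "Negative", "Neutral"]
y_pred = ["Positive", "Negative", "Positive", "Positive", "Neutral", "Neutral"]
metrics = classification_metrics(y_true, y_pred)
print(round(metrics["accuracy"], 3), round(metrics["macro"]["f1"], 3))  # 0.667 0.656
```

In practice, scikit-learn's `classification_report` produces the same per-class and macro-averaged figures; the hand-written version just makes the definitions explicit.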
🧠 Model Overview
| Property | Details |
|---|---|
| Model Name | pratyushee/assamese-sentiment-analysis |
| Architecture | Pretrained LSTM-based neural network |
| Language | Assamese (অসমীয়া) |
| Classes | 3 – Positive, Neutral, Negative |
| Use Cases | Customer feedback, social media monitoring, opinion mining |
🧪 Installation & Requirements
Clone the repo and install the requirements:
```bash
pip install -r requirements.txt
```
Install the custom Assamese tokenizer:
```bash
git clone https://github.com/KashyapKishore/AssameseTokenizer.git
cd AssameseTokenizer
pip install .
```
⚙️ Model Description
This model was trained on Assamese text that was tokenized with a custom tokenizer designed for the Assamese script. It uses an LSTM architecture, which reads tokens in order and carries a hidden state from one token to the next, so it can take word order and surrounding context into account when classifying sentiment.
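For reference, an Embedding → LSTM → Dense (Softmax) stack can be written with the textbook LSTM equations below, which is the formulation Keras's `LSTM` layer uses with its default activations. The hidden sizes and layer settings of this particular model are not stated in the card, and feeding only the last hidden state $h_T$ to the dense layer assumes the Keras default of returning the final state. Here $w_t$ is the token ID at position $t$, $E$ the embedding matrix, $T$ the padded sequence length, $\sigma$ the logistic sigmoid, $\odot$ element-wise multiplication, and $\hat{y}$ the probabilities of the three classes.

```latex
\begin{aligned}
x_t &= E[w_t] \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t) \\
\hat{y} &= \operatorname{softmax}(W_y h_T + b_y), \qquad \hat{y} \in \mathbb{R}^3
\end{aligned}
```

The forget gate $f_t$ decides how much of the previous cell state $c_{t-1}$ to keep, which is what lets the network carry context across long sequences.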
📚 Training Data
The dataset was curated from public sources such as news articles, social media comments, and feedback forms, and was manually labeled into three sentiment classes: Positive, Neutral, and Negative.
🏋️ Training Procedure
- ✂️ Preprocessing: Text cleaning, tokenization using AssameseTokenizer, optional stemming and stopword removal
- 🔢 Input Handling: Sequences padded or truncated to a fixed length of 512 tokens
- 🧠 Architecture: Embedding layer → LSTM → Dense (Softmax)
- 💧 Regularization: Dropout layers to reduce overfitting
- ⚙️ Optimizer: Adam
- 🔁 Epochs: Trained for X epochs (replace with your actual number)
- 📊 Evaluation: Final validation accuracy and F1-score: Insert actual metrics here
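The preprocessing and input-handling steps above can be sketched in plain Python. The card does not publish its preprocessing code or describe AssameseTokenizer's API, so this sketch rests on several assumptions: whitespace splitting stands in for AssameseTokenizer, the stopword set is a hypothetical three-word sample, stemming is skipped, IDs 0 and 1 are reserved for padding and unknown tokens, and short sequences are padded at the end while long ones keep their first 512 tokens (Keras's `pad_sequences`, by contrast, pads and truncates at the front by default).

```python
import re

MAX_LEN = 512          # fixed sequence length stated in the training procedure
PAD_ID, OOV_ID = 0, 1  # assumed reserved IDs for padding and unknown tokens

# Hypothetical stopword sample ("and", "this", "that"); the real list is not published.
STOPWORDS = {"আৰু", "এই", "যে"}

URL_RE = re.compile(r"https?://\S+")
# Anything outside the Bengali-Assamese Unicode block (U+0980-U+09FF),
# the zero-width (non-)joiners, and whitespace is treated as noise.
NOISE_RE = re.compile(r"[^\u0980-\u09FF\u200C\u200D\s]")


def clean(text: str) -> str:
    """Remove URLs and non-Assamese characters, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = NOISE_RE.sub(" ", text)
    return " ".join(text.split())


def tokenize(text: str) -> list[str]:
    """Stand-in for AssameseTokenizer: whitespace split plus stopword removal."""
    return [tok for tok in clean(text).split() if tok not in STOPWORDS]


def build_vocab(texts: list[str]) -> dict[str, int]:
    """Assign IDs from 2 upward in first-seen order (0 and 1 are reserved)."""
    vocab: dict[str, int] = {}
    for text in texts:
        for tok in tokenize(text):
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab


def encode(text: str, vocab: dict[str, int], max_len: int = MAX_LEN) -> list[int]:
    """Map tokens to IDs, keep the first max_len, and pad the end with PAD_ID."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokenize(text)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


vocab = build_vocab(["এই খাবাৰটো ভাল আছিল!"])
ids = encode("খাবাৰটো বৰ ভাল", vocab)
print(ids[:4], len(ids))  # [2, 1, 3, 0] 512
```

The unseen word "বৰ" maps to the unknown-token ID, and the rest of the 512 slots are filled with padding, which is the fixed-length input shape the Embedding layer expects.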
📦 Intended Usage
Ideal for:
- 🗨️ Social media sentiment tracking in Assamese
- 📢 Public opinion & brand monitoring
- 📚 Research on low-resource NLP in Indic languages
⚠️ Limitations / Not Recommended For:
- Code-mixed Assamese-English input
- Domain-specific texts (e.g., legal, medical) without additional fine-tuning
🧪 Quickstart: Using the Model
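You can load and run the model through Hugging Face's `transformers` pipeline, provided the repository contains a Transformers-compatible checkpoint and tokenizer; `pipeline` only loads architectures that Transformers supports, so a plain Keras LSTM file would have to be loaded with TensorFlow/Keras instead: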
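```python
from transformers import pipeline

model_name = "pratyushee/assamese-sentiment-analysis"
# Build a text-classification pipeline using the model's own tokenizer
pipe = pipeline("text-classification", model=model_name, tokenizer=model_name)

# Sample Assamese sentence: "This food was really good!"
result = pipe("এই খাবাৰটা একদম ভালো আছিল!")
print(result)  # a list of {"label": ..., "score": ...} dicts
```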