---
language: as
tags:
- sentiment-analysis
- assamese
- transformers
- text-classification
license: apache-2.0
datasets:
- None
model-index:
- name: assamese-sentiment-analysis
  results: []
---

# 🌟 Assamese Sentiment Analysis with LSTM

**Tags:** `#text-classification` `#sentiment-analysis` `#Assamese` `#LSTM`

> A deep learning-powered tool to classify Assamese text as **Positive**, **Negative**, or **Neutral** using an LSTM model tailored for the Assamese language.

---

## 🚀 Key Features

- 🔍 **Sentiment Analysis for Assamese** – Supports full sentiment classification of Assamese text
- 🧠 **Deep Learning Backbone** – Powered by TensorFlow/Keras with a Long Short-Term Memory (LSTM) network
- ✨ **Advanced Preprocessing** – Includes tokenization, text cleaning, optional stemming, and stopword removal
- 🧰 **Custom Tokenization** – Leverages [AssameseTokenizer](https://github.com/KashyapKishore/AssameseTokenizer.git) for accurate language handling
- 📈 **Robust Evaluation Metrics** – F1-score, precision, recall, and accuracy

---

## 🧠 Model Overview

| Property         | Details                                                    |
|------------------|------------------------------------------------------------|
| **Model Name**   | `pratyushee/assamese-sentiment-analysis`                   |
| **Architecture** | Pretrained LSTM-based neural network                       |
| **Language**     | Assamese (অসমীয়া)                                          |
| **Classes**      | 3 – Positive, Neutral, Negative                            |
| **Use Cases**    | Customer feedback, social media monitoring, opinion mining |

---

## 🧪 Installation & Requirements

Clone the repo and install the requirements:

```bash
pip install -r requirements.txt
```

Install the custom Assamese tokenizer:

```bash
git clone https://github.com/KashyapKishore/AssameseTokenizer.git
cd AssameseTokenizer
pip install .
```

---

## ⚙️ Model Description

This model was developed using Assamese text data and trained with a custom tokenizer specifically designed for Assamese script.
It uses an LSTM architecture, making it well-suited for capturing the sequence and context of natural language in sentiment classification tasks.

- 📚 Training Data: The dataset was curated from public sources such as news articles, social media comments, and feedback forms, and was manually labeled into three sentiment classes: Positive, Neutral, and Negative.
- 🏋️ Training Procedure:
  - ✂️ Preprocessing: Text cleaning, tokenization using AssameseTokenizer, optional stemming and stopword removal
  - 🔢 Input Handling: Sequences padded or truncated to a fixed length of 512 tokens
  - 🧠 Architecture: Embedding layer → LSTM → Dense (Softmax)
  - 💧 Regularization: Dropout layers to prevent overfitting
  - ⚙️ Optimizer: Adam
  - 🔁 Epochs: Trained for X epochs (actual number not yet reported)
  - 📊 Evaluation: Final validation accuracy and F1-score: not yet reported

---

## 📦 Intended Usage

Ideal for:

- 🗨️ Social media sentiment tracking in Assamese
- 📢 Public opinion & brand monitoring
- 📚 Research on low-resource NLP in Indic languages

⚠️ Limitations / Not Recommended For:

- Code-mixed Assamese-English input
- Domain-specific texts (e.g., legal, medical) without additional fine-tuning

---

## 🧪 Quickstart: Using the Model

You can load and run the model via Hugging Face's `transformers` pipeline:

```python
from transformers import pipeline

model_name = "pratyushee/assamese-sentiment-analysis"

# Build a text-classification pipeline that loads both the model and its tokenizer from the Hub
pipe = pipeline("text-classification", model=model_name, tokenizer=model_name)

result = pipe("এই খাবাৰটা একদম ভালো আছিল!")  # Sample Assamese sentence
print(result)
```

---

## 📚 Reference Citations

- [E. Grave*, P. Bojanowski*, P. Gupta, A. Joulin, T.
Mikolov, Learning Word Vectors for 157 Languages](https://arxiv.org/abs/1802.06893)
- [Assamese Tokenizer](https://github.com/KashyapKishore/AssameseTokenizer.git)

---

## 🤝 In Collaboration with

- [Angshita Kashyap](https://huggingface.co/angshita)
- [Dhiraj Ballav Saikia](https://huggingface.co/dhiraj04)
- [Niharika Nath](https://huggingface.co/niharikanath)
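
---

## 🧩 Appendix: Input Handling Sketch

The Training Procedure notes that sequences are padded or truncated to a fixed length of 512 tokens and that the network ends in a three-way softmax. The snippet below is a minimal, framework-free sketch of those two steps, not the authors' actual code. The padding id `0`, right-side padding, the label order in `LABELS`, and the helper names `pad_or_truncate` and `decode_prediction` are all assumptions made for illustration; only the length of 512 and the three class names come from this card.

```python
from typing import List, Sequence

MAX_LEN = 512  # fixed sequence length stated in this model card
PAD_ID = 0     # assumption: id 0 is reserved for padding
# Assumption: this label order is hypothetical; check the model's config for the real mapping.
LABELS = ["Negative", "Neutral", "Positive"]


def pad_or_truncate(token_ids: Sequence[int],
                    max_len: int = MAX_LEN,
                    pad_id: int = PAD_ID) -> List[int]:
    """Return exactly `max_len` ids: keep the first `max_len`, then pad on the right."""
    clipped = list(token_ids[:max_len])
    return clipped + [pad_id] * (max_len - len(clipped))


def decode_prediction(probs: Sequence[float],
                      labels: Sequence[str] = LABELS) -> str:
    """Map a softmax output vector to the label with the highest probability."""
    if len(probs) != len(labels):
        raise ValueError("expected one probability per label")
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best]


if __name__ == "__main__":
    short = pad_or_truncate([5, 6, 7])
    print(len(short), short[:5])                 # a short input is padded up to 512 ids
    print(decode_prediction([0.1, 0.2, 0.7]))    # picks the highest-probability class
```

If the real training code used Keras' `pad_sequences`, note that its defaults pad and truncate at the start of the sequence rather than the end, so the side used here should be matched to whatever was used in training.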