---
language: as
tags:
- sentiment-analysis
- assamese
- transformers
- text-classification
license: apache-2.0
datasets:
- None
model-index:
- name: assamese-sentiment-analysis
results: []
---
# 🌟 Assamese Sentiment Analysis with LSTM
**Tags:** `#text-classification` `#sentiment-analysis` `#Assamese` `#LSTM`
> A deep learning-powered tool to classify Assamese text as **Positive**, **Negative**, or **Neutral** using an LSTM model tailored for the Assamese language.
---
## 🚀 Key Features
- 🔍 **Sentiment Analysis for Assamese** – Supports full sentiment classification of Assamese text
- 🧠 **Deep Learning Backbone** – Powered by TensorFlow/Keras with a Long Short-Term Memory (LSTM) network
- ✨ **Advanced Preprocessing** – Includes tokenization, text cleaning, optional stemming, and stopword removal
- 🧰 **Custom Tokenization** – Leverages [AssameseTokenizer](https://github.com/KashyapKishore/AssameseTokenizer.git) for accurate language handling
- 📈 **Robust Evaluation Metrics** – F1-score, precision, recall, and accuracy
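
The metrics listed above can be computed for the three sentiment classes with a minimal, dependency-free sketch like the one below. The function name `classification_metrics` and the choice of macro averaging are assumptions made for illustration; the card does not state which averaging was used.

```python
from collections import Counter

LABELS = ["Positive", "Neutral", "Negative"]  # the three classes named in this card


def classification_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy plus per-class and macro-averaged precision/recall/F1.

    Hypothetical helper for illustration; macro averaging is an assumption.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class

    per_class = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}

    # Macro average: unweighted mean over classes
    macro = {
        key: sum(per_class[label][key] for label in labels) / len(labels)
        for key in ("precision", "recall", "f1")
    }
    accuracy = sum(tp.values()) / len(y_true) if y_true else 0.0
    return {"accuracy": accuracy, "per_class": per_class, "macro": macro}
```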
---
## 🧠 Model Overview
| Property | Details |
|---------------------|--------------------------------------------------|
| **Model Name** | `pratyushee/assamese-sentiment-analysis` |
| **Architecture** | Pretrained LSTM-based neural network |
| **Language** | Assamese (অসমীয়া) |
| **Classes** | 3 – Positive, Neutral, Negative |
| **Use Cases** | Customer feedback, social media monitoring, opinion mining |
---
## 🧪 Installation & Requirements
After cloning the repo, install the requirements from its root directory:
```bash
pip install -r requirements.txt
```
Install the custom Assamese tokenizer:
```bash
git clone https://github.com/KashyapKishore/AssameseTokenizer.git
cd AssameseTokenizer
pip install .
```
---
## ⚙️ Model Description
This model was trained on Assamese text tokenized with a custom tokenizer designed specifically for the Assamese script. Its LSTM architecture reads tokens in order, which makes it well suited to capturing word order and context in sentiment classification tasks.
- 📚 Training Data: The dataset was curated from public sources such as news articles, social media comments, and feedback forms, and was manually labeled into three sentiment classes: Positive, Neutral, and Negative.
- 🏋️ Training Procedure:
  - ✂️ Preprocessing: Text cleaning, tokenization using AssameseTokenizer, optional stemming and stopword removal
  - 🔢 Input Handling: Sequences padded or truncated to a fixed length of 512 tokens
  - 🧠 Architecture: Embedding layer → LSTM → Dense (Softmax)
  - 💧 Regularization: Dropout layers to prevent overfitting
  - ⚙️ Optimizer: Adam
  - 🔁 Epochs: Trained for X epochs (replace with your actual number)
  - 📊 Evaluation: Final validation accuracy and F1-score: Insert actual metrics here
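
The preprocessing and input-handling steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the training code: a whitespace split stands in for AssameseTokenizer (whose API is not shown in this card), the two-word stopword set is only a sample, the padding and unknown-token IDs are hypothetical, and padding or truncating at the end of the sequence is an assumed choice. Only the fixed length of 512 comes from the card.

```python
import re

MAX_LEN = 512  # fixed sequence length stated in the card
PAD_ID = 0     # assumption: index 0 is reserved for padding
UNK_ID = 1     # assumption: index 1 marks out-of-vocabulary tokens

# Tiny illustrative stopword sample ("and", "this"); not the list used in training
STOPWORDS = {"আৰু", "এই"}

# Anything outside the Bengali-Assamese Unicode block (U+0980 to U+09FF) or whitespace
_NON_ASSAMESE = re.compile(r"[^\u0980-\u09FF\s]")


def clean_text(text):
    """Replace non-Assamese characters with spaces and collapse whitespace."""
    text = _NON_ASSAMESE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Placeholder tokenizer: whitespace split standing in for AssameseTokenizer."""
    return text.split()


def encode(text, vocab, max_len=MAX_LEN, remove_stopwords=True):
    """Clean, tokenize, map tokens to IDs, then truncate or pad to max_len."""
    tokens = tokenize(clean_text(text))
    if remove_stopwords:
        tokens = [tok for tok in tokens if tok not in STOPWORDS]
    ids = [vocab.get(tok, UNK_ID) for tok in tokens]
    ids = ids[:max_len]                           # truncate long inputs
    return ids + [PAD_ID] * (max_len - len(ids))  # pad short inputs at the end
```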
---
## 📦 Intended Usage
Ideal for:
- 🗨️ Social media sentiment tracking in Assamese
- 📢 Public opinion & brand monitoring
- 📚 Research on low-resource NLP in Indic languages

⚠️ Limitations / Not Recommended For:
- Code-mixed Assamese-English input
- Domain-specific texts (e.g., legal, medical) without additional fine-tuning
---
## 🧪 Quickstart: Using the Model
You can load and run the model via the Hugging Face `transformers` pipeline:
```python
from transformers import pipeline

# Model ID on the Hugging Face Hub
model_name = "pratyushee/assamese-sentiment-analysis"

# Build a text-classification pipeline; the tokenizer is loaded from the same repo
pipe = pipeline("text-classification", model=model_name, tokenizer=model_name)

# Sample Assamese sentence
result = pipe("এই খাবাৰটা একদম ভালো আছিল!")
print(result)
```
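
A `text-classification` pipeline returns a list with one dictionary per input, each holding a `label` and a `score`. The sketch below shows one way to post-process that output by flagging low-confidence predictions. The helper name `top_predictions`, the 0.6 threshold, and the example values are all hypothetical; the actual label strings depend on this model's configuration and may differ.

```python
def top_predictions(results, min_score=0.6):
    """Turn pipeline-style output into (label, score, is_confident) tuples.

    Hypothetical helper; min_score is an arbitrary illustrative threshold.
    """
    summary = []
    for item in results:
        label = item["label"]
        score = float(item["score"])
        summary.append((label, score, score >= min_score))
    return summary


# Made-up output in the pipeline's list-of-dicts shape, NOT a real prediction
example_output = [
    {"label": "Positive", "score": 0.91},
    {"label": "Neutral", "score": 0.48},
]

for label, score, confident in top_predictions(example_output):
    note = "" if confident else " (low confidence, consider manual review)"
    print(f"{label}: {score:.2f}{note}")
```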
---
## 📚 Reference Citations
- [E. Grave*, P. Bojanowski*, P. Gupta, A. Joulin, T. Mikolov, Learning Word Vectors for 157 Languages](https://arxiv.org/abs/1802.06893)
- [Assamese Tokenizer](https://github.com/KashyapKishore/AssameseTokenizer.git)
---
## 🤝 In Collaboration with
- [Angshita Kashyap](https://huggingface.co/angshita)
- [Dhiraj Ballav Saikia](https://huggingface.co/dhiraj04)
- [Niharika Nath](https://huggingface.co/niharikanath)