---
language: as
tags:
  - sentiment-analysis
  - assamese
  - transformers
  - text-classification
license: apache-2.0
datasets:
  - None
model-index:
  - name: assamese-sentiment-analysis
    results: []
---


# 🌟 Assamese Sentiment Analysis with LSTM  
**Tags:** `#text-classification` `#sentiment-analysis` `#Assamese` `#LSTM`

> A deep learning-powered tool to classify Assamese text as **Positive**, **Negative**, or **Neutral** using an LSTM model tailored for the Assamese language.  

---

## 🚀 Key Features

- 🔍 **Sentiment Analysis for Assamese** – Classifies Assamese text as Positive, Neutral, or Negative  
- 🧠 **Deep Learning Backbone** – Powered by TensorFlow/Keras with a Long Short-Term Memory (LSTM) network  
- ✂️ **Advanced Preprocessing** – Includes tokenization, text cleaning, optional stemming, and stopword removal  
- 🧰 **Custom Tokenization** – Uses [AssameseTokenizer](https://github.com/KashyapKishore/AssameseTokenizer.git) to tokenize Assamese script  
- 📈 **Evaluation Metrics** – F1-score, precision, recall, and accuracy  
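
As a rough illustration of the evaluation metrics listed above, the following minimal pure-Python sketch computes accuracy plus macro-averaged precision, recall, and F1 over the three classes. Macro averaging and the string labels are assumptions; the card does not state how its reported scores are averaged. In practice, `sklearn.metrics.classification_report` covers the same ground.

```python
# Assumed label names; the card does not specify the model's label ids.
LABELS = ["Negative", "Neutral", "Positive"]


def classification_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        # Per-class counts of true positives, false positives, false negatives.
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    # Macro average: every class counts equally, regardless of its frequency.
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    y_true = ["Positive", "Negative", "Neutral", "Positive"]
    y_pred = ["Positive", "Neutral", "Neutral", "Negative"]
    m = classification_metrics(y_true, y_pred)
    print(m["accuracy"], round(m["f1"], 3))  # 0.5 0.444
```

Macro averaging is a reasonable default for sentiment data, where the Neutral class is often over- or under-represented; a weighted average would instead favor the majority class.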

---

## 🧠 Model Overview

| Property            | Details                                         |
|---------------------|--------------------------------------------------|
| **Model Name**      | `pratyushee/assamese-sentiment-analysis`         |
| **Architecture**    | Pretrained LSTM-based neural network             |
| **Language**        | Assamese (অসমীয়া)                               |
| **Classes**         | 3 – Positive, Neutral, Negative                  |
| **Use Cases**       | Customer feedback, social media monitoring, opinion mining |

---

## 🧪 Installation & Requirements

Clone this repository, then install the Python requirements from its root directory:

```bash
pip install -r requirements.txt
```

Install the custom Assamese tokenizer:

```bash
git clone https://github.com/KashyapKishore/AssameseTokenizer.git
cd AssameseTokenizer
pip install .
```
---

## ⚙️ Model Description
This model was developed using Assamese text data and trained with a custom tokenizer specifically designed for Assamese script. It uses an LSTM architecture, making it well-suited for capturing the sequence and context of natural language in sentiment classification tasks.
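
The preprocessing and input-handling steps listed under the training procedure (cleaning, tokenization, stopword removal, and padding or truncation to 512 tokens) can be sketched in plain Python as follows. This is not the project's pipeline: whitespace splitting stands in for AssameseTokenizer (whose API is not documented here), the stopword set and the reserved ids `PAD_ID`/`UNK_ID` are hypothetical, and the optional stemming step is omitted.

```python
import re
import string

MAX_LEN = 512          # fixed sequence length stated in this model card
PAD_ID, UNK_ID = 0, 1  # assumed reserved ids for padding / unknown tokens

# Hypothetical stopword list for illustration only; not the project's list.
STOPWORDS = {"এই", "আৰু", "যে"}

# ASCII punctuation plus the danda marks used in Assamese script.
_PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation + "।॥"})


def clean(text):
    """Drop URLs, turn punctuation into spaces, and collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    text = text.translate(_PUNCT_TABLE)
    return " ".join(text.split())


def tokenize(text):
    """Stand-in for AssameseTokenizer: plain whitespace splitting."""
    return text.split()


def preprocess(text, stopwords=STOPWORDS):
    """Clean, tokenize, and remove stopwords (stemming is omitted here)."""
    return [tok for tok in tokenize(clean(text)) if tok not in stopwords]


def build_vocab(token_lists):
    """Assign an integer id to every token seen, after the reserved ids."""
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab, max_len=MAX_LEN):
    """Map tokens to ids, truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    tokens = preprocess("এই খাবাৰটা একদম ভালো আছিল!")
    ids = encode(tokens, build_vocab([tokens]))
    print(tokens)             # ['খাবাৰটা', 'একদম', 'ভালো', 'আছিল']
    print(ids[:6], len(ids))  # [2, 3, 4, 5, 0, 0] 512
```

Punctuation is removed with an explicit character table rather than a `[^\w\s]` regex because Python's `\w` does not match Assamese vowel signs, which would split words apart. Right-padding with id 0 is a choice made here so that a Keras `Embedding(mask_zero=True)` layer can ignore padding; Keras' own `pad_sequences` pads and truncates at the start by default.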

- 📚 **Training Data**: The dataset was curated from public sources such as news articles, social media comments, and feedback forms, and was manually labeled into three sentiment classes: Positive, Neutral, and Negative.

- 🏋️ **Training Procedure**:
  - ✂️ Preprocessing: text cleaning, tokenization with AssameseTokenizer, and optional stemming and stopword removal
  - 🔢 Input handling: sequences are padded or truncated to a fixed length of 512 tokens
  - 🧠 Architecture: Embedding layer → LSTM → Dense (softmax)
  - 💧 Regularization: dropout layers to reduce overfitting
  - ⚙️ Optimizer: Adam
  - 🔁 Epochs: trained for X epochs (replace with your actual number)
  - 📊 Evaluation: final validation accuracy and F1-score: Insert actual metrics here
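
The architecture above can be sketched in Keras roughly as follows: an Embedding → LSTM → Dense (softmax) stack with dropout, compiled with Adam, for 512-token integer inputs and three classes. The vocabulary size, embedding width, LSTM units, dropout rate, and loss function are hypothetical choices for illustration, not the published model's configuration.

```python
import numpy as np
import tensorflow as tf

MAX_LEN = 512      # from the model card
NUM_CLASSES = 3    # Positive, Neutral, Negative
# Hypothetical sizes; the card does not publish these hyperparameters.
VOCAB_SIZE = 20_000
EMBED_DIM = 128
LSTM_UNITS = 64
DROPOUT_RATE = 0.5


def build_model():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(MAX_LEN,), dtype="int32"),
        # mask_zero=True lets the LSTM skip positions holding padding id 0.
        tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True),
        tf.keras.layers.LSTM(LSTM_UNITS),
        tf.keras.layers.Dropout(DROPOUT_RATE),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model


if __name__ == "__main__":
    model = build_model()
    batch = np.array([[2, 3, 4, 5] + [0] * (MAX_LEN - 4)], dtype="int32")
    probs = model.predict(batch, verbose=0)
    print(probs.shape)  # (1, 3)
```

Using `sparse_categorical_crossentropy` avoids one-hot encoding the labels; with one-hot targets, `categorical_crossentropy` would be the matching choice.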

---

## 📦 Intended Usage
Ideal for:

- 🗨️ Social media sentiment tracking in Assamese
- 📢 Public opinion and brand monitoring
- 📚 Research on low-resource NLP in Indic languages

⚠️ **Limitations / Not recommended for:**

- Code-mixed Assamese-English input
- Domain-specific texts (e.g., legal, medical) without additional fine-tuning

---

## 🧪 Quickstart: Using the Model

You can load and run the model with the Hugging Face `transformers` pipeline:

```python
from transformers import pipeline

model_name = "pratyushee/assamese-sentiment-analysis"
pipe = pipeline("text-classification", model=model_name, tokenizer=model_name)

result = pipe("এই খাবাৰটা একদম ভালো আছিল!")  # Sample Assamese sentence
print(result)
```

---
## 📚 Reference Citations

- E. Grave, P. Bojanowski, P. Gupta, A. Joulin, T. Mikolov. [Learning Word Vectors for 157 Languages](https://arxiv.org/abs/1802.06893)
- [AssameseTokenizer](https://github.com/KashyapKishore/AssameseTokenizer.git)

---
## 🤝 In Collaboration with

- [Angshita Kashyap](https://huggingface.co/angshita)
- [Dhiraj Ballav Saikia](https://huggingface.co/dhiraj04)
- [Niharika Nath](https://huggingface.co/niharikanath)