# ✍️ Printed Word-Level Script Identification Multi-class (14-Class Model)

An **initial model** for word-level script identification in printed documents, covering **13 Indic languages + English**.  
The model classifies word images into one of the following script categories:  
`Assamese, Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Punjabi, Tamil, Telugu, Urdu, Odia`
 
---

## 📊 Dataset Overview  

- **Training samples**: ~650,560  
- **Validation samples**: ~95,909  
- (Test set used for evaluation)  

---

## ⚙️ Training Setup  

- **Model**: ResNet-18  
- **Preprocessing**: Custom **binarization function** applied for improved feature extraction  
- **Input size**: 224 × 224 RGB  
- **Optimizer**: Adam  
- **Loss function**: CrossEntropyLoss  
- **Epochs**: trained through epoch 35 (the epoch-35 weights are shared)  

---

## 📈 Results & Evaluation  

The model was evaluated on the **test set**.  
Accompanying this README, you will find PNG visualizations for:  

- Confusion Matrix  
- Per-class Precision, Recall, F1-Score  
- Support vs Correct Predictions per class  
- Top Misclassifications  

These provide a detailed breakdown of model performance across all 14 classes.
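
For readers who want to relate the plots to each other, the per-class metrics can be derived from a confusion matrix as sketched below. The function name `per_class_metrics` and the 3-class toy matrix are illustrative assumptions; they are not taken from `test.py` or from the reported results.

```python
def per_class_metrics(confusion):
    """Compute precision, recall, F1 and support for each class.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        correct = confusion[k][k]
        support = sum(confusion[k])  # samples whose true class is k
        predicted = sum(confusion[i][k] for i in range(n))  # samples predicted as k
        precision = correct / predicted if predicted else 0.0
        recall = correct / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
            "correct": correct,
        })
    return metrics


# Toy 3-class matrix for illustration only (not real results).
toy_confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 0, 10],
]
toy_metrics = per_class_metrics(toy_confusion)
```

The `support` and `correct` fields correspond to the "Support vs Correct Predictions" plot, and the largest off-diagonal entries of the matrix are the top misclassifications.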

---

## 📂 Included Files  

- `model_weights/` → Trained ResNet-18 weights    
- `wt_35_test_report/` → Evaluation visualizations (confusion matrix, metrics, misclassifications, etc.)  
- `test.py` → Script used to run evaluation  

---

## 🗂️ Class Labels  

The model predicts among **14 classes**:  

`Assamese, Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Punjabi, Tamil, Telugu, Urdu, Odia`  
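
As a small sketch of how a 14-way model output maps to a label: the index order below simply follows the order in which the classes are listed above, which is an assumption. The actual index-to-label mapping should be confirmed against `test.py`. The helper name `predict_label` is hypothetical.

```python
# Assumption: class indices follow the listing order in this README.
CLASS_NAMES = [
    "Assamese", "Bengali", "English", "Gujarati", "Hindi", "Kannada",
    "Malayalam", "Manipuri", "Marathi", "Punjabi", "Tamil", "Telugu",
    "Urdu", "Odia",
]


def predict_label(scores):
    """Return the class name with the highest score (argmax over 14 outputs)."""
    if len(scores) != len(CLASS_NAMES):
        raise ValueError(f"expected {len(CLASS_NAMES)} scores, got {len(scores)}")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return CLASS_NAMES[best_index]


# Example: a score vector whose largest value sits at index 4.
example_scores = [0.0] * 14
example_scores[4] = 2.5
example_label = predict_label(example_scores)
```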

---

## 📝 Note  

This is an **initial baseline model**.  
It can be improved further by training on the complete dataset and tuning hyperparameters.