# ✍️ Printed Word-Level Script Identification Multi-class (14-Class Model)

An **initial model** for word-level script separation in printed documents across **13 Indic languages + English**. The model classifies word images into their respective script categories:

`Assamese, Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Punjabi, Tamil, Telugu, Urdu, Odia`

---

## 📊 Dataset Overview

- **Training samples**: ~650,560
- **Validation samples**: ~95,909
- A test set was used for evaluation

---

## ⚙️ Training Setup

- **Model**: ResNet-18
- **Preprocessing**: Custom **binarization function** applied for improved feature extraction
- **Input size**: 224 × 224 RGB
- **Optimizer**: Adam
- **Loss function**: CrossEntropyLoss
- **Epochs**: trained up to the 35th epoch (these weights are shared)

---

## 📈 Results & Evaluation

The model was evaluated on the **test set**. Accompanying this README, you will find PNG visualizations for:

- Confusion Matrix
- Per-class Precision, Recall, F1-Score
- Support vs Correct Predictions per class
- Top Misclassifications

These provide a detailed breakdown of model performance across all 14 classes.

---

## 📂 Included Files

- `model_weights/` → Trained ResNet-18 weights
- `wt_35_test_report/` → Evaluation visualizations (confusion matrix, metrics, misclassifications, etc.)
- `test.py` → Script used to run evaluation

---

## 🗂️ Class Labels

The model predicts among **14 classes**:

`Assamese, Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Punjabi, Tamil, Telugu, Urdu, Odia`

---

## 📝 Note

This is an **initial baseline model**. Further improvements can be made by training on the complete dataset and tuning hyperparameters.
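
For readers who want to reproduce the per-class precision, recall, and F1 figures shown in the evaluation visualizations, the following is a minimal sketch of how such metrics can be derived from a confusion matrix. It is not taken from `test.py`: the function name `per_class_metrics`, the row/column convention (rows are true classes, columns are predictions), and the order of `CLASS_NAMES` are assumptions made for illustration, and the index order used by the shared weights may differ.

```python
from typing import Dict, List

# Class names as listed in this README. The index order here is an
# assumption for illustration; the shared weights may use another order.
CLASS_NAMES: List[str] = [
    "Assamese", "Bengali", "English", "Gujarati", "Hindi", "Kannada",
    "Malayalam", "Manipuri", "Marathi", "Punjabi", "Tamil", "Telugu",
    "Urdu", "Odia",
]


def per_class_metrics(
    confusion: List[List[int]], class_names: List[str]
) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall, F1, and support for each class.

    Assumes confusion[i][j] counts samples whose true class is i and
    whose predicted class is j.
    """
    n = len(class_names)
    metrics: Dict[str, Dict[str, float]] = {}
    for k, name in enumerate(class_names):
        true_positive = confusion[k][k]
        support = sum(confusion[k])                          # all samples of class k
        predicted = sum(confusion[i][k] for i in range(n))   # all predictions of class k
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
        }
    return metrics


if __name__ == "__main__":
    # Toy 3-class matrix, purely illustrative (not results from this model).
    toy = [[8, 2, 0], [1, 9, 0], [0, 0, 10]]
    for name, row in per_class_metrics(toy, CLASS_NAMES[:3]).items():
        print(name, row)
```

The same function applies unchanged to a 14 × 14 matrix: pass the full `CLASS_NAMES` list along with the counts obtained on the test set.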