	# ✍️ Printed Word-Level Script Identification Multi-class (14-Class Model)

	An initial model for word-level script separation in printed documents, covering 13 Indic languages plus English.
	It classifies word images into their respective script categories:
	`Assamese, Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Punjabi, Tamil, Telugu, Urdu, Odia`

	---

	## 📊 Dataset Overview

	- Training samples: ~650,560
	- Validation samples: ~95,909
	- Test set: used for evaluation

	---

	## ⚙️ Training Setup

	- Model: ResNet-18
	- Preprocessing: Custom binarization function applied for improved feature extraction
	- Input size: 224 × 224 RGB
	- Optimizer: Adam
	- Loss function: CrossEntropyLoss
	- Epochs: trained through epoch 35 (these weights are shared)
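
The custom binarization function is not described in this README. As a rough illustration of what a binarization step does before a word image is fed to the network, here is a minimal pure-Python sketch that uses Otsu's method as a stand-in. The function names (`otsu_threshold`, `binarize`, `to_rgb`) are hypothetical, and copying the binary image into three channels to match the RGB input is an assumption. Resizing to 224 × 224 is left out.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) that maximizes between-class variance.

    Stand-in technique (Otsu's method); NOT the README's custom function.
    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break  # all pixels are in the lower class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, threshold=None):
    """Map a grayscale image (rows of ints 0-255) to rows of 0 or 255."""
    if threshold is None:
        threshold = otsu_threshold([p for row in image for p in row])
    return [[255 if p > threshold else 0 for p in row] for row in image]


def to_rgb(binary):
    """Assumption: repeat the single channel three times for an RGB input."""
    return [[(p, p, p) for p in row] for row in binary]


# Tiny made-up grayscale patch: dark values on the left, bright on the right.
patch = [[10, 12, 200],
         [11, 205, 210]]
binary_patch = binarize(patch)
```

In practice an image library would handle loading, thresholding, and resizing; the sketch only shows the thresholding logic itself.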

	---

	## 📈 Results & Evaluation

	The model was evaluated on the test set.
	Accompanying this README, you will find PNG visualizations for:

	- Confusion Matrix
	- Per-class Precision, Recall, F1-Score
	- Support vs Correct Predictions per class
	- Top Misclassifications

	These provide a detailed breakdown of model performance across all 14 classes.
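
For readers who want to reproduce the numbers behind those plots, the sketch below shows how per-class precision, recall, F1, support, correct predictions, and the top misclassifications can all be derived from a confusion matrix. It is not the code in `test.py`; the function names are hypothetical, and the 3-class matrix contains made-up toy counts, not results from this model.

```python
def per_class_metrics(cm, labels):
    """cm[i][j] = number of samples with true class i predicted as class j."""
    n = len(labels)
    report = {}
    for k in range(n):
        correct = cm[k][k]                           # true positives
        support = sum(cm[k])                         # samples whose true class is k
        predicted = sum(cm[i][k] for i in range(n))  # samples predicted as k
        precision = correct / predicted if predicted else 0.0
        recall = correct / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[labels[k]] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
            "correct": correct,
        }
    return report


def top_misclassifications(cm, labels, top_n=3):
    """Return (count, true_label, predicted_label) tuples, largest count first."""
    n = len(labels)
    pairs = [
        (cm[i][j], labels[i], labels[j])
        for i in range(n)
        for j in range(n)
        if i != j and cm[i][j] > 0
    ]
    pairs.sort(key=lambda item: -item[0])
    return pairs[:top_n]


# Toy example with invented counts (NOT this model's results).
toy_labels = ["Assamese", "Bengali", "English"]
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
toy_report = per_class_metrics(toy_cm, toy_labels)
toy_top = top_misclassifications(toy_cm, toy_labels)
```

The same two functions apply unchanged to a 14 × 14 matrix built from the test-set predictions.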

	---

	## 📂 Included Files

	- `model_weights/` → Trained ResNet-18 weights
	- `wt_35_test_report/` → Evaluation visualizations (confusion matrix, metrics, misclassifications, etc.)
	- `test.py` → Script used to run evaluation

	---

	## 🗂️ Class Labels

	The model predicts among 14 classes:

	`Assamese, Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Punjabi, Tamil, Telugu, Urdu, Odia`
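
A minimal sketch of turning the model's 14 output scores into a label is shown below. It assumes the class indices follow the order listed above, which this README does not confirm; the actual index-to-label mapping is whatever the training code used. `decode` and `softmax` are hypothetical helper names.

```python
import math

# Assumption: index i of the model output corresponds to CLASS_LABELS[i].
CLASS_LABELS = [
    "Assamese", "Bengali", "English", "Gujarati", "Hindi", "Kannada",
    "Malayalam", "Manipuri", "Marathi", "Punjabi", "Tamil", "Telugu",
    "Urdu", "Odia",
]


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, labels=CLASS_LABELS):
    """Return (predicted_label, probability) for one word image's scores."""
    if len(logits) != len(labels):
        raise ValueError("expected one score per class")
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return labels[idx], probs[idx]


# Made-up scores in which index 4 is the largest.
example_logits = [0.0] * 14
example_logits[4] = 5.0
example_label, example_prob = decode(example_logits)
```

Before relying on this mapping, check it against the label order used in `test.py`.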

	---

	## 📝 Note

	This is an initial baseline model.
	Further improvements can be made by training on the complete dataset and tuning hyperparameters.