# Digit Recognition
## Intended Use
This model classifies handwritten digits (0-9) from the pixel values of an MNIST-like dataset. It is intended for educational use, as a demonstration of Random Forest for multi-class classification.
## Training Data
- **Dataset**: The dataset contains 42,000 samples. Each sample is a 28x28 grayscale image flattened into a vector of 784 pixel values.
- **Labels**: The dataset contains 10 classes (digits 0-9).
- **Train-Validation Split**:
  - Training set: 33,600 samples (80%)
  - Validation set: 8,400 samples (20%)
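
The 80/20 split above could be reproduced roughly as follows. This is a minimal sketch, not the original training script: the random arrays stand in for the real data, and the `random_state` and `stratify` settings are assumptions that this card does not confirm.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 42,000 flattened 28x28 images
# with pixel values in 0-255 and a digit label for each image.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(42_000, 784), dtype=np.uint8)
y = rng.integers(0, 10, size=42_000)

# Hold out 20% for validation. random_state and stratify are assumptions;
# stratify=y keeps the class proportions similar in both subsets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_val.shape)
```

With 42,000 rows and `test_size=0.2`, the two subsets have 33,600 and 8,400 rows, which matches the counts listed above.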
## Evaluation Metrics
- **Accuracy**: Computed on the validation set as `accuracy_score(y_val, y_pred)`.
- **Classification Report**: Precision, recall, and F1-score for each class.
- **Confusion Matrix**: Visualized to show how predictions are distributed across the true classes.
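
The three metrics above can be computed with `sklearn.metrics`. The sketch below is illustrative only: it uses scikit-learn's bundled 8x8 `load_digits` data as a stand-in for the 28x28 dataset, and the forest's hyperparameters are assumptions, so its numbers say nothing about this model's actual performance.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in data: 8x8 digit images bundled with scikit-learn (64 features),
# not the 28x28 dataset described in this card.
X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Hyperparameters here are assumptions, chosen only to keep the sketch quick.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_val)

# Overall fraction of correctly classified validation samples.
accuracy = accuracy_score(y_val, y_pred)

# Per-class precision, recall, and F1-score as a formatted string.
report = classification_report(y_val, y_pred, digits=3)

# Rows are true digits, columns are predicted digits.
cm = confusion_matrix(y_val, y_pred, labels=list(range(10)))

print(f"Validation accuracy: {accuracy:.3f}")
print(report)
print(cm)
```

The matrix can also be plotted, for example with `sklearn.metrics.ConfusionMatrixDisplay`, which requires matplotlib.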
## Limitations
- The model may not generalize well to digits written in styles significantly different from the training data.
- It is not optimized for real-time or large-scale applications.
## Ethical Considerations
- Ensure the dataset used does not contain any biases that could affect the fairness of the model.
- The model should not be used in critical applications without further validation and testing.
## How to Use
1. Load the model using `joblib.load('digit_rf_model.joblib')`.
2. Preprocess the input data to match the format of the training data (28x28 images flattened into 784-pixel vectors).
3. Use the `predict` method to classify new samples.
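
The three steps above might look like the sketch below. So that it runs without the real artifact, it first saves a tiny stand-in forest under the same file name in a temporary directory; that part, the random input batch, and the assumption that pixels are passed as raw 0-255 values are all hypothetical and should be adapted to the actual model and preprocessing.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Stand-in for the real artifact so the sketch runs end to end: a tiny
# forest fitted on random 784-feature vectors. With the real model file,
# skip this part and point model_path at it.
model_path = os.path.join(tempfile.mkdtemp(), "digit_rf_model.joblib")
stand_in = RandomForestClassifier(n_estimators=10, random_state=0)
stand_in.fit(rng.integers(0, 256, size=(200, 784)), np.arange(200) % 10)
joblib.dump(stand_in, model_path)

# Step 1: load the saved model.
model = joblib.load(model_path)

# Step 2: flatten each 28x28 image into a row of 784 pixel values.
# The batch is hypothetical; raw 0-255 pixel values are an assumption.
images = rng.integers(0, 256, size=(5, 28, 28))
X_new = images.reshape(len(images), -1)

# Step 3: predict one digit label per input row.
predictions = model.predict(X_new)
print(predictions)
```

Only load `.joblib` files from sources you trust, since joblib uses pickle under the hood and unpickling can execute arbitrary code.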