itsgokul02 committed · Commit ddb68af · verified · 1 Parent(s): 6565c1c

Update README.md

Files changed (1)
  1. README.md +150 -1

README.md CHANGED
@@ -12,4 +12,153 @@ tags:
- pytorch
- cnn
- mediapipe
---
# Model Card for Virtual Board CNN

This model is a fine-tuned EfficientNet-B0 convolutional neural network (CNN) that recognizes hand-drawn letters (A-Z) for a virtual board application. Integrated with OpenCV and MediaPipe for real-time hand tracking, it powers an interactive canvas for letter and word prediction, with a reported validation accuracy of roughly 99% (not independently verified; see Limitations). The model is trained on the pittawat/letter_recognition dataset and supports educational and communication use cases, with word recognition via Tesseract OCR and voice feedback via text-to-speech.

### Model Description

The Virtual Board CNN is a fine-tuned EfficientNet-B0 model for classifying hand-drawn letters (A-Z) in real time. Built with PyTorch, it processes grayscale images (resized to 224x224) from a virtual canvas, enabling gesture-based drawing and prediction. The model is part of an interactive application that combines computer vision (OpenCV, MediaPipe) and deep learning for educational and communication purposes, with word prediction enhanced by Tesseract OCR and text-to-speech output.

- **Developed by:** Gokul Seetharaman
- **Model type:** Convolutional Neural Network
- **License:** MIT
- **Finetuned from model:** EfficientNet-B0
### Model Sources

- **Repository:** https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch
- **Dataset:** https://huggingface.co/datasets/pittawat/letter_recognition

## Uses

### Direct Use

The model is intended for direct use within the virtual board application, where it predicts hand-drawn letters (A-Z) from webcam-captured canvas images. Users draw letters with hand gestures, and the model outputs predictions in real time, displayed on the interface with confidence scores.

## Bias, Risks, and Limitations

**Bias:** The model was trained on the pittawat/letter_recognition dataset, which may not capture all handwriting styles or variations across demographics, potentially leading to lower accuracy for underrepresented writing patterns.

**Risks:** Incorrect letter predictions could mislead users in educational or communication contexts. Word prediction via Tesseract OCR may fail for poorly drawn or complex words.

**Limitations:**
- The reported 99% validation accuracy is unverified without a formal evaluation script.
- Performance depends on webcam quality (min. 720p recommended) and clear canvas inputs.
- Grayscale input limits applicability to color-based tasks.
- Tesseract OCR's word prediction may struggle with cursive or overlapping text.

### Recommendations

Users should:
- Verify model performance with a validation script (e.g., validation-checker.py) on diverse handwriting samples.
- Ensure high-quality webcam input and clear canvas drawings for optimal results.
- Be aware of potential biases in the dataset and test with varied handwriting styles.
- Consider fine-tuning for specific use cases or hardware constraints.

## How to Get Started with the Model

1. Download `best_model.pth` from this repo and `main.py` from the GitHub repository.
2. Run `python main.py` to start the webcam-based virtual board.

## Training Details

### Training Data

* [Hugging Face letter recognition dataset](https://huggingface.co/datasets/pittawat/letter_recognition)
* 26 classes (A-Z), split 80/20 into train/validation sets

### Training Procedure

* Fine-tuned EfficientNet-B0
* CrossEntropyLoss, AdamW optimizer, 25 epochs, batch size 32

#### Preprocessing

* Images resized to 224x224
* Normalized with ImageNet mean/std
* Random data augmentation on the training set

#### Training Hyperparameters

* Training regime: fp32
* Epochs: 25, batch size: 32, optimizer: AdamW, learning rate: 5e-4

#### Speeds, Sizes, Times

* Training time: ~90 minutes on a modern GPU (varies)
* Checkpoint size: ~46 MB (`best_model.pth`)

## Evaluation

#### Factors

* Performance measured per class (precision, recall, F1-score, support)

#### Metrics

* Overall accuracy, confusion matrix, and per-class precision/recall/F1-score

### Results

* Validation accuracy: **99.04%**
* Full confusion matrix and metrics in the [GitHub README](https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch)

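The per-class metrics listed above can be computed with scikit-learn; the snippet below uses a small toy label list in place of real validation outputs, purely to show the shape of the report.

```python
import string

from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

# Toy true/predicted labels standing in for real validation outputs.
y_true = [0, 1, 2, 2, 1, 0, 3, 3]
y_pred = [0, 1, 2, 1, 1, 0, 3, 3]

labels = sorted(set(y_true))
names = [string.ascii_uppercase[i] for i in labels]  # ['A', 'B', 'C', 'D']

# Overall accuracy, confusion matrix, and per-class
# precision/recall/F1/support, as listed in the Metrics section.
acc = accuracy_score(y_true, y_pred)
print(f"accuracy: {acc:.4f}")
print(confusion_matrix(y_true, y_pred, labels=labels))
print(classification_report(y_true, y_pred, labels=labels,
                            target_names=names))
```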
## Environmental Impact

* Estimated training: <1.5 GPU-hours; carbon footprint minimal for local or single-GPU cloud runs
* Hardware: NVIDIA GeForce RTX 4060 Laptop GPU
* Hours used: ~1.5

### Model Architecture and Objective

* See "Model Description" above and the [GitHub repo](https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch) for the full PyTorch code.

### Compute Infrastructure

* Fine-tuned the EfficientNet-B0 model on an NVIDIA RTX 4060 Laptop GPU (8 GB VRAM), 16 GB RAM, Windows 11, Python 3.10

#### Hardware

* GPU: RTX 4060 Laptop GPU or equivalent (CPU-only operation optional)
* RAM: 16 GB

#### Software

* Python 3.10, PyTorch, OpenCV, NumPy, MediaPipe, pyttsx3

## Citation

**BibTeX:**

```bibtex
@misc{gokulseetharaman2025virtualboard,
  title={Virtual-Drawing-board-Opencv-Pytorch},
  author={Gokul Seetharaman},
  year={2025},
  url={https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch}
}
```

**APA:**

Gokul Seetharaman. (2025). Virtual-Drawing-board-Opencv-Pytorch. [https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch](https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch)

## Model Card Contact

[GitHub Issues](https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch/issues)