itsgokul02 committed · Commit ddb68af · verified · 1 Parent(s): 6565c1c

Update README.md

Files changed (1)
  1. README.md +150 -1

README.md CHANGED
@@ -12,4 +12,153 @@ tags:
- pytorch
- cnn
- mediapipe
---
# Model Card for Virtual Board CNN

This model is a fine-tuned EfficientNet-B0 convolutional neural network (CNN) that recognizes hand-drawn letters (A-Z) for a virtual board application. Integrated with OpenCV and MediaPipe for real-time hand tracking, it powers an interactive canvas for letter and word prediction, with a reported validation accuracy of roughly 99% (not independently verified; see Limitations). The model is trained on the pittawat/letter_recognition dataset and supports educational and communication use cases, with word recognition via Tesseract OCR and voice feedback via text-to-speech.

### Model Description

The Virtual Board CNN is a fine-tuned EfficientNet-B0 model for classifying hand-drawn letters (A-Z) in real time. Built with PyTorch, it processes grayscale images (resized to 224x224) from a virtual canvas, enabling gesture-based drawing and prediction. The model is part of an interactive application that combines computer vision (OpenCV, MediaPipe) and deep learning for educational and communication purposes, with word prediction enhanced by Tesseract OCR and text-to-speech output.

- **Developed by:** Gokul Seetharaman
- **Model type:** Convolutional Neural Network
- **License:** MIT
- **Finetuned from model:** EfficientNet-B0
### Model Sources

- **Repository:** https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch
- **Dataset:** https://huggingface.co/datasets/pittawat/letter_recognition

## Uses

### Direct Use

The model is intended for direct use within the virtual board application, where it predicts hand-drawn letters (A-Z) from webcam-captured canvas images. Users draw letters with hand gestures, and the model outputs predictions in real time, displayed on the interface with confidence scores.

## Bias, Risks, and Limitations

**Bias:** The model was trained on the pittawat/letter_recognition dataset, which may not capture all handwriting styles or variations across demographics, potentially leading to lower accuracy for underrepresented writing patterns.

**Risks:** Incorrect letter predictions could mislead users in educational or communication contexts. Word prediction via Tesseract OCR may fail for poorly drawn or complex words.

**Limitations:**
- The reported 99% validation accuracy is unverified without a formal evaluation script.
- Performance depends on webcam quality (min. 720p recommended) and clear canvas inputs.
- Grayscale input limits applicability to color-based tasks.
- Tesseract OCR's word prediction may struggle with cursive or overlapping text.

### Recommendations

Users should:
- Verify model performance with a validation script (e.g., validation-checker.py) on diverse handwriting samples.
- Ensure high-quality webcam input and clear canvas drawings for optimal results.
- Be aware of potential biases in the dataset and test with varied handwriting styles.
- Consider fine-tuning for specific use cases or hardware constraints.

## How to Get Started with the Model

1. Download `best_model.pth` from this repo and `main.py` from the GitHub repository.
2. Run `python main.py` to start the webcam-based virtual board.

## Training Details

### Training Data

* [Hugging Face letter recognition dataset](https://huggingface.co/datasets/pittawat/letter_recognition)
* 26 classes (A-Z), split 80/20 into train/validation sets

### Training Procedure

* Fine-tuned EfficientNet-B0
* CrossEntropyLoss, AdamW optimizer, 25 epochs, batch size 32

#### Preprocessing

* Images resized to 224x224
* Normalized with ImageNet mean/std
* Random data augmentation on the training set

#### Training Hyperparameters

* Training regime: fp32
* Epochs: 25, batch size: 32, optimizer: AdamW, learning rate: 5e-4

#### Speeds, Sizes, Times

* Training time: ~90 minutes on a modern GPU (varies)
* Checkpoint size: ~46 MB (`best_model.pth`)

## Evaluation

#### Factors

* Performance measured per class (precision, recall, F1-score, support)

#### Metrics

* Overall accuracy, confusion matrix, and per-class precision/recall/F1-score

### Results

* Validation accuracy: **99.04%**
* Full confusion matrix and metrics in the [GitHub README](https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch)

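The per-class metrics listed above can be computed with scikit-learn; the snippet below uses a small toy label list in place of real validation outputs, purely to show the shape of the report.

```python
import string

from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

# Toy true/predicted labels standing in for real validation outputs.
y_true = [0, 1, 2, 2, 1, 0, 3, 3]
y_pred = [0, 1, 2, 1, 1, 0, 3, 3]

labels = sorted(set(y_true))
names = [string.ascii_uppercase[i] for i in labels]  # ['A', 'B', 'C', 'D']

# Overall accuracy, confusion matrix, and per-class
# precision/recall/F1/support, as listed in the Metrics section.
acc = accuracy_score(y_true, y_pred)
print(f"accuracy: {acc:.4f}")
print(confusion_matrix(y_true, y_pred, labels=labels))
print(classification_report(y_true, y_pred, labels=labels,
                            target_names=names))
```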
## Environmental Impact

* Estimated training: <1.5 GPU-hours; carbon footprint minimal for local or single-GPU cloud runs
* Hardware: NVIDIA GeForce RTX 4060 Laptop GPU
* Hours used: ~1.5

### Model Architecture and Objective

* See "Model Description" above and the [GitHub repo](https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch) for the full PyTorch code.

### Compute Infrastructure

* Fine-tuned the EfficientNet-B0 model on an NVIDIA RTX 4060 Laptop GPU (8 GB VRAM), 16 GB RAM, Windows 11, Python 3.10

#### Hardware

* GPU: RTX 4060 Laptop GPU or equivalent (CPU-only operation optional)
* RAM: 16 GB

#### Software

* Python 3.10, PyTorch, OpenCV, NumPy, MediaPipe, pyttsx3

## Citation

**BibTeX:**

```bibtex
@misc{gokulseetharaman2025virtualboard,
  title={Virtual-Drawing-board-Opencv-Pytorch},
  author={Gokul Seetharaman},
  year={2025},
  url={https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch}
}
```

**APA:**

Gokul Seetharaman. (2025). Virtual-Drawing-board-Opencv-Pytorch. [https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch](https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch)

## Model Card Contact

[GitHub Issues](https://github.com/gokulseetharaman/Virtual-Drawing-board-Opencv-Pytorch/issues)