---
license: mit
datasets:
- google/quickdraw
pipeline_tag: image-feature-extraction
---

A simple, small-ish network for producing embeddings of black-and-white binary images. It takes a 32x32 drawing and produces a 64-dimensional embedding.
|
You can see this in action at https://huggingface.co/spaces/JosephCatrambone/tiny_doodle_embedding
|
## Input Format:
|
The model expects a (b, 32, 32) float32 input, generally with 0.0 being "background" and 1.0 being "foreground", similar to MNIST.
The model was trained on QuickDraw data with each drawing justified to the top-left corner (0, 0), so take steps to align input images to the top-left before embedding them.
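
The sample usage below calls a `process_input` helper that performs this alignment. Here is a minimal sketch of what such a helper might look like, assuming the input is a 2D grayscale numpy array; the thresholding and resizing details are illustrative assumptions, not the exact preprocessing used in training.

```python
import numpy
from PIL import Image

def process_input(img: numpy.ndarray) -> numpy.ndarray:
    """Binarize a drawing, crop it, and fit it top-left into a (1, 32, 32) array."""
    binary = (img > 0.5).astype(numpy.float32)  # Threshold to {0.0, 1.0} (assumed cutoff).
    ys, xs = numpy.nonzero(binary)
    if len(ys) == 0:
        return numpy.zeros((1, 32, 32), dtype=numpy.float32)  # Blank drawing.
    # Crop to the bounding box of the foreground.
    cropped = binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # Scale to fit inside 32x32 while preserving the aspect ratio.
    scale = 32.0 / max(cropped.shape)
    new_h = max(1, min(32, round(cropped.shape[0] * scale)))
    new_w = max(1, min(32, round(cropped.shape[1] * scale)))
    resized = Image.fromarray((cropped * 255).astype(numpy.uint8)).resize(
        (new_w, new_h), Image.NEAREST
    )
    # Paste at (0, 0) so the drawing is justified to the top-left corner.
    out = numpy.zeros((32, 32), dtype=numpy.float32)
    out[:new_h, :new_w] = numpy.asarray(resized, dtype=numpy.float32) / 255.0
    return out[numpy.newaxis, :, :]  # Add the batch dimension: (1, 32, 32).
```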
|
## Output:

Given a batch of (b, 32, 32), the model will produce a normalized (b, 64) matrix of floats.
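
As a quick sanity check, assuming "normalized" means each row has unit L2 norm (the model file name and input name are taken from the sample usage below):

```python
import numpy
import onnxruntime as ort

ort_sess = ort.InferenceSession('tiny_doodle_embedding.onnx')

batch = (numpy.random.rand(4, 32, 32) > 0.5).astype(numpy.float32)  # Four random binary "drawings".
embeddings = ort_sess.run(None, {'input': batch})[0]
print(embeddings.shape)                       # (4, 64)
print(numpy.linalg.norm(embeddings, axis=1))  # Each row should be ~1.0.
```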
|
## Sample usage:
|
```python
import onnxruntime as ort
import numpy

ort_sess = ort.InferenceSession('tiny_doodle_embedding.onnx')

def compare(input_img_a, input_img_b):
    img_a = process_input(input_img_a)  # Crop and resize the input image so it's binary and fits in a 32x32 array.
    img_b = process_input(input_img_b)

    a_embedding = ort_sess.run(None, {'input': img_a.astype(numpy.float32)})[0]
    b_embedding = ort_sess.run(None, {'input': img_b.astype(numpy.float32)})[0]

    sim = numpy.dot(a_embedding, b_embedding.T)  # Or a_embedding @ b_embedding.T
    return sim
```
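
Since the embeddings are normalized (see Output above), the dot product here is the cosine similarity between the two drawings: `sim` lies in [-1, 1], with values near 1 suggesting the model considers the drawings similar.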
|
## Training Details:
|
This model was trained on images taken from the Google QuickDraw dataset, rasterized to 32x32 binary images. Augmentations were basic, consisting of noise and an occasional dilation.
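
A rough sketch of that style of augmentation, assuming salt noise plus a probabilistic dilation; the rates below are illustrative guesses, not the values used in training.

```python
import numpy
from scipy.ndimage import binary_dilation

def augment(img: numpy.ndarray, rng: numpy.random.Generator) -> numpy.ndarray:
    """Add salt noise to a binary 32x32 drawing and occasionally dilate it."""
    salt = (rng.random(img.shape) < 0.02).astype(img.dtype)  # Flip ~2% of pixels on (assumed rate).
    noisy = numpy.maximum(img, salt)
    if rng.random() < 0.25:  # Occasionally thicken the strokes (assumed probability).
        noisy = binary_dilation(noisy > 0.5).astype(numpy.float32)
    return noisy
```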
|
The model was trained for 100 epochs on a consumer-grade NVIDIA RTX 3090.
|
Details of the run are visible at https://wandb.ai/josephc/tiny_doodle_model/runs/7wqz4w7g?nw=nwuserjosephc
|
## Power Use and Environmental Considerations:
|
Training the final version drew 120 W for 570 minutes, or about 1.14 kWh. Excess heat from the training process was used to heat the author's home in place of gas heating.
|