---
license: mit
datasets:
- google/quickdraw
pipeline_tag: image-feature-extraction
---

A simple, small-ish network for producing embeddings of black-and-white binary images. It takes a 32x32 drawing and produces a 64-dimensional embedding.
|
You can see this in action at https://huggingface.co/spaces/JosephCatrambone/tiny_doodle_embedding
|
## Input Format:
|
The model expects a (b, 32, 32) float32 input, generally with 0.0 being "background" and 1.0 being "foreground", similar to MNIST.
The model was trained on QuickDraw data with each drawing justified to the top-left corner (0, 0), so take steps to align input images to the top-left before embedding them.
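
The sample usage below calls a `process_input` helper that performs this alignment. Here is a minimal sketch of what such a helper might look like, assuming the input is a 2D grayscale numpy array; the thresholding and resizing details are illustrative assumptions, not the exact preprocessing used in training.

```python
import numpy
from PIL import Image

def process_input(img: numpy.ndarray) -> numpy.ndarray:
    """Binarize a drawing, crop it, and fit it top-left into a (1, 32, 32) array."""
    binary = (img > 0.5).astype(numpy.float32)  # Threshold to {0.0, 1.0} (assumed cutoff).
    ys, xs = numpy.nonzero(binary)
    if len(ys) == 0:
        return numpy.zeros((1, 32, 32), dtype=numpy.float32)  # Blank drawing.
    # Crop to the bounding box of the foreground.
    cropped = binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # Scale to fit inside 32x32 while preserving the aspect ratio.
    scale = 32.0 / max(cropped.shape)
    new_h = max(1, min(32, round(cropped.shape[0] * scale)))
    new_w = max(1, min(32, round(cropped.shape[1] * scale)))
    resized = Image.fromarray((cropped * 255).astype(numpy.uint8)).resize(
        (new_w, new_h), Image.NEAREST
    )
    # Paste at (0, 0) so the drawing is justified to the top-left corner.
    out = numpy.zeros((32, 32), dtype=numpy.float32)
    out[:new_h, :new_w] = numpy.asarray(resized, dtype=numpy.float32) / 255.0
    return out[numpy.newaxis, :, :]  # Add the batch dimension: (1, 32, 32).
```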
|
## Output:

Given a batch of (b, 32, 32), the model will produce a normalized (b, 64) matrix of floats.
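
As a quick sanity check, assuming "normalized" means each row has unit L2 norm (the model file name and input name are taken from the sample usage below):

```python
import numpy
import onnxruntime as ort

ort_sess = ort.InferenceSession('tiny_doodle_embedding.onnx')

batch = (numpy.random.rand(4, 32, 32) > 0.5).astype(numpy.float32)  # Four random binary "drawings".
embeddings = ort_sess.run(None, {'input': batch})[0]
print(embeddings.shape)                       # (4, 64)
print(numpy.linalg.norm(embeddings, axis=1))  # Each row should be ~1.0.
```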
|
## Sample usage:
|
```python
import onnxruntime as ort
import numpy

ort_sess = ort.InferenceSession('tiny_doodle_embedding.onnx')

def compare(input_img_a, input_img_b):
    img_a = process_input(input_img_a)  # Crop and resize the input image so it's binary and fits in a 32x32 array.
    img_b = process_input(input_img_b)

    a_embedding = ort_sess.run(None, {'input': img_a.astype(numpy.float32)})[0]
    b_embedding = ort_sess.run(None, {'input': img_b.astype(numpy.float32)})[0]

    sim = numpy.dot(a_embedding, b_embedding.T)  # Or a_embedding @ b_embedding.T
    return sim
```
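
Since the embeddings are normalized (see Output above), the dot product here is the cosine similarity between the two drawings: `sim` lies in [-1, 1], with values near 1 suggesting the model considers the drawings similar.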
|
## Training Details:
|
This model was trained on images taken from the Google QuickDraw dataset, rasterized to 32x32 binary images. Augmentations were basic, consisting of noise and an occasional dilation.
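
A rough sketch of that style of augmentation, assuming salt noise plus a probabilistic dilation; the rates below are illustrative guesses, not the values used in training.

```python
import numpy
from scipy.ndimage import binary_dilation

def augment(img: numpy.ndarray, rng: numpy.random.Generator) -> numpy.ndarray:
    """Add salt noise to a binary 32x32 drawing and occasionally dilate it."""
    salt = (rng.random(img.shape) < 0.02).astype(img.dtype)  # Flip ~2% of pixels on (assumed rate).
    noisy = numpy.maximum(img, salt)
    if rng.random() < 0.25:  # Occasionally thicken the strokes (assumed probability).
        noisy = binary_dilation(noisy > 0.5).astype(numpy.float32)
    return noisy
```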
|
The model was trained for 100 epochs on a consumer-grade NVIDIA RTX 3090.
|
Details of the run are visible at https://wandb.ai/josephc/tiny_doodle_model/runs/7wqz4w7g?nw=nwuserjosephc
|
## Power Use and Environmental Considerations:
|
Training the final version drew 120 W for 570 minutes, or about 1.14 kWh. Excess heat from the training process was used to heat the author's home in place of gas heating.
|