---
license: mit
language:
- en
pipeline_tag: zero-shot-image-classification
tags:
- vision
- simple
- small
---

# tinyvvision 🧠✨

**tinyvvision** is a compact, synthetic-curriculum-trained vision-language model designed to demonstrate real zero-shot capability in a minimal setup. Despite its small size (~630k parameters), it aligns images and captions effectively by learning a shared visual-language embedding space.

## What tinyvvision can do:

- Match simple geometric shapes (circles, stars, hearts, triangles, etc.) to descriptive captions (e.g., "a red circle", "a yellow star").
- Perform genuine zero-shot generalization: it can correctly match captions to shapes and colors it never explicitly encountered during training.
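
Conceptually, this kind of zero-shot matching reduces to a nearest-neighbor search in the shared embedding space: embed the image, embed each candidate caption, and pick the caption with the highest cosine similarity. A minimal sketch in plain Python, with toy 4-d vectors standing in for the real 128-dimensional encoder outputs (the vectors below are hypothetical, not actual model embeddings):

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the vector norms.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def best_caption(image_emb, caption_embs):
    # Pick the caption whose embedding is closest to the image embedding.
    return max(caption_embs, key=lambda c: cosine(image_emb, caption_embs[c]))

# Toy embeddings standing in for real encoder outputs (hypothetical values).
captions = {
    "a red circle":  [0.9, 0.1, 0.0, 0.1],
    "a yellow star": [0.1, 0.9, 0.1, 0.0],
    "a blue square": [0.0, 0.1, 0.9, 0.1],
}
image = [0.8, 0.2, 0.1, 0.0]  # pretend encoder output for a drawn red circle

print(best_caption(image, captions))  # → a red circle
```

Because both encoders map into the same space, captions never seen during training can still land near the right image, which is what makes the zero-shot behavior possible.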

## Model Details:

- **Type**: Contrastive embedding (CLIP-style, zero-shot)
- **Parameters**: ~630,000 (tiny!)
- **Training data**: Fully synthetic: randomly generated shapes, letters, numbers, and symbols paired with descriptive text captions.
- **Architecture**:
  - **Image Encoder**: Simple CNN
  - **Text Encoder**: Small embedding layer + bidirectional GRU
  - **Embedding Dim**: 128-dimensional shared embedding space

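A CLIP-style contrastive objective treats each image's own caption as the positive and every other caption in the batch as a negative, applying a symmetric cross-entropy over the pairwise similarity matrix. A minimal sketch of that objective in plain Python, with toy 2-d embeddings and an assumed temperature of 0.07 (this is an illustration of the technique, not the model's actual training code):

```python
import math

def clip_loss(img_embs, txt_embs, temperature=0.07):
    """Symmetric InfoNCE over matched (image, text) embedding pairs.
    Row i of each list is a matched pair; other rows act as negatives."""
    n = len(img_embs)

    def norm(v):
        s = math.sqrt(sum(x * x for x in v))
        return [x / s for x in v]

    I = [norm(v) for v in img_embs]
    T = [norm(v) for v in txt_embs]
    # Pairwise similarity logits, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(I[i], T[j])) / temperature
               for j in range(n)] for i in range(n)]

    def xent_row(row, target):
        # Cross-entropy of softmax(row) against the matched index.
        m = max(row)
        logsumexp = m + math.log(sum(math.exp(x - m) for x in row))
        return logsumexp - row[target]

    loss_i2t = sum(xent_row(logits[i], i) for i in range(n)) / n  # image -> text
    loss_t2i = sum(xent_row([logits[i][j] for i in range(n)], j)
                   for j in range(n)) / n                         # text -> image
    return (loss_i2t + loss_t2i) / 2

# Toy batch: well-aligned pairs should give a near-zero loss.
imgs = [[1.0, 0.0], [0.0, 1.0]]
txts = [[0.9, 0.1], [0.1, 0.9]]
print(clip_loss(imgs, txts))
```

Minimizing this loss pulls matched image/caption embeddings together and pushes mismatched ones apart, which is what carves out the shared 128-dimensional space described above.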

## Examples of Zero-Shot Matching:

- **Seen during training**: "a red circle" → correctly matches the drawn red circle.
- **Never seen**: "a teal lightning bolt" → correctly matches a hand-drawn lightning bolt, despite never having seen one during training.

## Limitations:

tinyvvision is designed as a demonstration of zero-shot embedding and generalization on synthetic data. It is not trained on real-world images or complex scenes. While robust within its domain (simple geometric shapes and clear captions), results may vary significantly on more complicated or out-of-domain inputs.

## How to Test tinyvvision:

Check out the provided inference script to easily test your own shapes and captions. Feel free to challenge tinyvvision with new, unseen combinations to explore its generalization capability!
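
If you want relative scores rather than a single best match, the usual pattern is to embed everything, take cosine similarities, and softmax them into per-caption confidences. A sketch in plain Python; `embed_image` and `embed_text` below are hypothetical stand-ins for the real encoders in the inference script, and the toy table plays the role of the text encoder:

```python
import math

def embed_image(img):
    # Hypothetical stand-in for tinyvvision's CNN image encoder.
    return img

def embed_text(caption, table):
    # Hypothetical stand-in for the GRU text encoder (toy lookup table).
    return table[caption]

def rank_captions(image, captions, table, temperature=0.07):
    img = embed_image(image)

    def cos(u, v):
        d = sum(a * b for a, b in zip(u, v))
        return d / (math.sqrt(sum(a * a for a in u)) *
                    math.sqrt(sum(b * b for b in v)))

    # Temperature-scaled similarities -> softmax -> relative confidences.
    sims = {c: cos(img, embed_text(c, table)) / temperature for c in captions}
    m = max(sims.values())
    exps = {c: math.exp(s - m) for c, s in sims.items()}
    z = sum(exps.values())
    return sorted(((exps[c] / z, c) for c in captions), reverse=True)

table = {"a red circle": [1.0, 0.0], "a teal lightning bolt": [0.0, 1.0]}
image = [0.2, 0.95]  # toy embedding for a hand-drawn lightning bolt
for p, c in rank_captions(image, table.keys(), table):
    print(f"{p:.3f}  {c}")
```

Swapping in the real encoders for the stubs gives a quick way to see not just which caption wins but by how much.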

✨ **Enjoy experimenting!** ✨