---
language:
- en
tags:
- vision-language
- vqa
- text-to-image-evaluation
license: mit
---
# Tiny Random VQAScore Model

This is a tiny, randomly initialized version of the VQAScore architecture for educational and testing purposes.
## Model Architecture

- **Vision Encoder**: Tiny CNN + Transformer (hidden size 64)
- **Language Model**: Tiny Transformer (hidden size 256)
- **Multimodal Projector**: MLP with layer sizes 256 → 128 → 64 → 1
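As a sanity check on the sizes above, the projector's parameter count can be computed by hand. This sketch assumes each stage of the MLP is a standard linear layer with a bias term (the actual implementation may differ):

```python
# Parameter count of the 256 -> 128 -> 64 -> 1 projector MLP,
# assuming standard linear layers with biases at each stage.
layer_sizes = [256, 128, 64, 1]

total = 0
for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
    total += fan_in * fan_out + fan_out  # weights + biases

print(total)  # 41217 parameters for the projector alone
```

Under that assumption the projector alone accounts for roughly 41K parameters, which is consistent with the ~50K total quoted below once the tiny encoders are included.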
## Usage

```python
from PIL import Image

from create_tiny_vqa_model import TinyVQAScore

# Load the randomly initialized model on CPU
model = TinyVQAScore(device="cpu")

# Score an image against a question
image = Image.open("your_image.jpg")
score = model.score(image, "What is shown in this image?")
print(f"VQA Score: {score}")
```
## Model Size

- **Parameters**: ~50K (vs. ~11B for the original XXL model)
- **Memory**: ~200KB (vs. ~22GB for the original XXL model)
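The ~200KB figure follows from storing the ~50K parameters as 32-bit floats, as this back-of-the-envelope check shows (the 50K count is the approximate figure quoted above):

```python
# Rough memory estimate: ~50K float32 parameters at 4 bytes each.
num_params = 50_000      # approximate parameter count from the list above
bytes_per_param = 4      # float32
memory_kb = num_params * bytes_per_param / 1024

print(f"~{memory_kb:.0f} KB")  # ~195 KB, in line with the ~200KB figure
```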
## Disclaimer

This model is randomly initialized for testing and educational purposes. It has not been trained and will not produce meaningful VQA results.