---
license: cc-by-4.0
tags:
  - vision
  - image-text-retrieval
  - clip
  - pytorch
  - vision-transformer
library_name: pytorch
pipeline_tag: zero-shot-image-classification
language:
  - en
---

# Custom CLIP (ViT-B/16) - Optimized

This model is a from-scratch, heavily optimized implementation of the CLIP architecture, developed as part of an academic research project.

It achieves 2.46x faster inference (21 ms vs. 52 ms latency) than the standard OpenAI CLIP model on consumer hardware (RTX 3050 Ti), while maintaining 97.7% zero-shot accuracy.
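For context, CLIP's zero-shot classification works by encoding the image and a set of candidate text prompts, then taking a softmax over temperature-scaled cosine similarities. Below is a minimal, model-free sketch of that scoring step; the toy vectors stand in for real embeddings, and the temperature of 100 is an assumption mirroring CLIP's typical learned logit scale:

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def zero_shot_scores(image_emb, text_embs, temperature=100.0):
    """Softmax over temperature-scaled cosine similarities (CLIP-style)."""
    img = normalize(image_emb)
    sims = [temperature * sum(a * b for a, b in zip(img, normalize(t)))
            for t in text_embs]
    m = max(sims)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings standing in for real CLIP image/text encoder outputs
image = [0.9, 0.1, 0.0]
prompts = [[1.0, 0.0, 0.0],   # e.g. "a photo of a cat"
           [0.0, 1.0, 0.0]]   # e.g. "a photo of a dog"
probs = zero_shot_scores(image, prompts)
```

The highest probability goes to the prompt whose embedding is most aligned with the image embedding; in a real pipeline the embeddings come from the model's image and text encoders.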

## 🔗 Source Code & Usage

The full source code, training details, and inference scripts are available on GitHub: 👉 GitHub Repository: custom-clip-vit-b-coco


## 🚀 Performance Benchmark

| Model       | Optimization   | Latency  | Speedup | Accuracy |
|-------------|----------------|----------|---------|----------|
| OpenAI CLIP | FP32           | 52.22 ms | 1.0x    | 99.88%   |
| Custom CLIP | FP16 + Compile | 21.20 ms | 2.46x   | 97.71%   |
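Latency figures like these depend on measurement methodology, so they are usually reported as the median over many runs after a warm-up phase (important when `torch.compile` is involved, since the first calls pay compilation cost). A generic, pure-Python sketch of that procedure; the workload here is a stand-in for a model forward pass, and on GPU you would additionally need device synchronization (e.g. `torch.cuda.synchronize()`) before reading the clock:

```python
import statistics
import time

def benchmark_ms(fn, warmup=10, runs=100):
    """Median wall-clock latency of fn() in milliseconds, after warm-up."""
    for _ in range(warmup):       # warm-up absorbs compile/caching overhead
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Stand-in workload; replace with a real forward pass such as model(image)
latency_ms = benchmark_ms(lambda: sum(i * i for i in range(10_000)))

# Speedup as reported in the table: baseline latency / optimized latency
speedup = 52.22 / 21.20
```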

## ⚠️ License & Citation

This model is licensed under CC-BY 4.0. You are free to use it for academic or commercial purposes, provided you attribute the author:

Author: Muhammed Köse
Project: Custom CLIP Optimization