---
tags:
- clip
- vision-language
- image-text
- pytorch
license: apache-2.0
---
# CLIP Model

This is a fine-tuned CLIP model for vision-language tasks such as image-text matching.

## Model Description

This model was fine-tuned from a base CLIP checkpoint and ships with a custom temperature scaling parameter, stored as a separate `temperature.pth` file in the repository.
## Usage
```python
import torch
from transformers import CLIPModel, CLIPProcessor
from huggingface_hub import hf_hub_download

# Load the fine-tuned model and its matching processor from the Hub
model = CLIPModel.from_pretrained("aprendesc/CLIP_model_v0")
processor = CLIPProcessor.from_pretrained("aprendesc/CLIP_model_v0")

# Load the custom temperature parameter, if the repo provides one
try:
    temperature_path = hf_hub_download(
        repo_id="aprendesc/CLIP_model_v0", filename="temperature.pth"
    )
    temperature = torch.load(temperature_path, map_location="cpu")
    print(f"Temperature parameter: {temperature}")
except Exception:
    temperature = None
    print("No temperature parameter found")
```
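Once the model is loaded, inference follows the standard CLIP zero-shot pattern: encode an image and a set of candidate captions, then compare their similarity logits. The sketch below is a minimal example continuing from the snippet above; the image URL is only a placeholder, and dividing the logits by the separately stored `temperature` is an assumption about how the calibration parameter is intended to be applied, not documented behavior.

```python
import requests
from PIL import Image

# Placeholder example image; replace with your own input
image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True
).raw)
texts = ["a photo of a cat", "a photo of a dog"]

# Preprocess both modalities and run a forward pass
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity logits; optionally rescale with the custom
# temperature (assumed here to be a scalar divisor)
logits = outputs.logits_per_image
if temperature is not None:
    logits = logits / temperature

probs = logits.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```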
## Training Details

- Base model: CLIP
- Custom temperature scaling parameter included (see the generic illustration below)
- Fine-tuned for specific vision-language tasks
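For context, temperature scaling divides logits by a scalar `T` before the softmax, which adjusts how confident the resulting probabilities are without changing which class ranks highest. A minimal, generic illustration (not this model's actual training code):

```python
import torch

def temperature_scale(logits: torch.Tensor, T: float) -> torch.Tensor:
    """Calibrate probabilities by dividing logits by a temperature T.

    T > 1 softens the distribution, T < 1 sharpens it; the ranking
    of classes (argmax) is unchanged.
    """
    return torch.softmax(logits / T, dim=-1)

logits = torch.tensor([[2.0, 0.5, -1.0]])
print(temperature_scale(logits, T=1.0))  # plain softmax
print(temperature_scale(logits, T=2.0))  # softer, less confident
```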