---
license: apache-2.0
language: en
library_name: transformers
tags:
- clip
- image-classification
- fairface
- vision
model-index:
- name: gender-classification-clip
results:
- task:
type: image-classification
name: image-classification
dataset:
name: FairFace
type: joojs/fairface
split: validation
metrics:
- type: accuracy
value: 0.9638
name: Gender Accuracy
---
# Fine-tuned CLIP Model for Gender Classification
This repository contains the model **`gender-classification-clip`**, a fine-tuned version of the **[openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14)** model. It has been adapted for classifying perceived gender from facial images.
The model was trained on the gender labels from the **[FairFace dataset](https://github.com/joojs/fairface)**, which is designed to be balanced across demographic categories. This model card provides a detailed look at its performance, limitations, and intended use to encourage responsible application.
## Model Description
The base model, CLIP (Contrastive Language-Image Pre-Training), learns rich visual representations by matching images to their corresponding text descriptions. This fine-tuned version repurposes the powerful vision encoder from CLIP for a specific classification task.
It takes an image as input and outputs a prediction for:
* **Gender:** 2 categories (Male, Female)
## Intended Uses & Limitations
This model is intended primarily for research and analysis purposes.
### Intended Uses
* **Research on model fairness and bias:** Analyzing the model's performance differences across demographic groups.
* **Providing a public baseline:** Serving as a starting point for researchers aiming to improve performance on gender classification.
* **Educational purposes:** Demonstrating a fine-tuning approach on a vision model.
### Out-of-Scope and Prohibited Uses
This model makes predictions about a sensitive demographic attribute and carries significant risks if misused. The following uses are explicitly out-of-scope and strongly discouraged:
* **Surveillance, monitoring, or tracking of individuals.**
* **Automated decision-making that impacts an individual's rights or opportunities** (e.g., loan applications, hiring decisions, insurance eligibility).
* **Inferring or assigning an individual's self-identity.** The model's predictions are based on learned visual patterns and do not reflect how a person identifies.
* **Creating or reinforcing harmful social stereotypes.**
## How to Get Started
First, install the required dependencies:
```bash
pip install torch transformers Pillow huggingface_hub safetensors
```
The following Python script shows how to load the model and run inference on an image.
```python
import os

import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download
from PIL import Image
from safetensors.torch import load_file
from transformers import AutoModel, CLIPImageProcessor


# --- 0. Define the Custom Model Class ---
# Wraps the CLIP vision encoder and adds a linear classification head.
class GenderClipVisionModel(nn.Module):
    def __init__(self, num_labels):
        super().__init__()
        self.vision_model = AutoModel.from_pretrained(
            "openai/clip-vit-large-patch14"
        ).vision_model
        hidden_size = self.vision_model.config.hidden_size
        self.gender_head = nn.Linear(hidden_size, num_labels)

    def forward(self, pixel_values):
        outputs = self.vision_model(pixel_values=pixel_values)
        pooled_output = outputs.pooler_output
        return self.gender_head(pooled_output)


# --- 1. Configuration ---
MODEL_REPO = "syntheticbot/gender-classification-clip"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# --- 2. Define Label Mappings ---
gender_labels = ['Female', 'Male']
id2label = {i: label for i, label in enumerate(sorted(gender_labels))}
NUM_LABELS = len(gender_labels)

# --- 3. Load Model and Processor ---
# Processor to prepare images for the model.
processor = CLIPImageProcessor.from_pretrained(MODEL_REPO)

# Initialize the custom model structure.
model = GenderClipVisionModel(num_labels=NUM_LABELS)

# Download and load the fine-tuned weights.
try:
    weights_path = hf_hub_download(repo_id=MODEL_REPO, filename="model.safetensors")
    state_dict = load_file(weights_path, device=DEVICE)
    # strict=False tolerates key mismatches between the checkpoint and this wrapper.
    model.load_state_dict(state_dict, strict=False)
    print("Fine-tuned weights loaded successfully.")
except Exception as e:
    print(f"Error loading weights: {e}")

model.to(DEVICE)
model.eval()  # Set to evaluation mode.


# --- 4. Prediction Function ---
def predict(image_path):
    if not os.path.exists(image_path):
        print(f"Error: Image not found at {image_path}")
        return None
    try:
        image = Image.open(image_path).convert("RGB")
        inputs = processor(images=image, return_tensors="pt").to(DEVICE)
        with torch.no_grad():
            logits = model(pixel_values=inputs['pixel_values'])
        pred_id = torch.argmax(logits, dim=-1).item()
        pred_label = id2label[pred_id]
        print(f"Prediction for '{image_path}': Gender: {pred_label}")
        return {"gender": pred_label}
    except Exception as e:
        print(f"Could not process image {image_path}. Error: {e}")
        return None


# --- 5. Run Prediction ---
predict('path/to/your/image.jpg')  # <-- Replace with the path to your image
```
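If you need confidence scores rather than a hard label, you can apply a softmax over the logits. The following is a minimal sketch that reuses the `model`, `processor`, `DEVICE`, and `id2label` objects defined above; the function name `predict_with_scores` is illustrative, not part of the repository.

```python
import torch
import torch.nn.functional as F
from PIL import Image


def predict_with_scores(image_path):
    # Returns a {label: probability} dict instead of a single hard label.
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(DEVICE)
    with torch.no_grad():
        logits = model(pixel_values=inputs['pixel_values'])
    probs = F.softmax(logits, dim=-1).squeeze(0)
    return {id2label[i]: round(p.item(), 4) for i, p in enumerate(probs)}


# Prints something of the form {'Female': ..., 'Male': ...}.
print(predict_with_scores('path/to/your/image.jpg'))
```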
## Training Details
* **Base Model:** [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14)
* **Dataset:** [FairFace](https://github.com/joojs/fairface) (using only gender labels)
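The exact training configuration is not published in this card. For readers who want to reproduce a comparable setup, here is a minimal sketch of one plausible fine-tuning loop: cross-entropy on the FairFace gender labels, with the optimizer, learning rate, and loop structure chosen for illustration only.

```python
import torch

# Illustrative hyperparameters only; the actual values used are not published.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
criterion = torch.nn.CrossEntropyLoss()
model.train()

# `train_loader` is a hypothetical DataLoader yielding (pixel_values, labels)
# batches built with the CLIPImageProcessor over the FairFace training split.
for pixel_values, labels in train_loader:
    pixel_values, labels = pixel_values.to(DEVICE), labels.to(DEVICE)
    logits = model(pixel_values=pixel_values)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```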
## Evaluation
The model was evaluated on the FairFace validation split, which contains 10,954 images.
### Performance Metrics
#### **Gender Classification (Overall Accuracy: 96.38%)**
```
              precision    recall  f1-score   support

      Female       0.96      0.96      0.96      5162
        Male       0.96      0.97      0.97      5792

    accuracy                           0.96     10954
   macro avg       0.96      0.96      0.96     10954
weighted avg       0.96      0.96      0.96     10954
```
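For reference, a report in this format can be produced with scikit-learn's `classification_report`. A minimal sketch, assuming `y_true` and `y_pred` are lists of integer label ids gathered by running the `predict` function above over the FairFace validation images:

```python
from sklearn.metrics import classification_report

# y_true / y_pred: integer label ids (0 = Female, 1 = Male) collected over
# the 10,954 validation images; how they were gathered is assumed here.
print(classification_report(y_true, y_pred, target_names=['Female', 'Male'], digits=2))
```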
## Bias, Risks, and Limitations
* **Perceptual vs. Identity:** The model predicts perceived gender based on visual data. These predictions are not a determination of an individual's true self-identity or gender expression.
* **Performance Disparities:** The evaluation shows high overall accuracy, but performance may not be uniform across intersectional demographic groups (e.g., race, age). Deploying the model without auditing those subgroups risks perpetuating existing biases; a per-group evaluation sketch follows this list.
* **Data Representation:** While trained on FairFace, a balanced dataset, the model may still reflect societal biases present in the original pre-training data of CLIP.
* **Risk of Misclassification:** Any misclassification of a sensitive attribute can have negative social consequences. The model is not perfect and will make mistakes.
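One way to check for the disparities mentioned above is to slice accuracy by the other FairFace attributes. A minimal sketch, assuming a pandas DataFrame `df` with per-image columns `race`, `y_true`, and `y_pred` collected during evaluation (the column names are illustrative):

```python
import pandas as pd

# df columns (illustrative): race, y_true, y_pred
per_group = (
    df.assign(correct=df['y_true'] == df['y_pred'])
      .groupby('race')['correct']
      .mean()
      .sort_values()
)
print(per_group)  # Accuracy per race group; large gaps indicate disparity.
```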
## Citation
**Original CLIP Model:**
```bibtex
@inproceedings{radford2021learning,
title={Learning Transferable Visual Models From Natural Language Supervision},
author={Alec Radford and Jong Wook Kim and Chris Hallacy and Aditya Ramesh and Gabriel Goh and Sandhini Agarwal and Girish Sastry and Amanda Askell and Pamela Mishkin and Jack Clark and Gretchen Krueger and Ilya Sutskever},
booktitle={International Conference on Machine Learning},
year={2021}
}
```
**FairFace Dataset:**
```bibtex
@inproceedings{karkkainenfairface,
title={FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age},
author={Karkkainen, Kimmo and Joo, Jungseock},
booktitle={IEEE Winter Conference on Applications of Computer Vision (WACV)},
pages={1548--1558},
year={2021}
}
``` |