syntheticbot committed on
Commit 68980fd · verified · 1 Parent(s): 8bec6f0

Upload 9 files
README.md CHANGED
@@ -1,3 +1,4 @@
 
 ---
 license: apache-2.0
 language: en
@@ -5,10 +6,12 @@ library_name: transformers
 tags:
 - clip
 - image-classification
 - fairface
 - vision
 model-index:
- - name: gender-classification-clip
   results:
   - task:
       type: image-classification
@@ -21,20 +24,28 @@ model-index:
 - type: accuracy
   value: 0.9638
   name: Gender Accuracy
 ---
 
- # Fine-tuned CLIP Model for Gender Classification
 
- This repository contains the model **`gender-classification-clip`**, a fine-tuned version of the **[openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14)** model. It has been adapted for classifying perceived gender from facial images.
 
- The model was trained on the gender labels from the **[FairFace dataset](https://github.com/joojs/fairface)**, which is designed to be balanced across demographic categories. This model card provides a detailed look at its performance, limitations, and intended use to encourage responsible application.
 
 ## Model Description
 
 The base model, CLIP (Contrastive Language-Image Pre-Training), learns rich visual representations by matching images to their corresponding text descriptions. This fine-tuned version repurposes the powerful vision encoder from CLIP for a specific classification task.
 
- It takes an image as input and outputs a prediction for:
 * **Gender:** 2 categories (Male, Female)
 
 ## Intended Uses & Limitations
 
@@ -42,11 +53,11 @@ This model is intended primarily for research and analysis purposes.
 
 ### Intended Uses
 * **Research on model fairness and bias:** Analyzing the model's performance differences across demographic groups.
- * **Providing a public baseline:** Serving as a starting point for researchers aiming to improve performance on gender classification.
- * **Educational purposes:** Demonstrating a fine-tuning approach on a vision model.
 
 ### Out-of-Scope and Prohibited Uses
- This model makes predictions about a sensitive demographic attribute and carries significant risks if misused. The following uses are explicitly out-of-scope and strongly discouraged:
 * **Surveillance, monitoring, or tracking of individuals.**
 * **Automated decision-making that impacts an individual's rights or opportunities** (e.g., loan applications, hiring decisions, insurance eligibility).
 * **Inferring or assigning an individual's self-identity.** The model's predictions are based on learned visual patterns and do not reflect how a person identifies.
@@ -54,7 +65,7 @@ This model makes predictions about a sensitive demographic attribute and carries
 
 ## How to Get Started
 
- To use this model, you need to import its custom `GenderClipVisionModel` class, as it is not a standard `AutoModel`.
 
 ```python
 import torch
@@ -65,36 +76,47 @@ import torch.nn as nn
 
 # --- 0. Define the Custom Model Class ---
 # You must define the model architecture to load the weights into it.
- class GenderClipVisionModel(nn.Module):
     def __init__(self, num_labels):
-         super(GenderClipVisionModel, self).__init__()
         # Load the vision part of a CLIP model
         self.vision_model = AutoModel.from_pretrained("openai/clip-vit-large-patch14").vision_model
 
         hidden_size = self.vision_model.config.hidden_size
-         self.gender_head = nn.Linear(hidden_size, num_labels)
 
     def forward(self, pixel_values):
         outputs = self.vision_model(pixel_values=pixel_values)
         pooled_output = outputs.pooler_output
-         return self.gender_head(pooled_output)
 
 # --- 1. Configuration ---
- MODEL_PATH = "syntheticbot/gender-classification-clip"
 DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
 
 # --- 2. Define Label Mappings (must match training) ---
 gender_labels = ['Female', 'Male']
- id2label = {i: label for i, label in enumerate(sorted(gender_labels))}
- NUM_LABELS = len(gender_labels)
 
 # --- 3. Load Model and Processor ---
 processor = CLIPImageProcessor.from_pretrained(MODEL_PATH)
- model = GenderClipVisionModel(num_labels=NUM_LABELS)
 
- # Note: You would typically load fine-tuned weights here.
- # For this example, we proceed with the class structure.
- # model.load_state_dict(torch.load("path_to_your_model_weights.bin"))
 
 model.to(DEVICE)
 model.eval()
@@ -111,22 +133,27 @@ def predict(image_path):
     with torch.no_grad():
         logits = model(pixel_values=inputs['pixel_values'])
 
-     pred_id = torch.argmax(logits, dim=-1).item()
-     pred_label = id2label[pred_id]
 
-     print(f"Prediction for {image_path}:")
-     print(f"  - Gender: {pred_label}")
-     return {"gender": pred_label}
 
 # --- 5. Run Prediction ---
- # predict('sample.jpg')  # Replace with the path to your image
 ```
 
 ## Training Details
 
 * **Base Model:** [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14)
- * **Dataset:** [FairFace](https://github.com/joojs/fairface) (using only gender labels)
- * **Training Procedure:** The model was fine-tuned for 5 epochs. The vision encoder was mostly frozen, with only the final 3 transformer layers being unfrozen for training. A separate linear classification head was added for the gender task. The loss function was the Cross-Entropy Loss.
 
 ## Evaluation
 
@@ -134,6 +161,8 @@ The model was evaluated on the FairFace validation split, which contains 10,954
 
 ### Performance Metrics
 
 #### **Gender Classification (Overall Accuracy: 96.38%)**
 ```
               precision    recall  f1-score   support
@@ -146,12 +175,48 @@ The model was evaluated on the FairFace validation split, which contains 10,954
 weighted avg       0.96      0.96      0.96     10954
 ```
 
 ## Bias, Risks, and Limitations
 
- * **Perceptual vs. Identity:** The model predicts perceived gender based on visual data. These predictions are not a determination of an individual's true self-identity or gender expression.
- * **Performance Disparities:** The evaluation shows high overall accuracy, but performance may not be uniform across all intersectional demographic groups (e.g., different races, ages). Using this model in any application can perpetuate existing biases.
 * **Data Representation:** While trained on FairFace, a balanced dataset, the model may still reflect societal biases present in the original pre-training data of CLIP.
- * **Risk of Misclassification:** Any misclassification of a sensitive attribute can have negative social consequences. The model is not perfect and will make mistakes.
 
 ### Citation
 
@@ -174,4 +239,5 @@ weighted avg 0.96 0.96 0.96 10954
   pages={1548--1558},
   year={2021}
 }
 ```
 
+
 ---
 license: apache-2.0
 language: en
 tags:
 - clip
 - image-classification
+ - multi-task-classification
 - fairface
 - vision
+ - autoeval-has-no-ethical-license
 model-index:
+ - name: clip-face-attribute-classifier
   results:
   - task:
       type: image-classification
 
 - type: accuracy
   value: 0.9638
   name: Gender Accuracy
+ - type: accuracy
+   value: 0.7322
+   name: Race Accuracy
+ - type: accuracy
+   value: 0.5917
+   name: Age Accuracy
 ---
 
+ # Fine-tuned CLIP Model for Face Attribute Classification
 
+ This repository contains the model **`clip-face-attribute-classifier`**, a fine-tuned version of the **[openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14)** model. It has been adapted for multi-task classification of perceived age, gender, and race from facial images.
 
+ The model was trained on the **[FairFace dataset](https://github.com/joojs/fairface)**, which is designed to be balanced across these demographic categories. This model card provides a detailed look at its performance, limitations, and intended use to encourage responsible application.
 
 ## Model Description
 
 The base model, CLIP (Contrastive Language-Image Pre-Training), learns rich visual representations by matching images to their corresponding text descriptions. This fine-tuned version repurposes the powerful vision encoder from CLIP for a specific classification task.
 
+ It takes an image as input and outputs three separate predictions for:
+ * **Age:** 9 categories (0-2, 3-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, more than 70)
 * **Gender:** 2 categories (Male, Female)
+ * **Race:** 7 categories (White, Black, Indian, East Asian, Southeast Asian, Middle Eastern, Latino_Hispanic)
 
 ## Intended Uses & Limitations
 
 
 ### Intended Uses
 * **Research on model fairness and bias:** Analyzing the model's performance differences across demographic groups.
+ * **Providing a public baseline:** Serving as a starting point for researchers aiming to improve performance on these specific classification tasks.
+ * **Educational purposes:** Demonstrating a multi-task fine-tuning approach on a vision model.
 
 ### Out-of-Scope and Prohibited Uses
+ This model makes predictions about sensitive demographic attributes and carries significant risks if misused. The following uses are explicitly out-of-scope and strongly discouraged:
 * **Surveillance, monitoring, or tracking of individuals.**
 * **Automated decision-making that impacts an individual's rights or opportunities** (e.g., loan applications, hiring decisions, insurance eligibility).
 * **Inferring or assigning an individual's self-identity.** The model's predictions are based on learned visual patterns and do not reflect how a person identifies.
 
 
 ## How to Get Started
 
+ To use this model, you need to import its custom `MultiTaskClipVisionModel` class, as it is not a standard `AutoModel`.
 
 ```python
 import torch
 
 
 # --- 0. Define the Custom Model Class ---
 # You must define the model architecture to load the weights into it.
+ class MultiTaskClipVisionModel(nn.Module):
     def __init__(self, num_labels):
+         super(MultiTaskClipVisionModel, self).__init__()
         # Load the vision part of a CLIP model
         self.vision_model = AutoModel.from_pretrained("openai/clip-vit-large-patch14").vision_model
 
         hidden_size = self.vision_model.config.hidden_size
+         self.age_head = nn.Linear(hidden_size, num_labels['age'])
+         self.gender_head = nn.Linear(hidden_size, num_labels['gender'])
+         self.race_head = nn.Linear(hidden_size, num_labels['race'])
 
     def forward(self, pixel_values):
         outputs = self.vision_model(pixel_values=pixel_values)
         pooled_output = outputs.pooler_output
+         return {
+             'age': self.age_head(pooled_output),
+             'gender': self.gender_head(pooled_output),
+             'race': self.race_head(pooled_output),
+         }
 
 # --- 1. Configuration ---
+ MODEL_PATH = "syntheticbot/clip-face-attribute-classifier"
 DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
 
 # --- 2. Define Label Mappings (must match training) ---
+ age_labels = ['0-2', '10-19', '20-29', '3-9', '30-39', '40-49', '50-59', '60-69', 'more than 70']
 gender_labels = ['Female', 'Male']
+ race_labels = ['Black', 'East Asian', 'Indian', 'Latino_Hispanic', 'Middle Eastern', 'Southeast Asian', 'White']
+
+ # Use sorted lists to create a consistent mapping
+ id_mappings = {
+     'age': {i: label for i, label in enumerate(sorted(age_labels))},
+     'gender': {i: label for i, label in enumerate(sorted(gender_labels))},
+     'race': {i: label for i, label in enumerate(sorted(race_labels))},
+ }
+ NUM_LABELS = {'age': len(age_labels), 'gender': len(gender_labels), 'race': len(race_labels)}
 
 # --- 3. Load Model and Processor ---
 processor = CLIPImageProcessor.from_pretrained(MODEL_PATH)
+ model = MultiTaskClipVisionModel(num_labels=NUM_LABELS)
 
+ # Load the fine-tuned weights from this repository, e.g.:
+ # from safetensors.torch import load_file
+ # model.load_state_dict(load_file("model.safetensors"))
 
 model.to(DEVICE)
 model.eval()
 
     with torch.no_grad():
         logits = model(pixel_values=inputs['pixel_values'])
 
+     predictions = {}
+     for task in ['age', 'gender', 'race']:
+         pred_id = torch.argmax(logits[task], dim=-1).item()
+         pred_label = id_mappings[task][pred_id]
+         predictions[task] = pred_label
 
+     print(f"Predictions for {image_path}:")
+     for task, label in predictions.items():
+         print(f"  - {task.capitalize()}: {label}")
+     return predictions
 
 # --- 5. Run Prediction ---
+
+ predict('sample.jpg')  # Replace with the path to your image
 ```
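One detail worth knowing when reusing these mappings: the ids come from enumerating the *sorted* label lists, and Python sorts these strings lexicographically, which is why `'3-9'` sits between `'20-29'` and `'30-39'`. A standalone sketch of that behavior (illustrative only, not part of the model card's code):

```python
# Label ids come from enumerating the sorted label list, as in the example above.
# Python sorts these strings lexicographically, so '3-9' lands between '20-29'
# and '30-39' -- the same ordering the age evaluation table uses.
age_labels = ['0-2', '10-19', '20-29', '3-9', '30-39', '40-49',
              '50-59', '60-69', 'more than 70']
id2label = {i: label for i, label in enumerate(sorted(age_labels))}
print(id2label[2], id2label[3])  # 20-29 3-9
```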
 
 ## Training Details
 
 * **Base Model:** [openai/clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14)
+ * **Dataset:** [FairFace](https://github.com/joojs/fairface)
+ * **Training Procedure:** The model was fine-tuned for 5 epochs. The vision encoder was mostly frozen, with only the final 3 transformer layers being unfrozen for training. A separate linear classification head was added for each task (age, gender, race). The total loss was the sum of the Cross-Entropy Loss from each of the three tasks.
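The freezing scheme and summed loss described above can be sketched as follows. This is a minimal illustration with a toy layer stack and random logits, not the actual training script (PyTorch assumed available; layer counts are shrunk for brevity):

```python
import torch
import torch.nn as nn

# Toy stand-in for the transformer layers of the CLIP vision encoder.
layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(6)])

# Freeze everything, then unfreeze only the final 3 layers.
for p in layers.parameters():
    p.requires_grad = False
for layer in layers[-3:]:
    for p in layer.parameters():
        p.requires_grad = True

trainable = sum(p.numel() for p in layers.parameters() if p.requires_grad)
print(trainable)  # 3 layers x (8*8 weights + 8 biases) = 216

# Total loss = sum of the per-task cross-entropy losses.
ce = nn.CrossEntropyLoss()
logits = {'age': torch.randn(4, 9), 'gender': torch.randn(4, 2), 'race': torch.randn(4, 7)}
labels = {'age': torch.randint(0, 9, (4,)), 'gender': torch.randint(0, 2, (4,)),
          'race': torch.randint(0, 7, (4,))}
total_loss = sum(ce(logits[t], labels[t]) for t in logits)
```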
 
 ## Evaluation
 
 
 ### Performance Metrics
 
+ The following reports detail the model's performance on each task.
+
 #### **Gender Classification (Overall Accuracy: 96.38%)**
 ```
               precision    recall  f1-score   support
 
 weighted avg       0.96      0.96      0.96     10954
 ```
 
+ #### **Race Classification (Overall Accuracy: 73.22%)**
+ ```
+                  precision    recall  f1-score   support
+
+           Black       0.90      0.89      0.89      1556
+      East Asian       0.74      0.78      0.76      1550
+          Indian       0.81      0.75      0.78      1516
+ Latino_Hispanic       0.58      0.62      0.60      1623
+  Middle Eastern       0.69      0.57      0.62      1209
+ Southeast Asian       0.66      0.65      0.65      1415
+           White       0.75      0.80      0.77      2085
+
+        accuracy                           0.73     10954
+       macro avg       0.73      0.72      0.73     10954
+    weighted avg       0.73      0.73      0.73     10954
+ ```
+
+ #### **Age Classification (Overall Accuracy: 59.17%)**
+ ```
+               precision    recall  f1-score   support
+
+          0-2       0.93      0.45      0.60       199
+        10-19       0.62      0.41      0.50      1181
+        20-29       0.64      0.76      0.70      3300
+          3-9       0.77      0.88      0.82      1356
+        30-39       0.49      0.50      0.49      2330
+        40-49       0.46      0.44      0.45      1353
+        50-59       0.47      0.40      0.43       796
+        60-69       0.45      0.32      0.38       321
+ more than 70       0.75      0.10      0.18       118
+
+     accuracy                           0.59     10954
+    macro avg       0.62      0.47      0.51     10954
+ weighted avg       0.59      0.59      0.58     10954
+ ```
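As a sanity check on reports like the ones above, the weighted-average row is just the support-weighted mean of the per-class scores. Recomputing the race task's weighted F1 from its table (small discrepancies can appear because the table rounds each class score to two decimals):

```python
# Per-class F1 and support, copied from the race classification report above.
f1 = {'Black': 0.89, 'East Asian': 0.76, 'Indian': 0.78, 'Latino_Hispanic': 0.60,
      'Middle Eastern': 0.62, 'Southeast Asian': 0.65, 'White': 0.77}
support = {'Black': 1556, 'East Asian': 1550, 'Indian': 1516, 'Latino_Hispanic': 1623,
           'Middle Eastern': 1209, 'Southeast Asian': 1415, 'White': 2085}

total = sum(support.values())  # 10954 validation images
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total
print(total, round(weighted_f1, 2))  # 10954 0.73
```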
+
 ## Bias, Risks, and Limitations
 
+ * **Perceptual vs. Identity:** The model predicts perceived attributes based on visual data. These predictions are not a determination of an individual's true self-identity.
+ * **Performance Disparities:** The evaluation clearly shows that performance is not uniform across all categories. The model is significantly less accurate for certain racial groups (e.g., Latino_Hispanic, Middle Eastern) and older age groups. Using this model in any application will perpetuate these biases.
 * **Data Representation:** While trained on FairFace, a balanced dataset, the model may still reflect societal biases present in the original pre-training data of CLIP.
+ * **Risk of Misclassification:** Any misclassification, particularly of sensitive attributes, can have negative social consequences. The model's moderate accuracy in age and race prediction makes this a significant risk.
 
 ### Citation
 
   pages={1548--1558},
   year={2021}
 }
+ ```
 ```
config.json ADDED
@@ -0,0 +1,171 @@
+ {
+   "_name_or_path": "clip-vit-large-patch14/",
+   "architectures": [
+     "CLIPModel"
+   ],
+   "initializer_factor": 1.0,
+   "logit_scale_init_value": 2.6592,
+   "model_type": "clip",
+   "projection_dim": 768,
+   "text_config": {
+     "_name_or_path": "",
+     "add_cross_attention": false,
+     "architectures": null,
+     "attention_dropout": 0.0,
+     "bad_words_ids": null,
+     "bos_token_id": 0,
+     "chunk_size_feed_forward": 0,
+     "cross_attention_hidden_size": null,
+     "decoder_start_token_id": null,
+     "diversity_penalty": 0.0,
+     "do_sample": false,
+     "dropout": 0.0,
+     "early_stopping": false,
+     "encoder_no_repeat_ngram_size": 0,
+     "eos_token_id": 2,
+     "finetuning_task": null,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "hidden_act": "quick_gelu",
+     "hidden_size": 768,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "initializer_factor": 1.0,
+     "initializer_range": 0.02,
+     "intermediate_size": 3072,
+     "is_decoder": false,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "layer_norm_eps": 1e-05,
+     "length_penalty": 1.0,
+     "max_length": 20,
+     "max_position_embeddings": 77,
+     "min_length": 0,
+     "model_type": "clip_text_model",
+     "no_repeat_ngram_size": 0,
+     "num_attention_heads": 12,
+     "num_beam_groups": 1,
+     "num_beams": 1,
+     "num_hidden_layers": 12,
+     "num_return_sequences": 1,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_scores": false,
+     "pad_token_id": 1,
+     "prefix": null,
+     "problem_type": null,
+     "projection_dim": 768,
+     "pruned_heads": {},
+     "remove_invalid_values": false,
+     "repetition_penalty": 1.0,
+     "return_dict": true,
+     "return_dict_in_generate": false,
+     "sep_token_id": null,
+     "task_specific_params": null,
+     "temperature": 1.0,
+     "tie_encoder_decoder": false,
+     "tie_word_embeddings": true,
+     "tokenizer_class": null,
+     "top_k": 50,
+     "top_p": 1.0,
+     "torch_dtype": null,
+     "torchscript": false,
+     "transformers_version": "4.16.0.dev0",
+     "use_bfloat16": false,
+     "vocab_size": 49408
+   },
+   "text_config_dict": {
+     "hidden_size": 768,
+     "intermediate_size": 3072,
+     "num_attention_heads": 12,
+     "num_hidden_layers": 12,
+     "projection_dim": 768
+   },
+   "torch_dtype": "float32",
+   "transformers_version": null,
+   "vision_config": {
+     "_name_or_path": "",
+     "add_cross_attention": false,
+     "architectures": null,
+     "attention_dropout": 0.0,
+     "bad_words_ids": null,
+     "bos_token_id": null,
+     "chunk_size_feed_forward": 0,
+     "cross_attention_hidden_size": null,
+     "decoder_start_token_id": null,
+     "diversity_penalty": 0.0,
+     "do_sample": false,
+     "dropout": 0.0,
+     "early_stopping": false,
+     "encoder_no_repeat_ngram_size": 0,
+     "eos_token_id": null,
+     "finetuning_task": null,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "hidden_act": "quick_gelu",
+     "hidden_size": 1024,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "image_size": 224,
+     "initializer_factor": 1.0,
+     "initializer_range": 0.02,
+     "intermediate_size": 4096,
+     "is_decoder": false,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "layer_norm_eps": 1e-05,
+     "length_penalty": 1.0,
+     "max_length": 20,
+     "min_length": 0,
+     "model_type": "clip_vision_model",
+     "no_repeat_ngram_size": 0,
+     "num_attention_heads": 16,
+     "num_beam_groups": 1,
+     "num_beams": 1,
+     "num_hidden_layers": 24,
+     "num_return_sequences": 1,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_scores": false,
+     "pad_token_id": null,
+     "patch_size": 14,
+     "prefix": null,
+     "problem_type": null,
+     "projection_dim": 768,
+     "pruned_heads": {},
+     "remove_invalid_values": false,
+     "repetition_penalty": 1.0,
+     "return_dict": true,
+     "return_dict_in_generate": false,
+     "sep_token_id": null,
+     "task_specific_params": null,
+     "temperature": 1.0,
+     "tie_encoder_decoder": false,
+     "tie_word_embeddings": true,
+     "tokenizer_class": null,
+     "top_k": 50,
+     "top_p": 1.0,
+     "torch_dtype": null,
+     "torchscript": false,
+     "transformers_version": "4.16.0.dev0",
+     "use_bfloat16": false
+   },
+   "vision_config_dict": {
+     "hidden_size": 1024,
+     "intermediate_size": 4096,
+     "num_attention_heads": 16,
+     "num_hidden_layers": 24,
+     "patch_size": 14,
+     "projection_dim": 768
+   }
+ }
gitattributes (1) ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e811883e6f247acc61a869a938b9523d1eb1d34fa3c1e882b3f033a49b8cb72d
+ size 1212846240
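The three lines above are not the model itself but a Git LFS pointer: the roughly 1.2 GB weights blob lives in LFS storage and is addressed by its SHA-256 digest. Parsing such a pointer is straightforward (an illustrative sketch, not part of this repository's code):

```python
# Parse a Git LFS pointer file into its key/value fields.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:e811883e6f247acc61a869a938b9523d1eb1d34fa3c1e882b3f033a49b8cb72d
size 1212846240"""

fields = dict(line.split(" ", 1) for line in pointer.splitlines())
algo, digest = fields["oid"].split(":", 1)
size_gb = int(fields["size"]) / 1e9
print(algo, len(digest), round(size_gb, 2))  # sha256 64 1.21
```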
preprocessor_config.json ADDED
@@ -0,0 +1,27 @@
+ {
+   "crop_size": {
+     "height": 224,
+     "width": 224
+   },
+   "do_center_crop": true,
+   "do_convert_rgb": true,
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [
+     0.48145466,
+     0.4578275,
+     0.40821073
+   ],
+   "image_processor_type": "CLIPImageProcessor",
+   "image_std": [
+     0.26862954,
+     0.26130258,
+     0.27577711
+   ],
+   "resample": 3,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "shortest_edge": 224
+   }
+ }
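Note that `rescale_factor` above is 1/255 and the mean/std are the standard CLIP normalization constants: each uint8 pixel is scaled to [0, 1] and then normalized per channel. The arithmetic in plain Python (a sketch of what `CLIPImageProcessor` does numerically, not a substitute for it):

```python
# Per-channel CLIP normalization: x * (1/255), then (x - mean) / std.
mean = [0.48145466, 0.4578275, 0.40821073]
std = [0.26862954, 0.26130258, 0.27577711]
rescale_factor = 1 / 255  # matches 0.00392156862745098 in the config

def normalize_pixel(rgb):
    """Normalize one (R, G, B) uint8 pixel as the processor would."""
    return [(v * rescale_factor - m) / s for v, m, s in zip(rgb, mean, std)]

white = normalize_pixel((255, 255, 255))
print([round(c, 2) for c in white])  # [1.93, 2.07, 2.15]
```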
requirements.txt ADDED
@@ -0,0 +1,35 @@
+ # This file lists the required packages for the clip-face-attribute-classifier project.
+ # Install them using: pip install -r requirements.txt
+
+ # --- Hugging Face Libraries ---
+ # Core library for models, Trainer, TrainingArguments, and processors
+ transformers==4.38.2
+ # Used for data handling and creating Dataset objects
+ datasets==2.18.0
+ # For efficient training and hardware acceleration with the Trainer
+ accelerate==0.27.2
+ # For interacting with the Hugging Face Hub (login, upload, etc.)
+ huggingface_hub==0.21.4
+
+ # --- Core Deep Learning Framework ---
+ # The fundamental deep learning library
+ torch==2.2.1
+ # Companion library for computer vision tasks in PyTorch
+ torchvision==0.17.1
+
+ # --- Data Handling and Metrics ---
+ # For reading and manipulating the .csv label files
+ pandas==2.2.1
+ # For calculating evaluation metrics like accuracy, precision, recall, and F1-score
+ scikit-learn==1.4.1.post1
+
+ # --- Utilities ---
+ # For opening and handling image files
+ Pillow==10.2.0
+ # For creating progress bars during evaluation
+ tqdm==4.66.2
+ # For loading the safer .safetensors model format
+ safetensors==0.4.2
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "single_word": false,
+     "lstrip": false,
+     "rstrip": false,
+     "normalized": true,
+     "__type": "AddedToken"
+   },
+   "bos_token": {
+     "content": "<|startoftext|>",
+     "single_word": false,
+     "lstrip": false,
+     "rstrip": false,
+     "normalized": true,
+     "__type": "AddedToken"
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "single_word": false,
+     "lstrip": false,
+     "rstrip": false,
+     "normalized": true,
+     "__type": "AddedToken"
+   },
+   "pad_token": "<|endoftext|>",
+   "add_prefix_space": false,
+   "errors": "replace",
+   "do_lower_case": true,
+   "name_or_path": "openai/clip-vit-base-patch32",
+   "model_max_length": 77,
+   "special_tokens_map_file": "./special_tokens_map.json",
+   "tokenizer_class": "CLIPTokenizer"
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff