RamadhanZome committed on
Commit 7a73a2b · verified · 1 Parent(s): ef0bc6f

Improve model card: add usage example, fix preprocessing details, expand limitations


Add an extended description including:
- Working code example using the transformers pipeline
- Updated preprocessing details with specific values (224x224, ImageNet normalization)
- Expanded limitations section with concrete details on dataset bias, class imbalance, skin tone bias, and input requirements

Files changed (1):
  1. README.md (+36 -17)
README.md CHANGED

````diff
@@ -1,24 +1,17 @@
 ---
 license: apache-2.0
 ---
 # Vision Transformer (ViT) for Facial Expression Recognition Model Card
 
 ## Model Overview
-
 - **Model Name:** [trpakov/vit-face-expression](https://huggingface.co/trpakov/vit-face-expression)
-
 - **Task:** Facial Expression/Emotion Recognition
-
 - **Dataset:** [FER2013](https://www.kaggle.com/datasets/msambare/fer2013)
-
 - **Model Architecture:** [Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)
-
 - **Finetuned from model:** [vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k)
 
 ## Model Description
-
 The vit-face-expression model is a Vision Transformer fine-tuned for the task of facial emotion recognition.
-
 It is trained on the FER2013 dataset, which consists of facial images categorized into seven different emotions:
 - Angry
 - Disgust
@@ -29,18 +22,44 @@ It is trained on the FER2013 dataset, which consists of facial images categorized into seven different emotions:
 - Neutral
 
 ## Data Preprocessing
-
 The input images are preprocessed before being fed into the model. The preprocessing steps include:
-- **Resizing:** Images are resized to the specified input size.
-- **Normalization:** Pixel values are normalized to a specific range.
+- **Resizing:** Images are resized to 224x224 pixels before being fed into the model.
+- **Normalization:** Pixel values are normalized using ImageNet mean and standard deviation.
 - **Data Augmentation:** Random transformations such as rotations, flips, and zooms are applied to augment the training dataset.
 
 ## Evaluation Metrics
-
 - **Validation set accuracy:** 0.7113
 - **Test set accuracy:** 0.7116
 
-## Limitations
-
-- **Data Bias:** The model's performance may be influenced by biases present in the training data.
-- **Generalization:** The model's ability to generalize to unseen data is subject to the diversity of the training dataset.
+## Usage
+```python
+from transformers import pipeline
+from PIL import Image
+
+# Load the model
+pipe = pipeline("image-classification", model="trpakov/vit-face-expression")
+
+# Load an image (must contain a face)
+image = Image.open("your_image.jpg").convert("RGB")
+
+# Run inference
+results = pipe(image)
+
+# Output: list of dicts with 'label' and 'score'
+# Example: [{'label': 'happy', 'score': 0.98}, {'label': 'neutral', 'score': 0.01}, ...]
+print(results)
+```
+
+## Limitations
+- **Dataset bias:** FER2013 is collected from Google Image Search and is known
+  to contain noisy and mislabelled samples, which affects model reliability.
+- **Class imbalance:** The dataset is heavily skewed toward "happy" and "neutral",
+  making the model less reliable for underrepresented classes like "disgust" and "fear".
+- **Skin tone bias:** The model may perform worse on darker skin tones due to
+  underrepresentation in the training data.
+- **Input requirements:** The model expects a cropped, frontal face image.
+  Performance degrades significantly on profile faces, occluded faces, or
+  images where the face is not the primary subject.
+- **Image size:** Input images are resized to 224x224 pixels internally.
+- **Real-world generalization:** Lab-posed expressions in training data differ
+  from natural spontaneous expressions in the wild.
````
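
As a sanity check on the preprocessing values this commit adds to the card (224x224 resize, ImageNet normalization), the stated steps can be sketched in plain PIL/NumPy. The exact constants for the released checkpoint live in its `preprocessor_config.json`, so treat the mean/std below as the commonly assumed ImageNet defaults rather than values confirmed by this repository:

```python
from PIL import Image
import numpy as np

# Standard ImageNet statistics (assumed; verify against preprocessor_config.json)
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def preprocess(image: Image.Image) -> np.ndarray:
    """Resize to 224x224 and normalize per channel with ImageNet statistics."""
    image = image.convert("RGB").resize((224, 224))
    arr = np.asarray(image).astype(np.float32) / 255.0  # scale pixels to [0, 1]
    arr = (arr - IMAGENET_MEAN) / IMAGENET_STD          # per-channel normalization
    return arr.transpose(2, 0, 1)                       # HWC -> CHW tensor layout

# Example on a synthetic mid-grey image
img = Image.new("RGB", (48, 48), color=(128, 128, 128))
tensor = preprocess(img)
print(tensor.shape)  # (3, 224, 224)
```

In practice `AutoImageProcessor.from_pretrained("trpakov/vit-face-expression")` (or the `pipeline` shown in the README) applies these steps automatically; the sketch only makes the card's description concrete.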