---
license: apache-2.0
datasets:
- clip-benchmark/wds_fer2013
base_model:
- microsoft/resnet-50
pipeline_tag: image-classification
library_name: keras
---
## Model Description

This model is a ResNet-50 deep convolutional neural network fine-tuned on the FER-2013 (Facial Expression Recognition 2013) dataset. The dataset consists of low-resolution (48x48) grayscale face images, each labeled with one of seven basic emotions.

The goal of this project was to get the best possible performance out of the pre-trained ResNet-50 architecture on this particularly challenging dataset, which is noisy and heavily imbalanced.

## Training Details

### Architecture

* Base Model: ResNet-50 (pre-trained on ImageNet).
* Head: Custom dense layers (224 units) followed by dropout at a high rate of 0.5.
* Transfer Learning Strategy: Deep Freezing. All layers of the base up to the `conv5` block were frozen, so only the final convolutional block (`conv5`) and the custom head were fine-tuned. The intent is to keep the early layers, whose features were learned on higher-resolution ImageNet images, from being degraded by updates driven by the 48x48 inputs.
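The architecture and freezing strategy above can be sketched in Keras roughly as follows. This is a hypothetical reconstruction, not the author's training code: the function name `build_fer_resnet`, the global average pooling, the single 224-unit ReLU layer, repeating the grayscale channel three times so ImageNet weights can be used, and the sparse categorical loss are all assumptions. The Adam learning rate of 5e-6 is the one listed under Optimization & Regularization.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 7  # seven FER-2013 emotion classes


def build_fer_resnet(weights="imagenet"):
    """Hypothetical sketch: ResNet-50 base with only conv5 trainable, plus a 224-unit head."""
    inputs = layers.Input(shape=(48, 48, 1))  # grayscale 48x48 faces
    # Assumption: repeat the single gray channel to 3 channels for the ImageNet base.
    rgb = layers.Concatenate(axis=-1)([inputs, inputs, inputs])

    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=(48, 48, 3)
    )
    # "Deep freezing": only layers of the last stage (named conv5_block*) stay trainable.
    for layer in base.layers:
        layer.trainable = layer.name.startswith("conv5")

    x = base(rgb)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(224, activation="relu")(x)  # assumed: one hidden layer, ReLU
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=5e-6),
        loss="sparse_categorical_crossentropy",  # assumes integer class labels
        metrics=["accuracy"],
    )
    return model
```

Freezing by name works because, in `tf.keras.applications.ResNet50`, every layer of the final stage has a name starting with `conv5_` (from `conv5_block1_1_conv` to `conv5_block3_out`). Whether the released model takes one or three input channels is not stated in this card.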

### Optimization & Regularization

| Technique | Rationale |
| :--- | :--- |
| Class Weighting | Applied inverse-frequency class weights to mitigate the severe class imbalance (e.g., Disgust is rare, Happy is abundant). |
| Data Augmentation | Used random flips, translations, rotations, and zooms to artificially expand the small dataset and combat overfitting. |
| High Dropout | Increased dropout to 0.5 to regularize the model aggressively and prevent the divergence seen in earlier training runs. |
| Optimizer | Adam with a very low fine-tuning learning rate of 5e-6. |
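A minimal sketch of the class weighting and augmentation steps, assuming integer labels 0-6 and Keras preprocessing layers. The weighting uses the common "balanced" inverse-frequency formula `n_samples / (n_classes * count_c)`; the card does not give the exact formula, and the augmentation factors below are hypothetical placeholders, as are the function names.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers


def inverse_frequency_weights(labels, num_classes=7):
    """Balanced inverse-frequency weights: n_samples / (n_classes * count_c)."""
    labels = np.asarray(labels)
    counts = np.bincount(labels, minlength=num_classes)
    # Clamp empty classes to a count of 1 to avoid division by zero.
    weights = labels.size / (num_classes * np.maximum(counts, 1))
    return {c: float(w) for c, w in enumerate(weights)}


def make_augmenter():
    """Random flips, translations, rotations and zooms (factors are hypothetical)."""
    return tf.keras.Sequential([
        layers.RandomFlip("horizontal"),
        layers.RandomTranslation(0.1, 0.1),  # up to 10% shift in height and width
        layers.RandomRotation(0.05),  # fraction of a full turn: up to about ±18°
        layers.RandomZoom(0.1),
    ])
```

In training, the dictionary would go to the `class_weight` argument of `model.fit`, and the augmenter would be applied only to training batches, for example `augment = make_augmenter()` followed by `train_ds.map(lambda x, y: (augment(x, training=True), y))`, where `train_ds` is a hypothetical `tf.data.Dataset` of (image, label) pairs.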

## Evaluation Results

The final model reached its best and most stable performance after 50 epochs of fine-tuning, a reasonable result given the difficulty of the data.

### Overall Performance

| Metric | Result |
| :--- | :--- |
| Test Accuracy | 45.70% |
| Test Loss | 1.4929 |
| Training Accuracy (End) | 63.25% |

### Per-Class F1-Scores

The F1-scores highlight the model's difficulty with the more ambiguous emotions: Sad, Surprise, Fear, and Angry all score below 0.40.

| Emotion | F1-Score | Support (Test Count) | Notes |
| :--- | :--- | :--- | :--- |
| Neutral | 0.6386 | 831 | Highest precision, well-distinguished class. |
| Happy | 0.6037 | 1774 | Strongest recall, the most abundant class. |
| Disgust | 0.4659 | 111 | Significantly improved performance on this rare class. |
| Sad | 0.3995 | 1233 | Ambiguous. |
| Surprise | 0.3531 | 1247 | Ambiguous. |
| Fear | 0.3374 | 1024 | Ambiguous. |
| Angry | 0.3312 | 958 | Lowest F1-score, indicating high confusion. |
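Per-class F1 is the harmonic mean of precision and recall, `F1 = 2PR / (P + R)`, computed from a confusion matrix. The NumPy sketch below is an illustrative reimplementation with hypothetical function names (scikit-learn's `classification_report` reports the same per-class metrics); it also shows the macro (unweighted) and support-weighted averages.

```python
import numpy as np


def per_class_f1(y_true, y_pred, num_classes=7):
    """Per-class precision, recall, F1 and support from integer labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # cm[i, j] counts samples of true class i predicted as class j.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)    # true samples per class
    predicted = cm.sum(axis=0)  # predictions per class
    zeros = np.zeros_like(tp)
    precision = np.divide(tp, predicted, out=zeros.copy(), where=predicted > 0)
    recall = np.divide(tp, support, out=zeros.copy(), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    return precision, recall, f1, support


def macro_and_weighted(f1, support):
    """Unweighted mean of per-class F1, and the mean weighted by class support."""
    f1 = np.asarray(f1, dtype=float)
    support = np.asarray(support, dtype=float)
    return float(f1.mean()), float((f1 * support).sum() / support.sum())
```

Applied to the seven rows of the table above, the macro average is about 0.447 and the support-weighted average about 0.453, close to the 45.70% test accuracy.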

## 💡 Usage and Limitations

### Inputs

* Image Format: Grayscale (48x48 pixels).
* Normalization: Pixel values must be scaled to [0, 1] (by dividing by 255.0).
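A minimal NumPy sketch of preparing a face for this input format and reading out the top predictions. The function names and the `CLASS_NAMES` order are hypothetical: the card does not state which output index corresponds to which emotion, and the order differs between an alphabetical folder layout and the original FER-2013 CSV. Whether the saved model expects one channel or three is also an assumption, so the channel count is a parameter.

```python
import numpy as np

# Hypothetical index-to-name order (alphabetical folder layout). The original
# FER-2013 CSV uses 0 Angry, 1 Disgust, 2 Fear, 3 Happy, 4 Sad, 5 Surprise, 6 Neutral.
CLASS_NAMES = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]


def preprocess_face(gray, channels=1):
    """Scale a 48x48 uint8 grayscale face to [0, 1] and add batch/channel axes."""
    img = np.asarray(gray)
    if img.shape != (48, 48):
        raise ValueError(f"expected a 48x48 grayscale array, got shape {img.shape}")
    x = img.astype("float32") / 255.0    # normalization described above
    x = x[np.newaxis, :, :, np.newaxis]  # shape (1, 48, 48, 1)
    if channels == 3:
        x = np.repeat(x, 3, axis=-1)     # only if the model expects 3 channels
    return x


def top_emotions(probs, class_names=CLASS_NAMES, k=3):
    """Return the k most probable (name, probability) pairs, highest first."""
    probs = np.asarray(probs, dtype=float).reshape(-1)
    order = np.argsort(probs)[::-1][:k]
    return [(class_names[i], float(probs[i])) for i in order]
```

With `loaded_model` loaded as in the Troubleshooting section and `face` a 48x48 uint8 array, a prediction could look like `top_emotions(loaded_model.predict(preprocess_face(face, channels=loaded_model.input_shape[-1]))[0])`.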

### Recommended Libraries

* `tensorflow` (for loading the model)
* `numpy` (for array manipulation)

### Limitations

1. Low Accuracy: The 45.70% test accuracy is limited by the low resolution (48x48) and noisy labels of the FER-2013 dataset. It falls well short of human-level performance on FER-2013 (roughly 65%-68%) and is not comparable to models trained on high-quality, high-resolution "in-the-wild" datasets such as AffectNet.
2. Overfitting: Despite aggressive regularization, the model still overfits noticeably: training accuracy ends at 63.25% versus 45.70% on the test set, a gap of 17.55 percentage points. A large train/test gap is characteristic of this dataset.

### ❓ Troubleshooting Loading Errors

If you encounter a `ValueError` when loading the model, make sure you are loading the saved file with the `.keras` extension (the native Keras format):

```python
import tensorflow as tf

# Load the full model (architecture + weights) from the native .keras file.
loaded_model = tf.keras.models.load_model("./best_fer_resnet_local/best_model.keras")
```