fairface_age_image_detection

1. Model Details

dima806/fairface_age_image_detection is a Vision Transformer (ViT-base) fine-tuned by Dmytro Iakubovskyi for face-based age group classification. Given a single face image, the model outputs a probability distribution over 9 discrete age brackets: 0–2, 3–9, 10–19, 20–29, 30–39, 40–49, 50–59, 60–69, and 70+.

Fine-tuning was performed on the FairFace dataset: 108,501 face images, balanced across seven racial groups, sourced from the Yahoo Flickr Creative Commons 100M (YFCC-100M) collection. The model achieves an overall accuracy of 59% on the FairFace test split, with substantial variation across age groups (see Quantitative Analysis).

Key properties:

  • Architecture: ViT-base (Vision Transformer)
  • Task: 9-class image classification
  • Training data: FairFace (108,501 images)
  • Overall accuracy: 59% on the FairFace test split
  • License: Apache 2.0
  • Monthly downloads: ~51 million
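
The model can be queried with the standard Hugging Face transformers image-classification pipeline. The snippet below is a minimal inference sketch; the label strings are read from the model's configuration and the image path is a placeholder.

```python
# Minimal inference sketch for dima806/fairface_age_image_detection.
# Requires: pip install transformers torch pillow
from transformers import pipeline

classifier = pipeline("image-classification",
                      model="dima806/fairface_age_image_detection")

# top_k=9 requests the full probability distribution over all 9 brackets,
# sorted by score, rather than only the default top 5 classes.
predictions = classifier("face.jpg", top_k=9)
for p in predictions:
    print(f"{p['label']:>12}  {p['score']:.3f}")
```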

2. Intended Uses

  • Demographic research. Estimating broad age-group distributions across large image datasets for academic research into face-attribute classifiers, particularly when paired with balanced demographic evaluation.
  • Dataset curation and filtering. Automatically flagging images whose apparent age group falls outside a target range during large-scale dataset assembly (a minimal sketch follows this list). Results should always be manually reviewed before acting on them.
  • Non-consequential demo applications. Informal or entertainment-facing apps where incorrect predictions carry no legal, financial, or access-control consequences, and where the probabilistic nature of outputs is clearly communicated to users.
  • Fairness research baseline. As a reference model in studies evaluating subgroup-level performance disparities in age estimation, where its known failure modes can be characterised and improved upon.
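
As a hedged illustration of the dataset-curation use above, the following sketch flags images whose top-1 prediction falls outside an assumed target range of 20–59. The directory name, label strings, and score threshold are placeholders, and flagged items still require manual review.

```python
# Curation-pass sketch: flag images outside an assumed target age range.
from pathlib import Path
from transformers import pipeline

classifier = pipeline("image-classification",
                      model="dima806/fairface_age_image_detection")

TARGET_BRACKETS = {"20-29", "30-39", "40-49", "50-59"}  # assumed label strings
flagged = []

for path in sorted(Path("raw_images").glob("*.jpg")):  # placeholder directory
    top = classifier(str(path), top_k=1)[0]
    # Flag anything outside the target range, plus any low-confidence prediction.
    if top["label"] not in TARGET_BRACKETS or top["score"] < 0.5:
        flagged.append((path.name, top["label"], round(top["score"], 3)))

print(f"{len(flagged)} images flagged for manual review")
```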

3. Out-of-Scope Uses

  • Age-gating or access control. This model must not be used to determine whether a user meets a legal age threshold. With ~18% recall for adults aged 70+, the model fails elderly users at an unacceptable rate and is not a compliant substitute for certified age verification.
  • Surveillance or law enforcement. The model should not be deployed in real-time surveillance systems or any context where age estimates are used to make decisions about individuals without their knowledge or consent.
  • Healthcare or clinical assessment. The model was trained on consumer web photography, not clinical imaging, and has not been validated in any medical context.
  • Employment screening. Any use in hiring, promotion, or workforce management where an age estimate could introduce or amplify age-based discrimination would likely violate anti-discrimination law.
  • Applications primarily serving elderly users. Given ~18% recall for the 70+ group and ~47% for 60–69, applications serving older populations should not rely on this model without substantial retraining and re-validation on representative data.

4. Training Data

The model was fine-tuned on the FairFace dataset (Kärkkäinen & Joo, WACV 2021).

Dataset summary:

  • Total images: 108,501
  • Source: YFCC-100M (Yahoo Flickr Creative Commons 100M)
  • Racial groups: 7 (White, Black, Latino/Hispanic, East Asian, Southeast Asian, Indian, Middle Eastern)
  • Gender balance: approximately equal
  • Age labels: 9 brackets (0–2, 3–9, 10–19, 20–29, 30–39, 40–49, 50–59, 60–69, 70+)
  • License: CC BY 4.0

FairFace was constructed to address the racial imbalance present in prior face datasets such as CelebA and LFWA. Images are predominantly well-lit, front-facing consumer portraits.

Coverage limitations:

  • Sourced from Flickr, so it reflects the demographics of users with internet access rather than the global population
  • Not representative of: clinical photography, surveillance footage, low-light environments, non-frontal poses, or heavily occluded faces
  • The 70+ cohort is a small minority in internet photo collections, which contributes to the model's reduced performance on this group

5. Evaluation Data

Performance metrics are reported on the held-out test split of FairFace, drawn from the same YFCC-100M Flickr source as the training set.

The test split is in-distribution with respect to the training data: images come from the same platform and share similar photographic conditions. Performance on out-of-distribution sources (security cameras, medical imaging, mobile selfies) has not been characterised. Developers should construct and evaluate on a domain-specific test set before deploying in any context that differs meaningfully from consumer Flickr photography.

No cross-dataset evaluation on independent benchmarks such as MORPH, IMDB-WIKI, or UTKFace has been published for this model.


6. Metrics

Performance is reported using per-class recall (true positive rate per age bracket) and overall accuracy on the FairFace test split.

Recall is the primary metric of interest because it directly measures whether the model correctly identifies members of each age group. Overall accuracy alone is an insufficient characterisation of this model: performance varies dramatically across age brackets, and the headline figure of 59% should not be used as the sole criterion for deployment decisions.
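
For developers reproducing this kind of breakdown on their own labelled data, a minimal per-bracket recall sketch using scikit-learn is shown below; the bracket label strings and the y_true/y_pred encoding are assumptions about how your evaluation data is stored.

```python
# Per-bracket recall and overall accuracy on a labelled evaluation set.
# y_true and y_pred are lists of bracket strings, e.g. "20-29".
from sklearn.metrics import accuracy_score, recall_score

BRACKETS = ["0-2", "3-9", "10-19", "20-29", "30-39",
            "40-49", "50-59", "60-69", "70+"]  # assumed label strings

def report(y_true, y_pred):
    per_class = recall_score(y_true, y_pred, labels=BRACKETS, average=None)
    for bracket, r in zip(BRACKETS, per_class):
        print(f"{bracket:>6}: recall {r:.2f}")
    print(f"overall accuracy: {accuracy_score(y_true, y_pred):.2f}")
```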

Confidence calibration has not been evaluated. Whether the model's output probabilities are well-calibrated (i.e., whether a predicted confidence of 80% corresponds to roughly 80% empirical accuracy) is unknown and should be assessed before any application surfaces raw confidence scores to users.
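
One common way to assess this is expected calibration error (ECE) over top-1 predictions; the sketch below assumes you already have arrays of top-1 confidence scores and per-example correctness from your own evaluation run.

```python
# Expected calibration error (ECE) sketch for top-1 predictions.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: top-1 scores in [0, 1]; correct: booleans per example."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Gap between average confidence and empirical accuracy in the bin,
            # weighted by the fraction of examples falling in the bin.
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece
```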


7. Quantitative Analysis

Per-age-bracket recall derived from the confusion matrix evaluated on the FairFace test split:

  • 3–9: ~80% recall (strong performance)
  • 0–2: ~75% (good; infant features are visually distinct)
  • 20–29: ~72% (reasonable; a well-represented cohort in the training data)
  • 10–19: ~55% (moderate; frequent confusion with adjacent brackets)
  • 30–39: ~50% (moderate; high confusion with 20–29 and 40–49)
  • 50–59: ~49% (below average)
  • 60–69: ~47% (poor)
  • 40–49: ~44% (significant drop)
  • 70+: ~18% (near-total failure; performance is close to chance level for this group)

The 59% overall accuracy is a weighted average across all groups. Performance for adults over 60 is substantially below this figure. Any deployment that may affect older adults must account for these per-group recall values explicitly rather than relying on the aggregate metric.

Subgroup analysis at the intersection of age and race or gender has not been published. Intersection-level performance (e.g., recall for elderly women vs. elderly men across racial groups) is unknown and represents an important gap for equity-sensitive deployments.
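
If your evaluation data carries demographic annotations, the intersectional breakdown this gap calls for can be computed directly; the sketch below assumes a pandas DataFrame with hypothetical column names for true and predicted brackets plus race and gender labels.

```python
# Intersectional recall sketch: fraction of correct age predictions
# per (true age bracket, race, gender) cell of an annotated test set.
import pandas as pd

def intersectional_recall(df: pd.DataFrame) -> pd.Series:
    """Expects columns: age_true, age_pred, race, gender (assumed names)."""
    correct = (df["age_pred"] == df["age_true"]).rename("recall")
    return correct.groupby([df["age_true"], df["race"], df["gender"]]).mean()
```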


8. Ethical Considerations

Uncertainty communicated as certainty. The model outputs a probability distribution, but applications commonly surface only a top-1 prediction without uncertainty context. At 59% overall accuracy, roughly two out of five top-1 predictions are wrong. Outputs must be presented to users with appropriate uncertainty framing.

Differential performance by age. The model performs substantially worse on older adults. Deploying without disclosing or mitigating this disparity risks systematic exclusion of elderly users from services they are entitled to access.

Misuse for age-gating. The model's accessibility makes it easy to integrate as an informal age check. This use is unsafe: the error rates are too high for any legally or ethically consequential access decision, and there is no built-in mechanism to flag or communicate low-confidence predictions.
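
A lightweight mitigation, sketched below, is to wrap the classifier with an abstention rule that refuses to report an age bracket when the top-1 score or the top-1/top-2 margin is low. The thresholds are illustrative and would need tuning against a domain-specific, calibrated test set.

```python
# Abstention-wrapper sketch: report "uncertain" instead of a low-confidence bracket.
def predict_or_abstain(predictions, min_score=0.6, min_margin=0.2):
    """predictions: score-sorted list of {"label", "score"} dicts from the pipeline."""
    top1, top2 = predictions[0], predictions[1]
    low_confidence = (top1["score"] < min_score
                      or top1["score"] - top2["score"] < min_margin)
    return None if low_confidence else top1["label"]  # None => surface as "uncertain"
```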

Biometric data processing obligations. Face images are biometric personal data under GDPR (EU), CCPA (California), BIPA (Illinois), and equivalent frameworks. Any application collecting or processing face images must establish a lawful basis for processing, implement appropriate consent mechanisms, and comply with applicable data minimisation and retention requirements.


9. Limitations

Technical:

  • Overall accuracy of 59% on in-distribution (Flickr) data; lower accuracy expected on out-of-distribution sources
  • ~18% recall for the 70+ age group; effectively non-functional for this cohort without retraining
  • No calibration analysis published; output probability scores should not be interpreted without further validation
  • Performance degrades on non-frontal poses, low-light images, occluded faces, and non-consumer photographic contexts
  • Adjacent brackets (e.g., 20–29 and 30–39) are frequently confused

Contextual:

  • Validated only on Flickr-sourced imagery; generalisation to other domains is unverified
  • Not suitable for high-stakes or legally consequential decisions without domain-specific validation
  • The model does not detect when an input image is not a face, is of insufficient quality, or falls outside its reliable operating range; a minimal face-gating sketch follows this list
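
As a hedged illustration of the gating idea in the last bullet, the sketch below rejects inputs in which OpenCV's bundled Haar cascade finds no frontal face. It only covers the no-face case; blur, occlusion, and pose checks would need additional heuristics.

```python
# Pre-filter sketch: reject images with no detectable frontal face.
# Requires: pip install opencv-python
import cv2

_face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def has_frontal_face(image_path: str) -> bool:
    image = cv2.imread(image_path)
    if image is None:
        return False  # unreadable or corrupt file
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = _face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0
```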

10. Recommendations

  1. Run a disaggregated evaluation on your own data before deploying. Construct a test set representative of your deployment population and measure recall per age bracket and demographic subgroup. Set and enforce minimum acceptable performance thresholds per subgroup before approving deployment.

  2. Do not use this model for age-gating or access control. For any application where age verification has legal or safety consequences, use a certified identity verification service. Facial age estimation is not a compliant verification method under any major regulatory framework.

  3. Communicate uncertainty to users. If model outputs are surfaced to end users, display a range of likely age brackets and a confidence indicator rather than a single predicted value (see the sketch after this list), and include a statement that the result is a probabilistic estimate. Under the EU AI Act (Article 52), transparency disclosures are legally required for certain AI systems that interact with individuals.

  4. Do not deploy for elderly populations without retraining. If your application may serve users aged 60 or older, fine-tune on a dataset with adequate elderly representation and validate per-group performance before deployment. The current ~18% recall for 70+ is not an acceptable threshold for any live application.

  5. Complete a biometric compliance review. Before processing face images in production, confirm your legal basis for processing under applicable law, implement consent and deletion workflows, and prepare a Data Protection Impact Assessment (DPIA) where required.
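
Following recommendation 3, one way to present a range rather than a point estimate is to show the smallest set of brackets covering most of the probability mass. The helper below is a hypothetical sketch; the 90% mass threshold is an assumption, not a validated choice.

```python
# Display sketch: report the smallest set of brackets covering ~90% of the mass.
def brackets_to_display(predictions, mass=0.9):
    """predictions: score-sorted list of {"label", "score"} dicts from the pipeline."""
    shown, total = [], 0.0
    for p in predictions:
        shown.append(f"{p['label']} ({p['score']:.0%})")
        total += p["score"]
        if total >= mass:
            break
    return "Estimated age group (probabilistic): " + ", ".join(shown)
```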


Citation

@inproceedings{karkkainen2021fairface,
  title     = {FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age for Bias Measurement and Mitigation},
  author    = {K{\"a}rkk{\"a}inen, Kimmo and Joo, Jungseock},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2021}
}