Classifying Elephants
My students often fail to appreciate that AI models are, compared to humans, very bad at generalizing. Generalization is something humans excel at: show a photo of a real elephant to a three-year-old child and they will recognize a drawing of an elephant instantly.
Traditional model architectures, such as ResNet or a Vision Transformer trained on ImageNet, are quite bad at recognizing drawings of elephants.
More modern models, trained on very large web-scale datasets, are much better at it.
Models
The following models were used:
- ResNet
- Vision Transformer
- CLIP
- Florence2
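The intuition behind CLIP-style models can be sketched quickly: they embed the image and a set of text prompts into a shared space and pick the prompt closest to the image. Below is a minimal NumPy illustration of that scoring step only; the embeddings here are made up for the example, while a real model would produce them from the image and text encoders.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels):
    """Score an image embedding against text prompt embeddings,
    CLIP-style: cosine similarity followed by a softmax."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                 # cosine similarity per prompt
    probs = np.exp(sims * 100.0)     # CLIP scales similarities by a learned temperature (~100)
    probs /= probs.sum()
    return labels[int(np.argmax(sims))], probs

# Toy embeddings: the image vector lies closest to the "elephant" prompt.
labels = ["a drawing of an elephant", "a drawing of a dog"]
text_embs = np.array([[1.0, 0.1], [0.1, 1.0]])
image_emb = np.array([0.9, 0.2])
label, probs = zero_shot_classify(image_emb, text_embs, labels)
```

Because the label set is free text, this procedure works without ever training on a "drawing of an elephant" class, which is one reason these models handle drawings better than fixed-class ImageNet classifiers.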
Please see the results and a comparison of the models below.
| Model | Classified as elephant | Dataset/size | Model Size | Remarks |
|---|---|---|---|---|
| ResNet (2015) | 5 / 15 | ImageNet, 1.4 M images | ? | |
| ViT (2020) | 5 / 15 | ImageNet, 1.4 M images | 346 MB | |
| CLIP (2021) | 8 / 15 | 400 M image-text pairs | ? | Dataset not published |
| Florence2 (2024) | 13 / 15 | 129 M images | 1.5 GB | Highly curated dataset, ±5B annotations |
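The raw counts in the table translate into accuracies as follows (a quick sketch; the counts are copied from the table above):

```python
# Correct-classification counts per model, out of 15 elephant drawings.
counts = {"ResNet": 5, "ViT": 5, "CLIP": 8, "Florence2": 13}
total = 15

accuracies = {model: round(n / total * 100, 1) for model, n in counts.items()}
for model, acc in accuracies.items():
    print(f"{model}: {acc}%")
# → ResNet: 33.3%, ViT: 33.3%, CLIP: 53.3%, Florence2: 86.7%
```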
Further analysis and critical evaluation are needed to explain why the newer models, CLIP and Florence2, generalize better than the older ones.
Links
Colabs: https://drive.google.com/drive/folders/1rKMTRmqcLBpwHoXoTAfq0bjF7tR9QSrV
Dataset: https://huggingface.co/datasets/MichielBontenbal/elephants