---
library_name: pytorch
pipeline_tag: image-classification
tags:
- image-classification
- computer-vision
- vehicle-classification
- fine-grained-classification
- pytorch
- timm
license: mit
---

# TwinCar Classifier

TwinCar is a vehicle recognition project that predicts make and model, with year as an auxiliary output. It was developed as the Machine Learning Final Project at Brainster Data Science Academy.

The final deployed model is an EfficientNet-B3 classifier fine-tuned on Stanford Cars. It predicts one of 196 fine-grained Stanford Cars classes, then derives vehicle make, model, and year from the predicted class metadata.

## Final deployed checkpoint

- **Checkpoint file:** `efficientnet_b3_stanford300_augv2_best.pt`
- **Architecture:** EfficientNet-B3
- **Input size:** 300 px
- **Classes:** 196 Stanford Cars fine-grained classes
- **Training data:** Stanford Cars training split
- **Training augmentation:** augmentation v2
- **Framework:** PyTorch + timm
- **Checkpoint manifest:** `checkpoint_manifest.json`
- **Current status:** final deployed candidate

The model repo also keeps older checkpoints for comparison and rollback:

- `efficientnet_b3_stanford300_best.pt` — previous EfficientNet-B3 checkpoint
- `best.pt` — older ResNet18 baseline checkpoint

## What the model predicts

The model directly predicts a fine-grained class, for example:

```text
Dodge Charger SRT-8 2009
```

From that fine-grained prediction, the system derives:

- **make** — e.g. `Dodge`
- **model** — e.g. `Charger SRT-8`
- **year** — e.g. `2009`

Year is included as an auxiliary output, but it is not predicted by a separate year-regression or year-classification head. It is derived from the predicted fine-grained class metadata.
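
As a rough illustration of that derivation, the sketch below splits a class name into make, model, and year. It is not the project's actual code: `split_class_name` and `MULTI_WORD_MAKES` are hypothetical names, and the sketch assumes every class name ends in a four-digit year and that the make is the first word unless it is listed as a multi-word make.

```python
# Hypothetical helper, not taken from the TwinCar repository.
# Assumption: every class name ends with a four-digit year.
# Assumption: the make is the first word unless it appears in
# MULTI_WORD_MAKES, which is an illustrative and incomplete list.
MULTI_WORD_MAKES = ("Aston Martin", "Land Rover", "AM General")


def split_class_name(class_name: str) -> tuple[str, str, int]:
    """Split a fine-grained class name into (make, model, year)."""
    # The year is the last space-separated token.
    body, year_text = class_name.rsplit(" ", 1)
    year = int(year_text)
    # Check multi-word makes first so "Land Rover" is not cut to "Land".
    for make in MULTI_WORD_MAKES:
        if body.startswith(make + " "):
            return make, body[len(make) + 1:], year
    # Otherwise the first word is the make and the rest is the model.
    make, model = body.split(" ", 1)
    return make, model, year


print(split_class_name("Dodge Charger SRT-8 2009"))
# ('Dodge', 'Charger SRT-8', 2009)
```

A lookup table keyed by class index would be more reliable than string splitting in practice, since it avoids guessing where the make ends.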

## Validation results

Final quantitative comparison is reported on the locked Stanford validation split using the same evaluation protocol for all compared models.

| Transform        | Fine acc | Make acc | Model acc | Year acc | Top-3 acc | Top-5 acc |
| ---------------- | -------: | -------: | --------: | -------: | --------: | --------: |
| clean            |   0.7864 |   0.8692 |    0.7925 |   0.8913 |    0.9196 |    0.9521 |
| robust_light     |   0.7882 |   0.8680 |    0.7944 |   0.8956 |    0.9159 |    0.9490 |
| robust_hard      |   0.6839 |   0.7778 |    0.6900 |   0.8355 |    0.8600 |    0.9055 |
| robust_occlusion |   0.6317 |   0.7317 |    0.6366 |   0.8048 |    0.8060 |    0.8600 |

Compared with the earlier EfficientNet-B3 candidate (trained without augmentation v2), this model improves clean fine accuracy by 1.6 percentage points (0.770 → 0.786) and improves robustness substantially: robust_hard by 14 points (0.543 → 0.684) and robust_occlusion by 13 points (0.502 → 0.632). The full comparison is in the experiment report in the GitHub repository.
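
The Top-3 and Top-5 columns count a prediction as correct when the true class is among the 3 or 5 highest-scoring classes. The sketch below shows that calculation on made-up scores. `top_k_accuracy` is a hypothetical helper written for illustration, not the evaluation code from the repository.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy scores for three samples over four classes (made-up values).
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1 has the highest score
    [0.5, 0.1, 0.3, 0.1],  # true class 2 has the second-highest score
    [0.4, 0.3, 0.2, 0.1],  # true class 3 has the lowest score
]
labels = [1, 2, 3]

print(top_k_accuracy(scores, labels, 1))  # 1 of 3 samples correct
print(top_k_accuracy(scores, labels, 2))  # 2 of 3 samples correct
```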

## Robustness evaluation

The final model was evaluated under multiple image transforms:

- **clean** — standard validation preprocessing
- **robust_light** — mild production-like perturbations
- **robust_hard** — stronger blur, lighting, color, and geometric perturbations
- **robust_occlusion** — synthetic occlusion/erasing stress test

These tests are not a replacement for real-world field validation, but they quantify how the model behaves under controlled distribution shifts.
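
To give a sense of what a synthetic occlusion test does, the sketch below blanks out one random rectangle in a small grayscale image stored as nested lists. It only illustrates the general idea: the function name, the patch-size rule, and the fill value are assumptions, and the project's actual `robust_occlusion` transform may differ.

```python
import random


def erase_random_patch(image, max_fraction=0.3, fill=0, rng=None):
    """Return a copy of a 2-D image with one random rectangle set to `fill`.

    Each side of the rectangle is at most `max_fraction` of the matching
    image side (an assumed rule, chosen only for this illustration).
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    patch_h = rng.randint(1, max(1, int(height * max_fraction)))
    patch_w = rng.randint(1, max(1, int(width * max_fraction)))
    top = rng.randint(0, height - patch_h)
    left = rng.randint(0, width - patch_w)
    # Copy the rows so the input image is left unchanged.
    erased = [row[:] for row in image]
    for y in range(top, top + patch_h):
        for x in range(left, left + patch_w):
            erased[y][x] = fill
    return erased


image = [[255] * 10 for _ in range(10)]
occluded = erase_random_patch(image, rng=random.Random(0))
erased_pixels = sum(row.count(0) for row in occluded)
print(erased_pixels)  # between 1 and 9 for a 10x10 image at max_fraction=0.3
```

Passing a seeded `random.Random` keeps the perturbation repeatable, which matters when the same stress test is used to compare several models.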

## CompCars status

CompCars was inspected and used for external validation and reconnaissance, but it was not blindly merged into final training.

The main reason is that Stanford Cars and CompCars have a significant domain and label-distribution gap:

- Stanford Cars is a clean fine-grained benchmark with 196 make/model/year classes.
- CompCars contains different image domains and different make/model taxonomies.
- Exact Stanford Cars ↔ CompCars make/model/year overlap was too small and biased for safe blind merging.
- Make-level external validation accuracy on CompCars dropped to about 0.28 (vs. about 0.87 make accuracy in-domain), confirming a large cross-domain shift.

This confirmed that CompCars integration is a domain adaptation problem, not a simple data-merge task.

Future work should build a verified Stanford Cars ↔ CompCars alias map, train on a controlled filtered subset, and validate on a true cross-domain holdout.

## Held-out Stanford test status

The local Stanford Cars test images were available and were used for qualitative API/demo smoke testing.

However, the available `cars_test_annos.mat` file contained only bounding boxes and filenames:

```text
bbox_x1
bbox_y1
bbox_x2
bbox_y2
fname
```

It did not include class labels.
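
A check like this can be sketched as follows. The sketch assumes the file follows the usual Stanford Cars devkit layout, with a struct array stored under the key `annotations` and labels, when present, in a field named `class`. Both function names are hypothetical, and `annotation_fields` needs SciPy installed.

```python
def has_class_labels(field_names):
    """True if the annotation struct exposes a `class` field."""
    return "class" in field_names


def annotation_fields(mat_path):
    """Read the field names of the `annotations` struct in a .mat file.

    Assumption: the struct array lives under the key "annotations",
    as in the Stanford Cars devkit files.
    """
    from scipy.io import loadmat  # third-party; imported only when called

    return loadmat(mat_path)["annotations"].dtype.names


# Field names listed above for the available cars_test_annos.mat file.
test_fields = ("bbox_x1", "bbox_y1", "bbox_x2", "bbox_y2", "fname")
print(has_class_labels(test_fields))  # False
```

Calling `has_class_labels(annotation_fields("cars_test_annos.mat"))` on a local copy would run the same check against the real file.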

The provided Kaggle mirror, `eduardo4jesus/stanford-cars-dataset`, was also checked. It included:

```text
cars_meta.mat
cars_train_annos.mat
cars_test_annos.mat
```

but did not provide:

```text
cars_test_annos_withlabels.mat
cars_annos.mat
```

Final quantitative model comparison is therefore reported on the locked Stanford validation split using an identical protocol and seed for every compared model. This keeps model-to-model deltas valid. A labeled held-out Stanford test evaluation would be a straightforward extension if `cars_test_annos_withlabels.mat` is obtained.

## Demo

The model is deployed in a Hugging Face Space:

- **Demo Space:** [https://huggingface.co/spaces/twincar-group2/twincar-demo](https://huggingface.co/spaces/twincar-group2/twincar-demo)

The Space supports:

- image upload
- webcam/clipboard input
- full-image prediction
- make/model/year display
- top-k predictions
- optional YOLO crop comparison mode

The YOLO cropper is experimental and default-off. The official prediction remains the full-image EfficientNet-B3 prediction.

## API and code

Project repository:

- **GitHub:** [https://github.com/Hristijan-kiko/twincar](https://github.com/Hristijan-kiko/twincar)

The GitHub repo includes:

- reusable Python package
- FastAPI inference endpoint
- Gradio demo app
- batch prediction script
- training and evaluation scripts
- robust evaluation reports
- CI with linting and tests

## W&B

Training and experiment tracking:

- **W&B project:** [https://wandb.ai/hzlatevskii-brainster-data-science-academy/twincar](https://wandb.ai/hzlatevskii-brainster-data-science-academy/twincar)

## Intended use

This model is intended for educational and prototype-level vehicle recognition experiments, especially make/model classification from car images similar to Stanford Cars.

Appropriate uses:

- fine-grained vehicle make/model recognition demo
- model comparison and robustness analysis
- prototype vehicle inspection workflow
- academic/academy project demonstration

Data note: the weights were trained on the Stanford Cars dataset (research/educational use). The MIT license covers the project code; use of the weights should also respect the Stanford Cars dataset terms.

## Limitations

- The final model is trained on Stanford Cars, not on real drone/robot production footage.
- CompCars showed a strong domain gap and was not blindly merged into final training.
- True top-down drone views remain out-of-distribution.
- Robustness tests use controlled synthetic perturbations, not full real-world field validation.
- Year is derived from the fine-grained class label metadata, not learned as an independent year model.
- The system assumes the uploaded image contains a vehicle.
- Strong non-car/out-of-distribution rejection is not implemented yet.
- Similar models and years can be confused because fine-grained vehicle classification often depends on subtle visual details.

## Future work

- Build a verified Stanford Cars ↔ CompCars alias map.
- Fine-tune on a controlled CompCars surveillance subset.
- Add real production/drone/robot images for target-domain validation.
- Add non-car/out-of-distribution rejection.
- Make detector-assisted preprocessing the default only if validation shows that it improves real metrics.
- Explore multi-head prediction for make/model/year if independent outputs become necessary.