---
license: apache-2.0
tags:
- image-classification
- generated_from_trainer
metrics:
- accuracy
- f1
model-index:
- name: vit-accident-image
  results: []
---

# Enhancing Road Safety with AI-Powered Accident Detection

## Objective
The objective of this project is to develop an AI-driven system that detects accident scenes from images captured by CCTV footage. By leveraging advanced machine learning techniques, we aim to improve response times to road incidents, thereby enhancing overall road safety.

## Data Sample
We utilized the [Accident Detection from CCTV Footage](https://www.kaggle.com/datasets/ckay16/accident-detection-from-cctv-footage/data) dataset from Kaggle. This dataset contains annotated images from CCTV footage, showcasing various accident scenarios.

### Sample Data
Here’s a sample from the dataset:

| Image | Label |
|-------|-------|
| ![Accident Image] | Accident |

The images are categorized into "Accident" and "No Accident," which helps train the model to distinguish between accident scenes and normal traffic conditions.

## Model Architecture
Our model employs a Vision Transformer (ViT) architecture, which is well-suited for image classification tasks. The key components of the model include:
- **Input Layer:** Accepts images resized to a specified resolution.
- **Transformer Encoder Layers:** Extract features through self-attention mechanisms, capturing spatial relationships.
- **Feedforward Neural Networks:** Process the features and classify them into accident-related categories.
- **Output Layer:** Provides the final classification probabilities for "Accident" and "No Accident."
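
As a concrete illustration of the input geometry these layers operate on, here is a minimal sketch. It assumes the patch-16, 224×224 configuration implied by the `google/vit-base-patch16-224-in21k` base checkpoint used below; the variable names are illustrative, not taken from the training code.

```python
# Sketch of the ViT input geometry (assumption: 224x224 inputs, 16x16 patches,
# as in google/vit-base-patch16-224-in21k).
image_size = 224
patch_size = 16

# Each image is split into a grid of non-overlapping patches,
# each of which becomes one token in the transformer sequence.
num_patches = (image_size // patch_size) ** 2

# A learnable [CLS] token is prepended; its final hidden state
# feeds the classification head.
seq_len = num_patches + 1

print(num_patches, seq_len)  # 196 197
```

So each CCTV frame is processed as a sequence of 197 tokens, and the self-attention layers relate every patch to every other patch regardless of spatial distance.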

## Instructions for Running the Training Job
To run the training job, follow these steps:
1. Clone the repository:
   ```bash
   git clone https://github.com/yourusername/accident-detection.git
   cd accident-detection
   ```

# vit-accident-image

This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the accident classification dataset.
It achieves the following results on the evaluation set:
- Loss: 0.2027
- Accuracy: 0.93
- F1: 0.9301

## Model description

The model predicts two labels: `0` (non-accident) and `1` (accident-detected).
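
For example, decoding a prediction with this label mapping could look like the following sketch. The logits shown are made up for illustration; they are not real model outputs.

```python
# The label mapping stated above.
id2label = {0: "non-accident", 1: "accident-detected"}

# Hypothetical raw logits for one image (larger value = predicted class).
logits = [-1.2, 2.3]
pred = max(range(len(logits)), key=lambda i: logits[i])

print(id2label[pred])  # accident-detected
```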

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
| 0.3546        | 2.0   | 100  | 0.2327          | 0.9184   | 0.9184 |
| 0.1654        | 4.0   | 200  | 0.2075          | 0.9388   | 0.9388 |
| 0.0146        | 6.0   | 300  | 0.2497          | 0.9388   | 0.9387 |
| 0.0317        | 8.0   | 400  | 0.2179          | 0.9286   | 0.9285 |
| 0.0192        | 10.0  | 500  | 0.2255          | 0.9286   | 0.9286 |
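
A rough consistency check on the schedule above: the step counts and batch size are taken from the reported hyperparameters, while the dataset-size estimate is an inference, not a reported figure.

```python
# Back-of-envelope from the reported training schedule.
total_steps = 500        # final step in the results table
num_epochs = 10          # from the hyperparameters
train_batch_size = 16    # from the hyperparameters

steps_per_epoch = total_steps // num_epochs
# Approximate number of training examples implied by the schedule.
approx_train_size = steps_per_epoch * train_batch_size

print(steps_per_epoch, approx_train_size)  # 50 800
```

This suggests roughly 50 optimizer steps per epoch, i.e. on the order of 800 training images.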


### Framework versions

- Transformers 4.30.0
- Pytorch 2.2.1+cu121
- Datasets 2.19.1
- Tokenizers 0.13.3