---
license: apache-2.0
base_model: facebook/deit-tiny-distilled-patch16-224
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: results
  results: []
pipeline_tag: image-classification
datasets: Mozilla/docornot
---

This model is a fine-tuned version of [facebook/deit-tiny-distilled-patch16-224](https://huggingface.co/facebook/deit-tiny-distilled-patch16-224) on the [docornot](https://huggingface.co/datasets/tarekziade/docornot) dataset.

It achieves the following results on the evaluation set:
- Loss: 0.0000
- Accuracy: 1.0

# CO2 emissions

This model was trained on an M1 and emitted an estimated 0.322 g of CO2, as measured with [CodeCarbon](https://codecarbon.io/).

# Model description

This model is a distilled Vision Transformer (ViT).
Images are presented to the model as a sequence of fixed-size patches (16x16 pixels each), which are linearly embedded.

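To make the patch arithmetic concrete, here is a minimal sketch. It assumes the 224x224 RGB input implied by the base model name, and the two extra tokens (class and distillation) that distilled DeiT models add to the patch sequence. The function name `patch_sequence` is made up for illustration and is not part of any library.

```python
def patch_sequence(image_size=224, patch_size=16, channels=3, extra_tokens=2):
    """Return (patches_per_side, num_patches, values_per_patch, sequence_length)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size            # patches along one edge
    num_patches = per_side * per_side              # patches in the whole image
    values_per_patch = patch_size * patch_size * channels  # flattened patch length
    # Assumption: two extra tokens (class + distillation) join the patch tokens.
    sequence_length = num_patches + extra_tokens
    return per_side, num_patches, values_per_patch, sequence_length


print(patch_sequence())  # (14, 196, 768, 198)
```

Each flattened patch of 768 values is what the linear embedding layer projects into the model's hidden dimension.
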
# Intended uses & limitations

You can use this model to detect whether an image is a picture or a document.

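A possible way to run the classifier is through the `transformers` image-classification pipeline, which matches the `pipeline_tag` in the metadata. This is a sketch, not the authors' documented usage: the card does not state the model's repository id, so `model_id` is left as a required argument, and the helper names `classify_image` and `top_label` are hypothetical.

```python
def classify_image(image_path, model_id):
    """Run the image-classification pipeline on one image file.

    model_id is the Hub repository id (or local path) of this model;
    it is not given in the card, so the caller must supply it.
    """
    # Imported lazily so top_label() can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_id)
    # The pipeline returns a list of {"label": str, "score": float} dicts.
    return classifier(image_path)


def top_label(predictions):
    """Pick the label with the highest score from the pipeline output."""
    return max(predictions, key=lambda p: p["score"])["label"]
```

Calling `top_label(classify_image("scan.png", model_id))` would return the predicted class name. The file name is a placeholder, and the actual label strings come from the model's configuration.
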
# Training procedure

Source code used to generate this model: https://github.com/mozilla/docornot

## Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1

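The list above maps fairly directly onto `transformers.TrainingArguments`. The sketch below shows one plausible mapping, not the actual training script (see the linked repository for that). It assumes a single device, so the reported batch sizes equal the per-device values, and it guesses `output_dir="results"` from the model-index name.

```python
def training_kwargs():
    """Hyperparameters from the list above, keyed by TrainingArguments field names."""
    return {
        "output_dir": "results",            # assumption: taken from the model-index name
        "learning_rate": 5e-5,
        "per_device_train_batch_size": 8,   # assumption: single device
        "per_device_eval_batch_size": 8,
        "seed": 42,
        "adam_beta1": 0.9,
        "adam_beta2": 0.999,
        "adam_epsilon": 1e-8,
        "lr_scheduler_type": "linear",
        "num_train_epochs": 1,
    }


def build_training_arguments():
    """Build a TrainingArguments object from the values above."""
    # Imported lazily: requires transformers and PyTorch to be installed.
    from transformers import TrainingArguments

    return TrainingArguments(**training_kwargs())
```

Keeping the values in a plain dictionary makes them easy to log or compare between runs before they are handed to the `Trainer`.
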
## Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 0.0           | 1.0   | 1600 | 0.0000          | 1.0      |

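The 1600 steps in the table, combined with the linear scheduler and the batch size of 8, allow a rough reconstruction of the schedule. The sketch below assumes no warmup (none is listed), a single device, and no gradient accumulation. `linear_lr` is an illustrative helper, not a library function.

```python
def linear_lr(step, total_steps=1600, base_lr=5e-5, warmup_steps=0):
    """Learning rate after `step` optimizer updates with linear warmup and decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Samples seen in the single epoch, assuming batch size 8 on one device
# with no gradient accumulation.
samples_seen = 1600 * 8

for step in (0, 800, 1600):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate falls from 5e-05 at the first step to zero at step 1600, and the epoch covers about 12,800 training samples.
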
## Framework versions

- Transformers 4.39.2
- Pytorch 2.2.2
- Datasets 2.18.0
- Tokenizers 0.15.2