tags:
- vit
- attention
datasets:
- maxim-igenbergs/thesis-data
---
# ViT End-to-End Driving Model

Vision Transformer (ViT) adapted for end-to-end autonomous driving, trained on the Udacity self-driving car simulator for the bachelor's thesis *Dual-Axis Testing of Visual Robustness and Topological Generalization in Vision-based End-to-End Driving Models*.

## Model Description

This model applies the Vision Transformer architecture to the end-to-end driving task. Instead of using convolutional layers, ViT splits the input image into patches and processes them with self-attention, allowing the model to capture global dependencies in the visual input.

### Architecture
```
Input: RGB Image (224 × 224 × 3)
        ↓
       ...
     MLP Head
        ↓
Output: [steering, throttle]
```
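As a quick illustration of the patchification step described above, the sketch below computes the token geometry for the stated 224 × 224 × 3 input. The patch size of 16 is an assumption (it matches the standard ViT-Base configuration); this card does not state the thesis model's actual patch size.

```python
# Token geometry for a ViT input. PATCH_SIZE = 16 is an assumption
# (standard ViT-Base); the image shape comes from the diagram above.
IMAGE_SIZE = 224
PATCH_SIZE = 16
CHANNELS = 3

num_patches = (IMAGE_SIZE // PATCH_SIZE) ** 2   # 14 x 14 grid of patch tokens
patch_dim = PATCH_SIZE * PATCH_SIZE * CHANNELS  # values per flattened patch

print(num_patches, patch_dim)  # 196 768
```

Each flattened patch is then linearly projected to the transformer's embedding dimension before entering the encoder; the MLP head regresses the two control outputs from the encoded sequence.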
## Checkpoints

| Map | Checkpoint |
|-----|------------|
| GenRoads | `genroads_20251202-152358/` |
| Jungle | `jungle_20251201-132938/` |
### Files per Checkpoint

- `best_model.ckpt`: PyTorch model checkpoint
- `meta.json`: Training configuration and hyperparameters
- `history.csv`: Training/validation metrics per epoch
- `loss_curve.png`: Visualization of training progress
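The `history.csv` metrics can be inspected with the Python standard library alone, for example to find the epoch with the lowest validation loss. The sketch below uses an inline stand-in for the file, and the column names (`epoch`, `train_loss`, `val_loss`) are assumptions — check the header row of the real file in each checkpoint directory before relying on them.

```python
import csv
import io

# Stand-in for one checkpoint's history.csv; the column names here are
# assumptions -- inspect the actual file's header row first.
sample = io.StringIO(
    "epoch,train_loss,val_loss\n"
    "1,0.120,0.115\n"
    "2,0.081,0.090\n"
    "3,0.064,0.088\n"
)

rows = list(csv.DictReader(sample))
best = min(rows, key=lambda r: float(r["val_loss"]))
print(f"best epoch by val_loss: {best['epoch']}")  # best epoch by val_loss: 3
```

To run this against a real checkpoint, replace `sample` with `open("genroads_20251202-152358/history.csv")`.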
## Citation

```bibtex
@thesis{igenbergs2026dualaxis,
  title={Dual-Axis Testing of Visual Robustness and Topological Generalization in Vision-based End-to-End Driving Models},
  author={Igenbergs, Maxim},
  school={Technical University of Munich},
  year={2026},
  type={Bachelor's Thesis}
}
```
## Related

- [DAVE-2 Driving Model](https://huggingface.co/maxim-igenbergs/dave2)
- [DAVE-2-GRU Driving Model](https://huggingface.co/maxim-igenbergs/dave2-gru)
- [TCP Driving Model](https://huggingface.co/maxim-igenbergs/tcp-carla-repro)
- [Training Data](https://huggingface.co/datasets/maxim-igenbergs/thesis-data)
- [Evaluation Runs](https://huggingface.co/datasets/maxim-igenbergs/thesis-runs)