Road Scene Semantic Segmentation
A semantic segmentation model for autonomous driving scenes, built with EfficientNetB5 + UNet decoder + ASPP module. The model segments road scenes into 5 classes: background, road surface, road markings, road signs, and cars.
Model Architecture
The model combines a pretrained EfficientNetB5 encoder with a custom UNet-style decoder and an ASPP (Atrous Spatial Pyramid Pooling) module at the bottleneck. This design captures both high-level semantics and fine-grained spatial details — particularly useful for thin structures like lane markings.
Input (256×256×3)
│
EfficientNetB5 Encoder (pretrained on ImageNet)
├── skip connections at multiple resolutions
└── bottleneck feature maps
│
ASPP Module
├── Conv 1×1
├── Dilated Conv rate=6
├── Dilated Conv rate=12
└── Dilated Conv rate=18
│
UNet Decoder
├── Upsample + Skip connection (×4)
└── Conv layers
│
Output (256×256×5) — softmax
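To make the ASPP rates in the diagram concrete, the short calculation below shows how wide an area each dilated branch spans. It assumes 3×3 kernels for the dilated branches, which the diagram does not state, and `effective_kernel_size` is a helper introduced here purely for illustration.

```python
def effective_kernel_size(kernel_size: int, dilation_rate: int) -> int:
    """Spatial extent covered by a dilated kernel: k + (k - 1) * (r - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation_rate - 1)


# Dilation rates taken from the ASPP diagram; kernel size 3 is an assumption.
aspp_extents = {rate: effective_kernel_size(3, rate) for rate in (6, 12, 18)}
print(aspp_extents)  # {6: 13, 12: 25, 18: 37}
```

Each branch still has only nine weights per input/output channel pair, so the wider context comes at no extra parameter cost compared with an undilated 3×3 convolution.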
Classes
| ID | Class | Description |
|---|---|---|
| 0 | Background | Sky, buildings, trees, sidewalks, and everything else |
| 1 | Road surface | Drivable road area and road shoulders |
| 2 | Marking | Lane markings (driving and non-driving) |
| 3 | Road sign | Traffic signs and signal symbols |
| 4 | Car | Cars, SUVs, pickup trucks |
Performance
Evaluated on a held-out validation set combining custom data and CamVid:
| Class | IoU |
|---|---|
| Background | 0.963 |
| Road surface | 0.921 |
| Marking | 0.399 |
| Road sign | 0.052 |
| Car | 0.839 |
| Mean IoU | ~0.635 |
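For readers who want to reproduce numbers like these, the sketch below computes per-class IoU from a confusion matrix. The card does not say how the IoU was aggregated (pooled over the whole validation set or averaged per image), so treat this as one common approach rather than the evaluation code actually used; `per_class_iou` is a name introduced here.

```python
import numpy as np


def per_class_iou(y_true: np.ndarray, y_pred: np.ndarray, num_classes: int = 5) -> np.ndarray:
    """IoU per class from two integer label maps of identical shape."""
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    flat_index = y_true.ravel() * num_classes + y_pred.ravel()
    cm = np.bincount(flat_index, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    intersection = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    # Classes absent from both maps get NaN so they can be skipped in a mean.
    return np.where(union > 0, intersection / np.maximum(union, 1), np.nan)


# Tiny made-up example with 3 classes.
y_true = np.array([[0, 0, 1], [1, 2, 2]])
y_pred = np.array([[0, 1, 1], [1, 2, 0]])
iou = per_class_iou(y_true, y_pred, num_classes=3)  # 1/3, 2/3, 1/2

# The mean IoU in the table is the plain average of the five class scores.
reported = np.array([0.963, 0.921, 0.399, 0.052, 0.839])
print(round(float(reported.mean()), 3))  # 0.635
```

Because the mean is unweighted, the very low road sign score pulls the average down as much as any other class, regardless of how few pixels that class covers.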
Training Data
The model was trained on a combination of:
- Custom dataset — ~200 annotated road images with polygon annotations in XML format, covering 5 semantic classes
- CamVid — 367 images with pixel-level annotations from the Cambridge-driving Labeled Video Database, remapped to match the 5-class label scheme
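Remapping a source label set onto the 5-class scheme can be done with a lookup table. The sketch below is illustrative only: the source IDs and the mapping are hypothetical, since the card does not list which CamVid classes were merged into which target class.

```python
import numpy as np

# Hypothetical source label IDs, for illustration only. The real IDs depend
# on how the CamVid annotations were exported.
SOURCE_TO_TARGET = {
    0: 0,  # e.g. sky          -> Background
    1: 0,  # e.g. building     -> Background
    2: 1,  # e.g. road         -> Road surface
    3: 2,  # e.g. lane marking -> Marking
    4: 3,  # e.g. sign symbol  -> Road sign
    5: 4,  # e.g. car          -> Car
}


def build_lut(mapping: dict, num_source_classes: int, default: int = 0) -> np.ndarray:
    """Lookup table where lut[source_id] is the target class ID."""
    lut = np.full(num_source_classes, default, dtype=np.uint8)
    for source_id, target_id in mapping.items():
        lut[source_id] = target_id
    return lut


lut = build_lut(SOURCE_TO_TARGET, num_source_classes=8)
source_mask = np.array([[0, 2, 3], [5, 7, 4]], dtype=np.uint8)
target_mask = lut[source_mask]  # integer indexing remaps every pixel at once
print(target_mask.tolist())  # [[0, 1, 2], [4, 0, 3]]
```

Unmapped source IDs (7 in this example) fall back to Background, which matches the "everything else" description of class 0.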
Data augmentation applied during training: horizontal flip, random crop + resize, brightness/contrast/saturation/hue jitter.
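For segmentation, geometric augmentations have to be applied identically to the image and its mask, while photometric jitter touches the image only. The NumPy sketch below illustrates that idea with a flip, a crop resized back to the original size, and a brightness shift. It is not the training pipeline from `main.ipynb`: the function name, crop fraction, and jitter range are assumptions, contrast/saturation/hue jitter is omitted, and nearest-neighbour resizing is used for the image as well to keep the example dependency-free.

```python
import numpy as np


def augment_pair(image, mask, rng, crop_frac=0.8, max_brightness_delta=0.2):
    """image: float32 HxWx3 in [0, 1]; mask: integer HxW class IDs."""
    # Horizontal flip with probability 0.5, applied to both arrays.
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]

    # Random crop using the same window for image and mask.
    h, w = mask.shape
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    image = image[top:top + ch, left:left + cw]
    mask = mask[top:top + ch, left:left + cw]

    # Resize back to HxW by nearest-neighbour index mapping, which keeps
    # mask values valid class IDs (no blending between labels).
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    image = image[rows][:, cols]
    mask = mask[rows][:, cols]

    # Brightness jitter on the image only.
    delta = np.float32(rng.uniform(-max_brightness_delta, max_brightness_delta))
    image = np.clip(image + delta, 0.0, 1.0)
    return image, mask


rng = np.random.default_rng(0)
image = rng.random((256, 256, 3), dtype=np.float32)
mask = rng.integers(0, 5, size=(256, 256))
aug_image, aug_mask = augment_pair(image, mask, rng)
```

In a real pipeline the image would usually be resized with bilinear interpolation; only the mask needs nearest-neighbour.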
Training Details
| Setting | Value |
|---|---|
| Framework | TensorFlow / Keras |
| Input size | 256 × 256 |
| Backbone | EfficientNetB5 (ImageNet pretrained) |
| Loss | Weighted Focal + IoU Loss |
| Optimizer | Adam |
| Stage 1 LR | 1e-3 (decoder only, backbone frozen) |
| Stage 2 LR | 1e-4 (full model fine-tune) |
| Precision | mixed_float16 |
| Batch size | 8 |
Training was done in two stages following standard transfer learning practice: the decoder is trained first while the backbone is frozen, then the full model is fine-tuned with a lower learning rate to avoid destroying pretrained weights.
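The "Weighted Focal + IoU Loss" entry in the table can be read as the sum of a class-weighted focal term and a soft IoU term. The NumPy sketch below shows one plausible formulation. The class weights, the focusing parameter `gamma=2.0`, and the equal weighting of the two terms are assumptions, and `combined_loss_sketch` is not the `combined_loss` function from the training notebook.

```python
import numpy as np


def weighted_focal_loss(y_true, probs, class_weights, gamma=2.0, eps=1e-7):
    """y_true: (N,) integer labels; probs: (N, C) softmax outputs."""
    # Probability the model assigned to the correct class of each pixel.
    p_t = np.clip(probs[np.arange(y_true.size), y_true], eps, 1.0)
    # (1 - p_t)^gamma down-weights pixels that are already well classified.
    per_pixel = class_weights[y_true] * (1.0 - p_t) ** gamma * (-np.log(p_t))
    return float(per_pixel.mean())


def soft_iou_loss(y_true, probs, eps=1e-7):
    """1 - mean over classes of a differentiable IoU surrogate."""
    one_hot = np.eye(probs.shape[1])[y_true]
    intersection = (one_hot * probs).sum(axis=0)
    union = one_hot.sum(axis=0) + probs.sum(axis=0) - intersection
    return float(1.0 - np.mean((intersection + eps) / (union + eps)))


def combined_loss_sketch(y_true, probs, class_weights, gamma=2.0):
    return weighted_focal_loss(y_true, probs, class_weights, gamma) + soft_iou_loss(y_true, probs)


# Hypothetical weights that emphasise the rare Marking and Road sign classes.
class_weights = np.array([0.5, 0.5, 2.0, 2.0, 1.0])
y_true = np.array([0, 1, 2, 4])
confident = np.eye(5)[y_true] * 0.9 + 0.02  # 0.92 on the true class
uniform = np.full((4, 5), 0.2)              # no information

loss_confident = combined_loss_sketch(y_true, confident, class_weights)
loss_uniform = combined_loss_sketch(y_true, uniform, class_weights)
```

The focal term addresses class imbalance at the pixel level, while the IoU term optimises a quantity closer to the reported metric; combining them is a common choice for thin, rare structures such as lane markings.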
Usage
import cv2
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from matplotlib.colors import ListedColormap

# Load model. With compile=False Keras does not restore the training loss or
# metrics, so the custom combined_loss and SparseMeanIoU objects should not be
# needed for inference.
model = tf.keras.models.load_model('model_eff_unet_v1.keras', compile=False)

# Prepare image: BGR -> RGB, resize to the model input size, scale to [0, 1]
image = cv2.imread('your_image.jpg')
if image is None:
    raise FileNotFoundError('Could not read your_image.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (256, 256))
image = image.astype(np.float32) / 255.0
input_tensor = np.expand_dims(image, axis=0)  # shape (1, 256, 256, 3)

# Predict: softmax scores of shape (1, 256, 256, 5) -> one class ID per pixel
pred = model.predict(input_tensor)
pred_mask = np.argmax(pred[0], axis=-1)

# Visualize
class_cmap = ListedColormap([
    'black',    # Background
    '#804080',  # Road surface
    'white',    # Marking
    'red',      # Road sign
    'navy',     # Car
])

plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.imshow(image)
plt.title('Input Image')
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(pred_mask, cmap=class_cmap, vmin=0, vmax=4)
plt.title('Segmentation Prediction')
plt.axis('off')
plt.show()
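Beyond plotting, a predicted mask is often turned into an overlay image or summarised per class. The sketch below does both with NumPy only. `PALETTE` holds the RGB equivalents of the colormap used above, and `overlay_mask` and `class_coverage` are helper names introduced here. A small dummy mask stands in for `pred_mask` so the snippet runs on its own.

```python
import numpy as np

CLASS_NAMES = ["Background", "Road surface", "Marking", "Road sign", "Car"]
PALETTE = np.array([
    [0, 0, 0],        # black
    [128, 64, 128],   # #804080
    [255, 255, 255],  # white
    [255, 0, 0],      # red
    [0, 0, 128],      # navy
], dtype=np.uint8)


def overlay_mask(image, mask, alpha=0.5):
    """Blend class colors over a float RGB image in [0, 1]; returns uint8 HxWx3."""
    colors = PALETTE[mask].astype(np.float32)
    blended = (1.0 - alpha) * image * 255.0 + alpha * colors
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


def class_coverage(mask):
    """Fraction of pixels assigned to each class."""
    counts = np.bincount(mask.ravel(), minlength=len(CLASS_NAMES))
    return {name: float(count) / mask.size for name, count in zip(CLASS_NAMES, counts)}


# Dummy stand-ins for `image` and `pred_mask` from the usage example.
dummy_mask = np.array([[0, 1], [1, 4]])
dummy_image = np.zeros((2, 2, 3), dtype=np.float32)

overlay = overlay_mask(dummy_image, dummy_mask, alpha=0.5)
coverage = class_coverage(dummy_mask)
print(coverage["Road surface"])  # 0.5
```

The overlay can be written to disk with `cv2.imwrite` after converting back to BGR with `cv2.cvtColor(overlay, cv2.COLOR_RGB2BGR)`.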
Limitations
- Trained primarily on daytime, clear-weather road scenes. Performance may degrade on night scenes, rain, fog, or unusual camera angles.
- Road marking detection (IoU ~0.40) is much weaker than road surface detection (IoU ~0.92) due to class imbalance and the small pixel area of lane markings. Road sign detection is weaker still (IoU ~0.05).
- Input resolution is fixed at 256×256. Very small objects (distant signs, thin markings) may be missed.
- Not suitable for safety-critical applications without further validation on a larger and more diverse dataset.
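Because the model always predicts at 256×256, the mask usually has to be mapped back to the original frame size before it is compared with full-resolution labels or drawn over the source image. The sketch below does this with nearest-neighbour index mapping so that every output pixel remains a valid class ID. `resize_mask_nearest` is a helper introduced here, and the 720×960 frame size is only an example.

```python
import numpy as np


def resize_mask_nearest(mask: np.ndarray, out_height: int, out_width: int) -> np.ndarray:
    """Nearest-neighbour resize of an integer HxW label map."""
    in_height, in_width = mask.shape
    # For each output row/column, pick the source row/column it falls into.
    rows = (np.arange(out_height) * in_height) // out_height
    cols = (np.arange(out_width) * in_width) // out_width
    return mask[rows[:, None], cols[None, :]]


small = np.array([[0, 1], [2, 4]])
print(resize_mask_nearest(small, 4, 6).tolist())
# [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [2, 2, 2, 4, 4, 4], [2, 2, 2, 4, 4, 4]]

# Example: map a 256x256 prediction back to a 720x960 frame.
full_res = resize_mask_nearest(np.zeros((256, 256), dtype=np.uint8), 720, 960)
```

With OpenCV available, `cv2.resize(mask, (width, height), interpolation=cv2.INTER_NEAREST)` serves the same purpose. Upsampling does not recover detail lost at 256×256, so thin markings and distant signs remain affected by the limitation above.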
Repository Structure
├── main.ipynb # Training notebook
├── implement.ipynb # Inference notebook
├── model_eff_unet_v1.h5
└── model_eff_unet_v1.keras
Citation
If you use CamVid data in your work, please cite the original dataset:
@inproceedings{BrostowSFC:ECCV08,
author = {Gabriel J. Brostow and Jamie Shotton and Julien Fauqueur and Roberto Cipolla},
title = {Segmentation and Recognition Using Structure from Motion Point Clouds},
booktitle = {ECCV (1)},
year = {2008},
pages = {44-57}
}
Dataset
- https://www.kaggle.com/datasets/trainingdatapro/roads-segmentation-dataset
- https://www.kaggle.com/datasets/carlolepelaars/camvid
License
MIT License


