---
license: mit
library_name: pytorch
tags:
- computer-vision
- image-segmentation
- edge-detection
- line-art
- anime
datasets:
- custom
metrics:
- dice
- iou
pipeline_tag: image-segmentation
---
# Anime Line Art Extraction Segmentation Model
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/6kCBB668giXjJoCLXAzfy.png" width="100%">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/Q5pwvCCAfhl5ctqgsVEPa.png" width="100%">
## Model Description
### Overview
This model performs automatic line art extraction from anime images using a deep learning segmentation approach. The goal of the model is to identify edge structures that form the visual outlines of characters and objects in anime frames.
Extracting clean line art typically requires manual tracing by artists or complex rule-based algorithms. This project explores whether a deep learning segmentation model can learn pixel-level edge structures directly from images.
The model takes an RGB anime frame as input and produces a binary edge mask representing the predicted line art structure.
### Problem & Context
Problem:
Extracting clean line art from images normally requires manual tracing by hand or specialized algorithms.
Why this matters:
- Speed up animation production pipelines
- Assist manga and illustration workflows
- Help beginners learn drawing by tracing outlines
- Improve visual quality by upscaling blurry line art
- Generate datasets for generative AI models
How computer vision helps:
Deep learning segmentation models can learn pixel-level edge structures directly from images.
### Training Approach
The model was trained as a semantic segmentation model using the PyTorch segmentation framework.
Frameworks used:
- PyTorch
- segmentation_models_pytorch
Since no pretrained model tailored to anime line extraction was available, the model was trained on a custom dataset with automatically generated edge masks.
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/h06ej-ODkw5tDAx3X6KfL.png" width="100%">
### Intended Use Cases
Potential applications include:
- Animation pipelines → converting frames into base line structures
- Digital art tools → assisting artists by generating sketch outlines
- Image upscaling workflows → improving visual quality of blurry lines
- Dataset generation → automatically creating line art datasets for training generative models
Example research question explored in this project:
Can a segmentation model trained on edge masks produce usable line art for artistic workflows?
--------------------------------------------------
# Training Data
## Dataset Source
Images were collected manually from screenshots of anime episodes.
The dataset was assembled specifically for this project to capture common line art structures present in anime animation.
Dataset characteristics:
Total images: 480
Original resolution: 1920×1080
Training resolution: 256×256
Task type: Binary segmentation
## Classes
Although this is a binary segmentation task, the detected edges represent multiple visual structures:
- Character outlines
- Hair edges
- Facial outlines
- Clothing folds
- Background structures
Pixel labels:
0 = background
1 = line / edge
## Dataset Split
Train: 384 images
Validation: 96 images
Test: Not used
A separate test set was not included due to the relatively small dataset size. The validation set was used to monitor training performance and evaluate model results.
## Data Collection Methodology
Images were collected manually from anime episode screenshots. Frames were chosen to capture a variety of characters, poses, lighting conditions, and scene compositions.
All images were resized to 256×256 resolution to standardize input dimensions for training.
## Annotation Process
Manually labeling line art masks for hundreds of images would be extremely time consuming. Instead, an automated annotation pipeline was used to approximate line structures.
Annotation pipeline: Anime image → Grayscale conversion → Canny edge detection → Binary edge mask
Tools used: Python, Google Colab (Jupyter Notebook), OpenCV, PyTorch
Work performed for the dataset:
- Collected ~480 anime images
- Generated masks automatically using Canny edge detection
- Visually inspected the quality of each generated mask
This approach allowed rapid dataset creation while still ensuring that the generated masks captured meaningful line structures.
## Data Augmentation
No data augmentation techniques were applied.
Images were only resized and normalized during preprocessing.
## Known Dataset Biases
Several limitations exist in the dataset:
- Images are exclusively anime style, creating stylistic bias
- Edge masks generated automatically contain noise
- Some thin edges may be missing due to limitations of Canny detection
- Dataset size is relatively small for deep learning segmentation
--------------------------------------------------
# Training Procedure
## Training Framework
The model was implemented using:
- PyTorch
- segmentation_models_pytorch
This library provides segmentation architectures suitable for pixel-level prediction tasks.
## Model Architecture
Architecture:
Encoder: ResNet18
Decoder: U-Net
Input: RGB image
Output: binary edge mask
U-Net was selected because it performs well for segmentation tasks and works effectively with relatively small datasets.
## Training Hardware
Training was conducted using Google Colab.
Typical environment:
GPU: NVIDIA T4
VRAM: ~16 GB
Training time: approximately 1–2 hours
## Hyperparameters
Epochs: 30
Batch size: 8
Optimizer: Adam
Learning rate: 0.0001
Loss function: Binary Cross Entropy + Dice Loss
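The combined loss can be sketched as below. The Dice smoothing constant and the 1:1 weighting of the two terms are illustrative choices; the card does not specify them.

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, targets, smooth=1.0):
    """Binary Cross Entropy + Dice loss on raw logits.

    `smooth` and the unweighted sum of the two terms are illustrative choices.
    """
    bce = F.binary_cross_entropy_with_logits(logits, targets)
    probs = torch.sigmoid(logits)
    intersection = (probs * targets).sum()
    dice = (2.0 * intersection + smooth) / (probs.sum() + targets.sum() + smooth)
    return bce + (1.0 - dice)

# Optimizer as listed above: Adam with a learning rate of 1e-4, e.g.
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

logits = torch.randn(8, 1, 256, 256)                  # batch size 8, as in training
targets = (torch.rand(8, 1, 256, 256) > 0.9).float()  # sparse synthetic edge masks
loss = bce_dice_loss(logits, targets)
```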
## Preprocessing Steps
Images were preprocessed using:
Resize to 256×256
Normalize using ImageNet statistics
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
--------------------------------------------------
# Evaluation Results
## Metrics
Because this project uses semantic segmentation rather than object detection, evaluation metrics are calculated at the pixel level.
Metrics used:
Dice Coefficient – measures overlap between predicted masks and ground truth masks
Intersection over Union (IoU) – measures the intersection divided by the union of predicted and ground truth masks
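Both metrics can be computed directly on binary masks; a minimal NumPy sketch (the small epsilon, which guards against empty masks, is an illustrative choice):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def iou_score(pred, target, eps=1e-7):
    """IoU = |A ∩ B| / |A ∪ B| for binary masks."""
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (intersection + eps) / (union + eps)

# Toy example: one of two predicted line pixels overlaps the single true pixel,
# so Dice = 2·1 / (2 + 1) = 2/3 and IoU = 1/2.
pred = np.array([[1, 1, 0, 0]])
target = np.array([[1, 0, 0, 0]])
```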
## Validation Performance
Dice coefficient: ~0.35
IoU: ~0.21
These metrics indicate that the model is able to detect meaningful edge structures but struggles with extremely thin line details.
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/zvCczs-TB241YW4FuVIOF.png" width="50%">
## Key Observations
What worked well:
- Learned major character outlines
- Captured hair boundaries
- Detected facial structures
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/dlkCHCrPtBJPvy7sGSc8j.png" width="100%">
Failure cases:
- Small thin lines
- Dark scenes
- Shading lines interpreted as edges
- Excessive background detail
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/Hi0LQhIQZvWlAd_44o88H.png" width="100%">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/KnIMiKkePB9aDNausGNDp.png" width="100%">
These results show that the model learned meaningful edge structures despite the noisy annotations generated from Canny edge detection.
## Visual Examples
Typical evaluation visualizations include:
- Input anime frame
- Ground truth edge mask
- Model predicted mask
These comparisons help visually evaluate whether predicted edges align with important structures in the image.
## Performance Analysis
The model demonstrates that segmentation networks can learn edge patterns from anime images even when trained with automatically generated masks.
However, the task presents several challenges:
1. Thin line structures are difficult for segmentation models
2. Automatic annotations introduce noise
3. Low contrast scenes reduce edge detectability
Because the model was only trained for 30 epochs, additional training may improve performance. However, improving annotation quality or training at higher resolution would likely have a larger impact.
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/PL9-L1MHMEhqmNxY4WQkm.png" width="100%">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/ru-gguNfSDzxbCXT6kmeS.png" width="100%">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/wT3J4LSINPNHVVjLUaqcR.png" width="100%">
--------------------------------------------------
# Limitations and Biases
## Known Failure Cases
The model struggles with:
- Extremely thin lines
- Low contrast scenes
- Dark shading regions
- Highly detailed backgrounds
These cases often produce incomplete or noisy edge predictions.
## Annotation Noise
Ground truth labels were generated automatically using Canny edge detection. This introduces issues such as:
- Missing edges
- False edges from shading
- Broken line segments
Because the model learns from these masks, the maximum achievable accuracy is limited by the quality of the annotations.
## Dataset Bias
The dataset contains only anime frames, introducing strong stylistic bias.
The model may perform poorly on:
- Photographs
- Western illustration styles
- Non-anime artwork
## Resolution Limitations
Images were resized from 1920×1080 to 256×256 for training.
This downscaling removes fine details and makes thin lines harder to detect.
## Sample Size Limitations
The dataset contains only 480 images, which is relatively small for training deep neural networks. A larger dataset would likely improve generalization.
## Inappropriate Use Cases
This model should not be used for:
- Photographic edge detection
- Medical image segmentation
- Object detection tasks
The model is specifically designed for anime-style line structure extraction.
--------------------------------------------------
# Future Work
Possible improvements include:
- Expanding the dataset to thousands of images
- Training at higher resolution (512×512 or higher)
- Improving annotation quality with manual corrections
- Exploring diffusion-based line reconstruction models
Additional research directions include:
- Object detection models for automatic removal of occlusions
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/NKCNnMBSAzzhAPjaZiX9y.png" width="100%">
- Line art upscaling techniques
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/9cisCYIkU_y45UJtJRNcE.png" width="100%">
- Using detected edges for stitching animation panning shots
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/ZDIrGENzx4oy-Vj_jyQMa.gif" width="100%">