---
license: mit
library_name: pytorch
tags:
- computer-vision
- image-segmentation
- edge-detection
- line-art
- anime
datasets:
- custom
metrics:
- dice
- iou
pipeline_tag: image-segmentation
---

# Anime Line Art Extraction Segmentation Model
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/6kCBB668giXjJoCLXAzfy.png" width="100%">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/Q5pwvCCAfhl5ctqgsVEPa.png" width="100%">

## Model Description

### Overview
This model performs automatic line art extraction from anime images using a deep learning segmentation approach. The goal of the model is to identify edge structures that form the visual outlines of characters and objects in anime frames.

Extracting clean line art typically requires manual tracing by artists or complex rule-based algorithms. This project explores whether a deep learning segmentation model can learn pixel-level edge structures directly from images.

The model takes an RGB anime frame as input and produces a binary edge mask representing the predicted line art structure.
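This input/output contract can be sketched as a small helper (hypothetical; `predict_line_mask` and the 0.5 threshold are illustrative choices, not part of a released API):

```python
import torch

def predict_line_mask(model: torch.nn.Module, image: torch.Tensor,
                      threshold: float = 0.5) -> torch.Tensor:
    """Turn a (3, H, W) RGB tensor into a binary edge mask.

    The function name and the 0.5 threshold are illustrative,
    not part of the released model's API.
    """
    model.eval()
    with torch.no_grad():
        logits = model(image.unsqueeze(0))   # add a batch dimension
        probs = torch.sigmoid(logits)        # per-pixel line probability
    return (probs > threshold).squeeze(0).to(torch.uint8)
```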

### Problem & Context
Problem:
Extracting clean line art from images normally requires manual tracing by hand or specialized algorithms.

Why this matters:
- Speed up animation production pipelines
- Assist manga and illustration workflows
- Help beginners learn drawing by tracing outlines
- Improve visual quality by upscaling blurry line art
- Generate datasets for generative AI models

How computer vision helps:
Deep learning segmentation models can learn pixel-level edge structures directly from images.

### Training Approach
The model was trained as a semantic segmentation model using the PyTorch segmentation framework.

Frameworks used:
- PyTorch
- segmentation_models_pytorch

Since no pretrained model exists specifically for anime line extraction, the model was trained using a custom dataset and automatically generated edge masks.

<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/h06ej-ODkw5tDAx3X6KfL.png" width="100%">

### Intended Use Cases
Potential applications include:

- Animation pipelines – converting frames into base line structures
- Digital art tools – assisting artists by generating sketch outlines
- Image upscaling workflows – improving visual quality of blurry lines
- Dataset generation – automatically creating line art datasets for training generative models

Example research question explored in this project:

Can a segmentation model trained on edge masks produce usable line art for artistic workflows?


--------------------------------------------------

# Training Data

## Dataset Source
Images were collected manually from screenshots of anime episodes.

The dataset was assembled specifically for this project to capture common line art structures present in anime animation.

Dataset characteristics:

Total images: 480  
Original resolution: 1920×1080  
Training resolution: 256×256  
Task type: Binary segmentation  

## Classes
Although this is a binary segmentation task, the detected edges represent multiple visual structures:

- Character outlines
- Hair edges
- Facial outlines
- Clothing folds
- Background structures

Pixel labels:

0 = background  
1 = line / edge

## Dataset Split

Train: 384 images  
Validation: 96 images  
Test: Not used

A separate test set was not included due to the relatively small dataset size. The validation set was used to monitor training performance and evaluate model results.

## Data Collection Methodology
Images were collected manually from anime episode screenshots. Frames were chosen to capture a variety of characters, poses, lighting conditions, and scene compositions.

All images were resized to 256×256 resolution to standardize input dimensions for training.

## Annotation Process

Manually labeling line art masks for hundreds of images would be extremely time-consuming. Instead, an automated annotation pipeline was used to approximate line structures.

Annotation pipeline: Anime image → Grayscale conversion → Canny edge detection → Binary edge mask  

Tools used: Python, Google Colab (Jupyter Notebook), OpenCV, PyTorch  

Work performed for the dataset:

- Collected ~480 anime images
- Generated masks automatically using Canny edge detection
- Visually inspected the quality of the generated masks

This approach allowed rapid dataset creation while still ensuring that the generated masks captured meaningful line structures.

## Data Augmentation

No data augmentation techniques were applied.

Images were only resized and normalized during preprocessing.

## Known Dataset Biases

Several limitations exist in the dataset:

- Images are exclusively anime style, creating stylistic bias
- Edge masks generated automatically contain noise
- Some thin edges may be missing due to limitations of Canny detection
- Dataset size is relatively small for deep learning segmentation


--------------------------------------------------

# Training Procedure

## Training Framework

The model was implemented using:

PyTorch  
segmentation_models_pytorch

This library provides segmentation architectures suitable for pixel-level prediction tasks.

## Model Architecture

Architecture:

Encoder: ResNet18  
Decoder: U-Net  
Input: RGB image  
Output: binary edge mask  

U-Net was selected because it performs well for segmentation tasks and works effectively with relatively small datasets.

## Training Hardware

Training was conducted using Google Colab.

Typical environment:

GPU: NVIDIA T4  
VRAM: ~16 GB  
Training time: approximately 1–2 hours

## Hyperparameters

Epochs: 30  
Batch size: 8  
Optimizer: Adam  
Learning rate: 0.0001  
Loss function: Binary Cross Entropy + Dice Loss
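The combined loss can be sketched in plain PyTorch (the 1:1 weighting and the smoothing term `eps` are assumptions; the exact formulation used in training is not documented):

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits: torch.Tensor, target: torch.Tensor,
                  eps: float = 1e-6) -> torch.Tensor:
    """Combined BCE + Dice loss for binary segmentation.

    logits: raw model outputs (B, 1, H, W); target: binary mask, same shape.
    The equal weighting of the two terms is an illustrative assumption.
    """
    bce = F.binary_cross_entropy_with_logits(logits, target)
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum()
    dice = 1 - (2 * inter + eps) / (probs.sum() + target.sum() + eps)
    return bce + dice
```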

## Preprocessing Steps

Images were preprocessed using:

Resize to 256×256  
Normalize using ImageNet statistics

mean = [0.485, 0.456, 0.406]  
std = [0.229, 0.224, 0.225]


--------------------------------------------------

# Evaluation Results

## Metrics

Because this project uses semantic segmentation rather than object detection, evaluation metrics are calculated at the pixel level.

Metrics used:

Dice Coefficient – measures overlap between predicted masks and ground truth masks  
Intersection over Union (IoU) – measures intersection divided by union of predicted and ground truth masks
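Both metrics can be computed directly on binary masks (a minimal NumPy sketch; `eps` is a small smoothing term to guard against empty masks):

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-6):
    """Pixel-level Dice coefficient and IoU for binary masks (values 0/1)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = (2 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou
```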

## Validation Performance

Dice coefficient: ~0.35  
IoU: ~0.21

These metrics indicate that the model is able to detect meaningful edge structures but struggles with extremely thin line details.

<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/zvCczs-TB241YW4FuVIOF.png" width="50%">

## Key Observations

What worked well:

- Learned major character outlines
- Captured hair boundaries
- Detected facial structures

<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/dlkCHCrPtBJPvy7sGSc8j.png" width="100%">

Failure cases:

- Small thin lines
- Dark scenes
- Shading lines interpreted as edges
- Excessive background detail

<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/Hi0LQhIQZvWlAd_44o88H.png" width="100%">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/KnIMiKkePB9aDNausGNDp.png" width="100%">

These results show that the model learned meaningful edge structures despite the noisy annotations generated from Canny edge detection.

## Visual Examples

Typical evaluation visualizations include:

- Input anime frame
- Ground truth edge mask
- Model predicted mask

These comparisons help visually evaluate whether predicted edges align with important structures in the image.

## Performance Analysis

The model demonstrates that segmentation networks can learn edge patterns from anime images even when trained with automatically generated masks.

However, the task presents several challenges:

1. Thin line structures are difficult for segmentation models
2. Automatic annotations introduce noise
3. Low contrast scenes reduce edge detectability

Because the model was only trained for 30 epochs, additional training may improve performance. However, improving annotation quality or training at higher resolution would likely have a larger impact.

<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/PL9-L1MHMEhqmNxY4WQkm.png" width="100%">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/ru-gguNfSDzxbCXT6kmeS.png" width="100%">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/wT3J4LSINPNHVVjLUaqcR.png" width="100%">
--------------------------------------------------

# Limitations and Biases

## Known Failure Cases

The model struggles with:

- Extremely thin lines
- Low contrast scenes
- Dark shading regions
- Highly detailed backgrounds

These cases often produce incomplete or noisy edge predictions.

## Annotation Noise

Ground truth labels were generated automatically using Canny edge detection. This introduces issues such as:

- Missing edges
- False edges from shading
- Broken line segments

Because the model learns from these masks, the maximum achievable accuracy is limited by the quality of the annotations.

## Dataset Bias

The dataset contains only anime frames, introducing strong stylistic bias.

The model may perform poorly on:

- Photographs
- Western illustration styles
- Non-anime artwork

## Resolution Limitations

Images were resized from 1920×1080 to 256×256 for training.

This downscaling removes fine details and makes thin lines harder to detect.

## Sample Size Limitations

The dataset contains only 480 images, which is relatively small for training deep neural networks. A larger dataset would likely improve generalization.

## Inappropriate Use Cases

This model should not be used for:

- Photographic edge detection
- Medical image segmentation
- Object detection tasks

The model is specifically designed for anime-style line structure extraction.


--------------------------------------------------

# Future Work

Possible improvements include:

- Expanding the dataset to thousands of images
- Training at higher resolution (512×512 or higher)
- Improving annotation quality with manual corrections
- Exploring diffusion-based line reconstruction models

Additional research directions include:

- Object detection models for automatic removal of occlusions

<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/NKCNnMBSAzzhAPjaZiX9y.png" width="100%">

- Line art upscaling techniques

<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/9cisCYIkU_y45UJtJRNcE.png" width="100%">

- Using detected edges for stitching animation panning shots

<img src="https://cdn-uploads.huggingface.co/production/uploads/6972a2622ef5ed3b50628995/ZDIrGENzx4oy-Vj_jyQMa.gif" width="100%">