thookham committed on
Commit 5dc545c · verified · 1 Parent(s): e9f9fd3

Update README.md with correct metadata/model card

Files changed (1)
  1. README.md +215 -542
README.md CHANGED
@@ -1,542 +1,215 @@
1
-
2
- # DeOldify (Modernized)
3
-
4
- ![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)
5
- ![PyTorch 2.5+](https://img.shields.io/badge/PyTorch-2.5+-ee4c2c.svg)
6
- ![CUDA 12.x](https://img.shields.io/badge/CUDA-12.x-76B900.svg)
7
- ![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)
8
-
11
- **DeOldify** has been modernized! This fork updates the project to support **PyTorch 2.5+**, **CUDA 12.x**, and **Intel GPUs (Arc/Data Center)**. It removes the dependency on the obsolete FastAI 1.x library, making it easier to run on modern hardware.
12
-
13
- **Quick Start**:
14
- - **NVIDIA GPU**: [Setup Guide](docs/nvidia_setup.md)
15
- - **Intel GPU**: [Setup Guide](docs/intel_gpu_setup.md)
16
-
17
- **Original Project**: The original DeOldify by Jason Antic can be found [here](https://github.com/jantic/DeOldify).
18
-
19
- **In Browser (new!)**
20
- You can run DeOldify directly in your browser without any installation! We've included a local browser-based implementation in this repository.
21
-
22
- **How to use:**
23
- 1. Navigate to the `browser/` folder in this repository.
24
- 2. Open `index.html` in Chrome, Firefox, or Edge.
25
- 3. Choose between the **Artistic** (higher quality) or **Quantized** (faster) model.
26
- 4. Select an image and watch it colorize instantly!
27
-
28
- *Note: This implementation uses ONNX models hosted on our GitHub releases and Hugging Face; inference runs locally in your browser, ensuring privacy and availability.*
29
-
30
- Also check out the original browser repo: https://github.com/akbartus/DeOldify-on-Browser
31
-
32
- **Hugging Face 🤗**: All models are also available on Hugging Face: [thookham/DeOldify](https://huggingface.co/thookham/DeOldify)
33
-
34
- The **most advanced** version of DeOldify image colorization is available here,
35
- exclusively. Try a few images for free! [MyHeritage In Color](https://www.myheritage.com/incolor)
36
-
37
- **Replicate:** Image: <a href="https://replicate.com/arielreplicate/deoldify_image"><img src="https://replicate.com/arielreplicate/deoldify_image/badge"></a> | Video: <a href="https://replicate.com/arielreplicate/deoldify_video"><img src="https://replicate.com/arielreplicate/deoldify_video/badge"></a>
38
-
39
- ----------------------------
40
-
41
- Image (artistic) [![Colab for images](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jantic/DeOldify/blob/master/ImageColorizerColab.ipynb)
42
- | Video [![Colab for video](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jantic/DeOldify/blob/master/VideoColorizerColab.ipynb)
43
-
44
- Having trouble with the default image colorizer, aka "artistic"? Try the
45
- "stable" one below. It generally won't produce colors that are as interesting as
46
- "artistic", but the glitches are noticeably reduced.
47
-
48
- Image (stable) [![Colab for stable model](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jantic/DeOldify/blob/master/ImageColorizerColabStable.ipynb)
49
-
50
- Instructions on how to use the Colabs above have been kindly provided in video
51
- tutorial form by Old Ireland in Colour's John Breslin. It's great! Click the video
52
- image below to watch.
53
-
54
- [![DeOldify Tutorial](http://img.youtube.com/vi/VaEl0faDw38/0.jpg)](http://www.youtube.com/watch?v=VaEl0faDw38)
55
-
56
- Get more updates on [Twitter
57
- ![Twitter logo](resource_images/twitter.svg)](https://twitter.com/DeOldify).
58
-
59
- ## Table of Contents
60
-
61
- - [About DeOldify](#about-deoldify)
62
- - [Example Videos](#example-videos)
63
- - [Example Images](#example-images)
64
- - [Stuff That Should Probably Be In A Paper](#stuff-that-should-probably-be-in-a-paper)
65
- - [How to Achieve Stable Video](#how-to-achieve-stable-video)
66
- - [What is NoGAN?](#what-is-nogan)
67
- - [Why Three Models?](#why-three-models)
68
- - [Technical Details](#the-technical-details)
69
- - [Going Forward](#this-project-going-forward)
70
- - [Roadmap](ROADMAP.md)
71
- - [Getting Started Yourself](#getting-started-yourself)
72
- - [Easiest Approach](#easiest-approach)
73
- - [Your Own Machine](#your-own-machine-not-as-easy)
74
- - [Pretrained Weights](#pretrained-weights)
75
-
76
- ## About DeOldify
77
-
78
- Simply put, the mission of this project is to colorize and restore old images and
79
- film footage. We'll get into the details in a bit, but first let's see some
80
- pretty pictures and videos!
81
-
82
- ### New and Exciting Stuff in DeOldify
83
-
84
- - Glitches and artifacts are almost entirely eliminated
85
- - Better skin (less zombies)
86
- - More highly detailed and photorealistic renders
87
- - Much less "blue bias"
88
- - **Video** - it actually looks good!
89
- - **NoGAN** - a new and weird but highly effective way to do GAN training for
90
- image to image.
91
-
92
- ## Example Videos
93
-
94
- **Note:** Click images to watch
95
-
96
- ### Facebook F8 Demo
97
-
98
- [![DeOldify Facebook F8 Movie Colorization Demo](http://img.youtube.com/vi/l3UXXid04Ys/0.jpg)](http://www.youtube.com/watch?v=l3UXXid04Ys)
99
-
100
- ### Silent Movie Examples
101
-
102
- [![DeOldify Silent Movie Examples](http://img.youtube.com/vi/EXn-n2iqEjI/0.jpg)](http://www.youtube.com/watch?v=EXn-n2iqEjI)
103
-
104
- ## Example Images
105
-
106
- "Migrant Mother" by Dorothea Lange (1936)
107
-
108
- ![Migrant Mother](https://i.imgur.com/Bt0vnke.jpg)
109
-
110
- Woman relaxing in her living room in Sweden (1920)
111
-
112
- ![Sweden Living Room](https://i.imgur.com/158d0oU.jpg)
113
-
114
- "Toffs and Toughs" by Jimmy Sime (1937)
115
-
116
- ![Class Divide](https://i.imgur.com/VYuav4I.jpg)
117
-
118
- Thanksgiving Maskers (1911)
119
-
120
- ![Thanksgiving Maskers](https://i.imgur.com/n8qVJ5c.jpg)
121
-
122
- Glen Echo Madame Careta Gypsy Camp in Maryland (1925)
123
-
124
- ![Gypsy Camp](https://i.imgur.com/1oYrJRI.jpg)
125
-
126
- "Mr. and Mrs. Lemuel Smith and their younger children in their farm house,
127
- Carroll County, Georgia." (1941)
128
-
129
- ![Georgia Farmhouse](https://i.imgur.com/I2j8ynm.jpg)
130
-
131
- "Building the Golden Gate Bridge" (est 1937)
132
-
133
- ![Golden Gate Bridge](https://i.imgur.com/6SbFjfq.jpg)
134
-
135
- > **Note:** What you might be wondering is while this render looks cool, are the
136
- > colors accurate? The original photo certainly makes it look like the towers of
137
- > the bridge could be white. We looked into this and it turns out the answer is
138
- > no - the towers were already covered in red primer by this time. So that's
139
- > something to keep in mind- historical accuracy remains a huge challenge!
140
-
141
- "Terrasse de café, Paris" (1925)
142
-
143
- ![Cafe Paris](https://i.imgur.com/WprQwP5.jpg)
144
-
145
- Norwegian Bride (est late 1890s)
146
-
147
- ![Norwegian Bride](https://i.imgur.com/MmtvrZm.jpg)
148
-
149
- Zitkála-Šá (Lakota: Red Bird), also known as Gertrude Simmons Bonnin (1898)
150
-
151
- ![Native Woman](https://i.imgur.com/zIGM043.jpg)
152
-
153
- Chinese Opium Smokers (1880)
154
-
155
- ![Opium Real](https://i.imgur.com/lVGq8Vq.jpg)
156
-
157
- ## Stuff That Should Probably Be In A Paper
158
-
159
- ### How to Achieve Stable Video
160
-
161
- NoGAN training is crucial to getting the kind of stable and colorful images seen
162
- in this iteration of DeOldify. NoGAN training combines the benefits of GAN
163
- training (wonderful colorization) while eliminating the nasty side effects
164
- (like flickering objects in video). Believe it or not, video is rendered using
165
- isolated image generation without any sort of temporal modeling tacked on. The
166
- process performs 30-60 minutes of the GAN portion of "NoGAN" training, using 1%
167
- to 3% of ImageNet data once. Then, as with still image colorization, we
168
- "DeOldify" individual frames before rebuilding the video.
169
-
170
- In addition to improved video stability, there is an interesting thing going on
171
- here worth mentioning. It turns out the models I run, even different ones and
172
- with different training structures, keep arriving at more or less the same
173
- solution. That's even the case for the colorization of things you may think
174
- would be arbitrary and unknowable, like the color of clothing, cars, and even
175
- special effects (as seen in "Metropolis").
176
-
177
- ![Metropolis Special FX](https://thumbs.gfycat.com/HeavyLoneBlowfish-size_restricted.gif)
178
-
179
- My best guess is that the models are learning some interesting rules about how to
180
- colorize based on subtle cues present in the black and white images that I
181
- certainly wouldn't expect to exist. This leads to nicely deterministic and
- consistent results, and that means you don't have to track model colorization
183
- decisions because they're not arbitrary. Additionally, they seem remarkably
184
- robust so that even in moving scenes the renders are very consistent.
185
-
186
- ![Moving Scene Example](https://thumbs.gfycat.com/FamiliarJubilantAsp-size_restricted.gif)
187
-
188
- Other ways to stabilize video add up as well. First, generally speaking, rendering
189
- at a higher resolution (higher render_factor) will increase stability of
190
- colorization decisions. This stands to reason because the model has higher
191
- fidelity image information to work with and will have a greater chance of making
192
- the "right" decision consistently. Closely related to this is the use of
193
- resnet101 instead of resnet34 as the backbone of the generator- objects are
194
- detected more consistently and correctly with this. This is especially important
195
- for getting good, consistent skin rendering. It can be particularly visually
196
- jarring if you wind up with "zombie hands", for example.
197
-
198
- ![Zombie Hand Example](https://thumbs.gfycat.com/ThriftyInferiorIsabellinewheatear-size_restricted.gif)
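- 
- In practice, raising render_factor is a one-line change. Here's a hedged sketch using the original DeOldify inference API (the image path is a placeholder, and signatures may differ slightly in this modernized fork):
- 
- ```python
- from deoldify.visualize import get_image_colorizer
- 
- # artistic=False selects the "stable" weights (better for faces and landscapes)
- colorizer = get_image_colorizer(artistic=False)
- 
- # Higher render_factor -> higher internal render resolution -> steadier,
- # more consistent colorization decisions (at the cost of speed and memory)
- colorizer.plot_transformed_image("test_images/photo.jpg", render_factor=35)
- ```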
199
-
200
- Additionally, Gaussian noise augmentation during training appears to help, but at
- this point the conclusions as to just how much are a bit more tenuous (I just
202
- haven't formally measured this yet). This is loosely based on work done in style
203
- transfer video, described here:
204
- <https://medium.com/element-ai-research-lab/stabilizing-neural-style-transfer-for-video-62675e203e42>.
205
-
206
- Special thanks go to Rani Horev for his contributions in implementing this noise
207
- augmentation.
208
-
209
- ### What is NoGAN?
210
-
211
- This is a new type of GAN training that I've developed to solve some key problems
212
- in the previous DeOldify model. It provides the benefits of GAN training while
213
- spending minimal time doing direct GAN training. Instead, most of the training
214
- time is spent pretraining the generator and critic separately with more
215
- straightforward, fast, and reliable conventional methods. A key insight here is
216
- that those more "conventional" methods generally get you most of the results you
217
- need, and that GANs can be used to close the gap on realism. During the very
218
- short amount of actual GAN training the generator not only gets the full
219
- realistic colorization capabilities that used to take days of progressively
220
- resized GAN training, but it also doesn't accrue nearly as much of the artifacts
221
- and other ugly baggage of GANs. In fact, you can pretty much eliminate glitches
222
- and artifacts almost entirely depending on your approach. As far as I know this
223
- is a new technique. And it's incredibly effective.
224
-
225
- #### Original DeOldify Model
226
-
227
- ![Before Flicker](https://thumbs.gfycat.com/CoordinatedVeneratedHogget-size_restricted.gif)
228
-
229
- #### NoGAN-Based DeOldify Model
230
-
231
- ![After Flicker](https://thumbs.gfycat.com/OilyBlackArctichare-size_restricted.gif)
232
-
233
- The steps are as follows: First train the generator in a conventional way by
234
- itself with just the feature loss. Next, generate images from that, and train
235
- the critic on distinguishing between those outputs and real images as a basic
236
- binary classifier. Finally, train the generator and critic together in a GAN
237
- setting (starting right at the target size of 192px in this case). Now for
238
- the weird part: All the useful GAN training here only takes place within a very
239
- small window of time. There's an inflection point where it appears the critic
240
- has transferred everything it can that is useful to the generator. Past this
241
- point, image quality oscillates between the best that you can get at the
242
- inflection point and bad in a predictable way (orangish skin, overly red lips,
243
- etc). There appears to be no productive training after the inflection point.
244
- And this point lies within training on just 1% to 3% of the ImageNet data!
245
- That amounts to about 30-60 minutes of training at 192px.
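- 
- In pseudocode, the schedule reads roughly as follows. This is a sketch of the description above, not the repository's actual training code, so every helper passed in is hypothetical:
- 
- ```python
- def nogan_schedule(pretrain_generator, render_all, pretrain_critic, train_gan,
-                    generator, critic, data):
-     # 1) Conventional generator pretraining with the feature (perceptual) loss only
-     pretrain_generator(generator, data)
- 
-     # 2) Pretrain the critic as a plain real-vs-generated binary classifier
-     fakes = render_all(generator, data)
-     pretrain_critic(critic, real=data, fake=fakes)
- 
-     # 3) Brief GAN training at the target size, stopped at the inflection point --
-     #    roughly 1-3% of ImageNet, i.e. about 30-60 minutes at 192px
-     train_gan(generator, critic, data, size=192, max_data_frac=0.03)
- 
-     # Repeating steps 2 and 3 yields the extra-colorful "artistic" results
-     # (about five cycles before diminishing returns)
- ```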
246
-
247
- The hard part is finding this inflection point. So far, I've accomplished this
248
- by making a whole bunch of model save checkpoints (every 0.1% of data iterated
249
- on) and then just looking for the point where images look great before they go
250
- totally bonkers with orange skin (always the first thing to go). Additionally,
251
- generator rendering starts immediately getting glitchy and inconsistent at this
252
- point, which is no good particularly for video. What I'd really like to figure
253
- out is what the tell-tale sign of the inflection point is that can be easily
254
- automated as an early stopping point. Unfortunately, nothing definitive is
255
- jumping out at me yet. For one, it's happening in the middle of training loss
256
- decreasing- not when it flattens out, which would seem more reasonable on the surface.
257
-
258
- Another key thing about NoGAN training is you can repeat pretraining the critic
259
- on generated images after the initial GAN training, then repeat the GAN training
260
- itself in the same fashion. This is how I was able to get extra colorful results
261
- with the "artistic" model. But this does come at a cost currently- the output of
262
- the generator becomes increasingly inconsistent and you have to experiment with
263
- render resolution (render_factor) to get the best result. But the renders are
264
- still glitch free and way more consistent than I was ever able to achieve with
265
- the original DeOldify model. You can do about five of these repeat cycles, give
266
- or take, before you get diminishing returns, as far as I can tell.
267
-
268
- Keep in mind- I haven't been entirely rigorous in figuring out what all is going
269
- on in NoGAN- I'll save that for a paper. That means there's a good chance I'm
270
- wrong about something. But I think it's definitely worth putting out there now
271
- because I'm finding it very useful- it's solving most of the remaining
- problems I had in DeOldify.
273
-
274
- This builds upon a technique developed in collaboration with Jeremy Howard and
275
- Sylvain Gugger for Fast.AI's Lesson 7 in version 3 of Practical Deep Learning
276
- for Coders Part I. The particular lesson notebook can be found here:
277
- <https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson7-superres-gan.ipynb>
278
-
279
- ## Why Three Models?
280
-
281
- There are now three models to choose from in DeOldify. Each of these has key
282
- strengths and weaknesses, and so has different use cases. Video is for video
283
- of course. But stable and artistic are both for images, and sometimes one will
284
- do images better than the other.
285
-
286
- More details:
287
-
288
- - **Artistic** - This model achieves the highest quality results in image
289
- coloration, in terms of interesting details and vibrance. The most notable
290
- drawback however is that it's a bit of a pain to fiddle around with to get the
291
- best results (you have to adjust the rendering resolution or render_factor to
292
- achieve this). Additionally, the model does not do as well as stable in a few
293
- key common scenarios- nature scenes and portraits. The model uses a resnet34
294
- backbone on a UNet with an emphasis on depth of layers on the decoder side.
295
- This model was trained with 5 critic pretrain/GAN cycle repeats via NoGAN, in
296
- addition to the initial generator/critic pretrain/GAN NoGAN training, at 192px.
297
- This adds up to a total of 32% of ImageNet data trained once (12.5 hours of
298
- direct GAN training).
299
-
300
- - **Stable** - This model achieves the best results with landscapes and
301
- portraits. Notably, it produces fewer "zombies"- where faces or limbs stay gray
302
- rather than being colored in properly. It generally has fewer weird
303
- miscolorations than artistic, but it's also less colorful in general. This
304
- model uses a resnet101 backbone on a UNet with an emphasis on width of layers on
305
- the decoder side. This model was trained with 3 critic pretrain/GAN cycle
306
- repeats via NoGAN, in addition to the initial generator/critic pretrain/GAN
307
- NoGAN training, at 192px. This adds up to a total of 7% of ImageNet data
308
- trained once (3 hours of direct GAN training).
309
-
310
- - **Video** - This model is optimized for smooth, consistent and flicker-free
311
- video. This would definitely be the least colorful of the three models, but
312
- it's honestly not too far off from "stable". The model is the same as "stable"
313
- in terms of architecture, but differs in training. It's trained for a mere 2.2%
314
- of ImageNet data once at 192px, using only the initial generator/critic
315
- pretrain/GAN NoGAN training (1 hour of direct GAN training).
316
-
317
- Because the training of the artistic and stable models was done before the
318
- "inflection point" of NoGAN training described in "What is NoGAN???" was
319
- discovered, I believe this amount of training on them can be knocked down
320
- considerably. As far as I can tell, the models were stopped at "good points"
321
- that were well beyond where productive training was taking place. I'll be
322
- looking into this in the future.
323
-
324
- Ideally, these three models will eventually be consolidated into one that has all
- these desirable qualities unified. I think there's a path there, but it's going to
326
- require more work! So for now, the most practical solution appears to be to
327
- maintain multiple models.
328
-
333
- ## The Technical Details
334
-
335
- This is a deep learning based model. More specifically, what I've done is
336
- combine the following approaches:
337
-
338
- ### [Self-Attention Generative Adversarial Network](https://arxiv.org/abs/1805.08318)
339
-
340
- Except the generator is a **pretrained U-Net**, and I've just modified it to
341
- have the spectral normalization and self-attention. It's a pretty
342
- straightforward translation.
343
-
344
- ### [Two Time-Scale Update Rule](https://arxiv.org/abs/1706.08500)
345
-
346
- This is also very straightforward – it's just one-to-one generator/critic
- iterations and a higher critic learning rate.
348
- This is modified to incorporate a "threshold" critic loss that makes sure that
349
- the critic is "caught up" before moving on to generator training.
350
- This is particularly useful for the "NoGAN" method described below.
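- 
- As a sketch, the threshold variant amounts to letting the critic catch up before every generator step. The hinge-style losses below are an assumption for illustration; only the catch-up structure is the point:
- 
- ```python
- import torch
- 
- def gan_epoch(gen, critic, batches, g_opt, c_opt, crit_thresh=0.5):
-     for grayscale, real in batches:
-         # Critic iterations (at its higher learning rate) until "caught up",
-         # i.e. its loss drops below the threshold
-         while True:
-             c_opt.zero_grad()
-             fake = gen(grayscale).detach()
-             c_loss = (torch.relu(1 - critic(real)).mean()
-                       + torch.relu(1 + critic(fake)).mean())
-             c_loss.backward()
-             c_opt.step()
-             if c_loss.item() < crit_thresh:
-                 break
-         # Then a single generator iteration -- one-to-one otherwise
-         g_opt.zero_grad()
-         g_loss = -critic(gen(grayscale)).mean()
-         g_loss.backward()
-         g_opt.step()
- ```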
351
-
352
- ### NoGAN
353
-
354
- There's no paper here! This is a new type of GAN training that I've developed to
355
- solve some key problems in the previous DeOldify model.
356
- The gist is that you get the benefits of GAN training while spending minimal time
357
- doing direct GAN training.
358
- More details are in the [What is NoGAN?](#what-is-nogan) section (it's a doozy).
359
-
360
- ### Generator Loss
361
-
362
- Loss during NoGAN learning is two parts: One is a basic Perceptual Loss (or
363
- Feature Loss) based on VGG16 – this just biases the generator model to replicate
364
- the input image.
365
- The second is the loss score from the critic. For the curious – Perceptual Loss
366
- isn't sufficient by itself to produce good results.
367
- It tends to just encourage a bunch of brown/green/blue – you know, cheating to
368
- the test, basically, which neural networks are really good at doing!
369
- The key thing to realize here is that GANs are essentially learning the loss function
- for you – which is really one big step toward the ideal that we're
371
- shooting for in machine learning.
372
- And of course you generally get much better results when you get the machine to
373
- learn something you were previously hand coding.
374
- That's certainly the case here.
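- 
- Concretely, the two-part generator loss can be sketched like this (the VGG16 layer cut and the loss weights are illustrative assumptions, not the exact values used):
- 
- ```python
- import torch.nn.functional as F
- import torchvision
- 
- # Frozen VGG16 feature extractor for the perceptual (feature) loss
- vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features[:23].eval()
- for p in vgg.parameters():
-     p.requires_grad_(False)
- 
- def generator_loss(fake, target, critic, feat_w=1.0, gan_w=1.0):
-     feat = F.l1_loss(vgg(fake), vgg(target))  # biases output toward the input image
-     adv = -critic(fake).mean()                # the critic's score: the learned loss
-     return feat_w * feat + gan_w * adv
- ```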
375
-
376
- **Of note:** There's no longer any "Progressive Growing of GANs" type training
377
- going on here. It's just not needed given the superior results obtained
378
- by the "NoGAN" technique described above.
379
-
380
- The beauty of this model is that it should be generally useful for all sorts of
381
- image modification, and it should do it quite well.
382
- What you're seeing above are the results of the colorization model, but that's
383
- just one component in a pipeline that I'm developing with the exact same approach.
384
-
385
- ## This Project, Going Forward
386
-
387
- So that's the gist of this project – I'm looking to make old photos and film
388
- look reeeeaaally good with GANs, and more importantly, make the project *useful*.
389
- In the meantime though this is going to be my baby and I'll be actively updating
390
- and improving the code for the foreseeable future.
391
- I'll try to make this as user-friendly as possible, but I'm sure there's going
392
- to be hiccups along the way.
393
-
394
- Oh and I swear I'll document the code properly...eventually. Admittedly I'm
395
- *one of those* people who believes in "self documenting code" (LOL).
396
-
397
- ## Best Practices & Golden Nuggets
398
-
399
- Based on extensive community research and the original author's insights, here are the "Golden Nuggets" for getting the best results:
400
-
401
- ### 1. Video Flicker? Use the "Video" Model!
402
- If you are experiencing flickering in your video outputs, ensure you are using the **Video** model weights (`ColorizeVideo_gen.pth`). This model was specifically trained with **NoGAN** to prioritize temporal consistency over raw color vibrancy. The "Artistic" model will almost always flicker on video.
403
-
404
- ### 2. The "NoGAN" Secret
405
- The core innovation of DeOldify is **NoGAN** training. It pre-trains the generator with a conventional loss function (Perceptual Loss) before introducing the GAN component. This minimizes the "GAN artifacts" (like flicker) while keeping the colorization quality.
406
-
407
- ### 3. Post-Processing is Key
408
- Even with the Video model, some flicker may persist. We recommend using **FFmpeg's `deflicker` filter** as a post-processing step.
409
- * **New Feature**: We have added a `deflicker=True` option to the `VideoColorizer` to handle this automatically!
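- 
- For manual post-processing, something like the following works (a hedged sketch; the file paths are placeholders):
- 
- ```python
- import subprocess
- 
- # Run FFmpeg's deflicker filter over a rendered video
- subprocess.run([
-     "ffmpeg", "-i", "video/result/colorized.mp4",
-     "-vf", "deflicker",  # temporal brightness smoothing
-     "-c:a", "copy",      # pass the audio stream through untouched
-     "video/result/colorized_deflickered.mp4",
- ], check=True)
- ```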
410
-
411
- ### 4. Alternative Implementations
412
- * **Anime/Manga**: If you are colorizing anime sketches, check out [AnimeColorDeOldify](https://github.com/Dakini/AnimeColorDeOldify), which uses a model fine-tuned on Danbooru.
413
- * **C# / .NET**: For a native C# implementation, see [DeOldify.NET](https://github.com/ColorfulSoft/DeOldify.NET).
414
-
415
- ## Getting Started Yourself
416
- have that yet so I'm not going to make it the default instruction here yet.
417
-
418
- **Alternative Install:** User daddyparodz has kindly created an installer script
419
- for Ubuntu, and in particular Ubuntu on WSL, that may make things easier:
420
- <https://github.com/daddyparodz/AutoDeOldifyLocal>
421
-
422
- #### Note on test_images Folder
423
-
424
- The images in the `test_images` folder have been removed because they were using
425
- Git LFS and that costs a lot of money when GitHub actually charges for bandwidth
426
- on a popular open source project (they had a billing bug for a while that was
427
- recently fixed). The notebooks that use them (the image test ones) still point
428
- to images in that directory that I (Jason) keep personally, and I'd like to keep
429
- it that way because, after all, I'm by far the primary and most active developer.
430
- But they won't work for you. Still, those notebooks are a convenient template
431
- for making your own tests if you're so inclined.
432
-
433
- #### Typical training
434
-
435
- The notebook `ColorizeTrainingWandb` has been created to log and monitor results
436
- through [Weights & Biases](https://www.wandb.com/). You can find a description of
437
- typical training by consulting [W&B Report](https://app.wandb.ai/borisd13/DeOldify/reports?view=borisd13%2FDeOldify).
438
-
439
- ## Pretrained Weights
440
-
441
- To start right away on your own machine with your own images or videos without
442
- training the models yourself, you'll need to download the "Completed Generator
443
- Weights" listed below and drop them in the /models/ folder.
444
-
445
- The colorization inference notebooks should be able to guide you from here. The
446
- notebooks to use are named ImageColorizerArtistic.ipynb,
447
- ImageColorizerStable.ipynb, and VideoColorizer.ipynb.
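- 
- If you prefer to script the download, here's a minimal sketch (the URL is the Artistic entry from the list below):
- 
- ```python
- import urllib.request
- from pathlib import Path
- 
- # Fetch a generator weight straight into the /models/ folder
- Path("models").mkdir(exist_ok=True)
- url = ("https://github.com/thookham/DeOldify/releases/download/"
-        "v2.0-models/ColorizeArtistic_gen.pth")
- urllib.request.urlretrieve(url, "models/ColorizeArtistic_gen.pth")
- ```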
448
-
449
- ### Completed Generator Weights
450
-
451
- - [Artistic](https://github.com/thookham/DeOldify/releases/download/v2.0-models/ColorizeArtistic_gen.pth)
452
- - [Stable](https://github.com/thookham/DeOldify/releases/download/v2.0-models/ColorizeStable_gen.pth)
453
- - [Video](https://github.com/thookham/DeOldify/releases/download/v2.0-models/ColorizeVideo_gen.pth)
454
-
455
- ### Completed Critic Weights
456
-
457
- - [Artistic](https://github.com/thookham/DeOldify/releases/download/v2.0-models/ColorizeArtistic_crit.pth)
458
- - [Stable](https://github.com/thookham/DeOldify/releases/download/v2.0-models/ColorizeStable_crit.pth)
459
- - [Video](https://github.com/thookham/DeOldify/releases/download/v2.0-models/ColorizeVideo_crit.pth)
460
-
461
- ### Pretrain Only Generator Weights
462
-
463
- > **Note:** The Stable and Video PretrainOnly generator weights are split into multiple parts due to their size. Please download all parts (e.g., `.pth.000`, `.pth.001`) and run `python reassemble_models.py` to join them. A manual alternative is sketched after the list below.
464
-
465
- - [Artistic](https://github.com/thookham/DeOldify/releases/download/v2.0-models/ColorizeArtistic_PretrainOnly_gen.pth)
466
- - [Stable (Part 1)](https://github.com/thookham/DeOldify/releases/download/v2.0-models/ColorizeStable_PretrainOnly_gen.pth.000) | [Stable (Part 2)](https://github.com/thookham/DeOldify/releases/download/v2.0-models/ColorizeStable_PretrainOnly_gen.pth.001)
467
- - [Video (Part 1)](https://github.com/thookham/DeOldify/releases/download/v2.0-models/ColorizeVideo_PretrainOnly_gen.pth.000) | [Video (Part 2)](https://github.com/thookham/DeOldify/releases/download/v2.0-models/ColorizeVideo_PretrainOnly_gen.pth.001)
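- 
- If you'd rather join the parts manually, the following sketch assumes they are a plain byte-level split (which is what the reassembly script handles for you):
- 
- ```python
- from pathlib import Path
- 
- # Concatenate .pth.000, .pth.001, ... in order into a single .pth file
- parts = sorted(Path("models").glob("ColorizeStable_PretrainOnly_gen.pth.*"))
- with open("models/ColorizeStable_PretrainOnly_gen.pth", "wb") as out:
-     for part in parts:
-         out.write(part.read_bytes())
- ```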
468
-
469
- ### Pretrain Only Critic Weights
470
-
471
- - [Artistic](https://github.com/thookham/DeOldify/releases/download/v2.0-models/ColorizeArtistic_PretrainOnly_crit.pth)
472
- - [Stable](https://github.com/thookham/DeOldify/releases/download/v2.0-models/ColorizeStable_PretrainOnly_crit.pth)
473
- - [Video](https://github.com/thookham/DeOldify/releases/download/v2.0-models/ColorizeVideo_PretrainOnly_crit.pth)
474
-
475
- ### Archived Models (Browser / ONNX)
476
-
477
- - [Artistic ONNX](https://github.com/thookham/DeOldify/releases/download/v2.0-models/deoldify-art.onnx)
478
- - [Quantized ONNX](https://github.com/thookham/DeOldify/releases/download/v2.0-models/deoldify-quant.onnx)
479
-
480
- ## Want the Old DeOldify?
481
-
482
- We suspect some of you are going to want access to the original DeOldify model
483
- for various reasons. We have that archived here: <https://github.com/dana-kelley/DeOldify>
484
-
485
- ## Want More?
486
-
487
- Follow [#DeOldify](https://twitter.com/search?q=%23Deoldify) on Twitter.
488
-
489
- ## License
490
-
491
- All code in this repository is under the MIT license as specified by the LICENSE
492
- file.
493
-
494
- The model weights listed in this readme under the "Pretrained Weights" section
495
- were trained by us and are released under the MIT license.
496
-
497
- ## A Statement on Open Source Support
498
-
499
- We believe that open source has done a lot of good for the world. After all,
500
- DeOldify simply wouldn't exist without it. But we also believe that there need
501
- to be boundaries on just how much is reasonable to be expected from an open
502
- source project maintained by just two developers.
503
-
504
- Our stance is that we're providing the code and documentation on research that
505
- we believe is beneficial to the world. What we have provided are novel takes
506
- on colorization, GANs, and video that are hopefully somewhat friendly for
507
- developers and researchers to learn from and adopt. This is the culmination of
508
- well over a year of continuous work, free for you. What wasn't free was
509
- shouldered by us, the developers. We left our jobs, bought expensive GPUs, and
510
- had huge electric bills as a result of dedicating ourselves to this.
511
-
512
- What we haven't provided here is a ready to use free "product" or "app", and we
513
- don't ever intend on providing that. It's going to remain a Linux based project
514
- without Windows support, coded in Python, and requiring people to have some extra
515
- technical background to be comfortable using it. Others have stepped in with
516
- their own apps made with DeOldify, some paid and some free, which is what we want!
517
- We're instead focusing on what we believe we can do best- making better
518
- commercial models that people will pay for.
519
- Does that mean you're not getting the very best for free? Of course. We simply
520
- don't believe that we're obligated to provide that, nor is it feasible! We
521
- compete on research and sell that. Not a GUI or web service that wraps said
522
- research- that part isn't something we're going to be great at anyways. We're not
523
- about to shoot ourselves in the foot by giving away our actual competitive
524
- advantage for free, quite frankly.
525
-
526
- We're also not willing to go down the rabbit hole of providing endless, open
527
- ended and personalized support on this open source project. Our position is
528
- this: If you have the proper background and resources, the project provides
529
- more than enough to get you started. We know this because we've seen plenty of
530
- people using it and making money off of their own projects with it.
531
-
532
- Thus, if an issue comes up and it happens to be an actual bug whose fix
- will benefit users generally, then great- that's something
534
- we'll be happy to look into.
535
-
536
-
537
- In contrast, if you're asking about something that really amounts to asking for
538
- personalized and time-consuming support that won't benefit anybody else, we're
539
- not going to help. It's simply not in our interest to do that. We have bills to
540
- pay, after all. And if you're asking for help on something that can already be
541
- derived from the documentation or code? That's simply annoying, and we're not
542
- going to pretend to be ok with that.
 
1
+ ---
2
+ license: mit
3
+ tags:
4
+ - image-colorization
5
+ - gan
6
+ - computer-vision
7
+ - pytorch
8
+ - onnx
9
+ library_name: pytorch
10
+ ---
11
+
12
+ # DeOldify Model Weights
13
+
14
+ This repository contains pretrained weights for **DeOldify**, a deep learning model for colorizing and restoring old black and white images and videos.
15
+
16
+ **Code Repository**: [thookham/DeOldify](https://github.com/thookham/DeOldify)
+ **Original Author**: Jason Antic ([jantic/DeOldify](https://github.com/jantic/DeOldify))
18
+
19
+ ## Model Overview
20
+
21
+ DeOldify uses a Self-Attention Generative Adversarial Network (SAGAN) with a novel **NoGAN** training approach to achieve stable, high-quality colorization without the typical GAN artifacts.
22
+
23
+ ### Three Specialized Models
24
+
25
+ 1. **Artistic** - Highest quality with vibrant colors and interesting details
26
+ - Best for: General images, historical photos
27
+ - Backbone: ResNet34 U-Net
28
+ - Training: 5 NoGAN cycles, 32% ImageNet
29
+
30
+ 2. **Stable** - Best for portraits and landscapes, reduced artifacts
31
+ - Best for: Faces, nature scenes
32
+ - Backbone: ResNet101 U-Net
33
+ - Training: 3 NoGAN cycles, 7% ImageNet
34
+
35
+ 3. **Video** - Optimized for smooth, flicker-free video
36
+ - Best for: Video colorization, consistency
37
+ - Backbone: ResNet101 U-Net
38
+ - Training: Initial cycle only, 2.2% ImageNet
39
+
40
+ ## Available Files
41
+
42
+ ### ONNX Models (Browser/Inference)
43
+
44
+ | File | Size | Description |
45
+ |------|------|-------------|
46
+ | `deoldify-art.onnx` | 243 MB | Artistic model in ONNX format for browser use |
47
+ | `deoldify-quant.onnx` | 61 MB | Quantized artistic model (75% smaller, slightly lower quality) |
48
+
49
+ ### PyTorch Weights (Training & Inference)
50
+
51
+ **Generator Weights** (Main):
52
+ - `ColorizeArtistic_gen.pth` (243 MB)
53
+ - `ColorizeStable_gen.pth` (834 MB)
54
+ - `ColorizeVideo_gen.pth` (834 MB)
55
+
56
+ **Critic Weights** (Main):
57
+ - `ColorizeArtistic_crit.pth` (361 MB)
58
+ - `ColorizeStable_crit.pth` (361 MB)
59
+ - `ColorizeVideo_crit.pth` (361 MB)
60
+
61
+ **PretrainOnly Weights** (For continued training):
62
+ - `ColorizeArtistic_PretrainOnly_gen.pth` (729 MB)
63
+ - `ColorizeArtistic_PretrainOnly_crit.pth` (1.05 GB)
64
+ - `ColorizeStable_PretrainOnly_crit.pth` (1.05 GB)
65
+ - `ColorizeVideo_PretrainOnly_crit.pth` (1.05 GB)
66
+
67
+ > **Note**: Stable and Video PretrainOnly generators are split files hosted on [GitHub Releases](https://github.com/thookham/DeOldify/releases/tag/v2.0-models).
68
+
69
+ ## Usage
70
+
71
+ ### Browser (ONNX)
72
+
73
+ ```html
74
+ <!DOCTYPE html>
75
+ <html>
76
+ <head>
77
+ <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.min.js"></script>
78
+ </head>
79
+ <body>
80
+ <script>
81
+ async function colorize() {
82
+ // Load model from Hugging Face
83
+ const session = await ort.InferenceSession.create(
84
+ "https://huggingface.co/thookham/DeOldify/resolve/main/deoldify-art.onnx"
85
+ );
86
+
87
+ // Run inference (see full example in GitHub repo)
88
+ // ...
89
+ }
90
+ </script>
91
+ </body>
92
+ </html>
93
+ ```
94
+
95
+ ### PyTorch (Python)
96
+
97
+ ```python
98
+ from huggingface_hub import hf_hub_download
99
+ import torch
100
+
101
+ # Download model weights
102
+ model_path = hf_hub_download(
103
+ repo_id="thookham/DeOldify",
104
+ filename="ColorizeArtistic_gen.pth"
105
+ )
106
+
107
+ # Load the raw checkpoint for inspection; the full colorization pipeline
+ # requires the DeOldify code from the GitHub repository.
+ state_dict = torch.load(model_path, map_location="cpu")
109
+ ```
110
+
111
+ ### Installation
112
+
113
+ ```bash
114
+ # Clone the main repository
115
+ git clone https://github.com/thookham/DeOldify
116
+ cd DeOldify
117
+
118
+ # Install dependencies
119
+ pip install -r requirements.txt
120
+
121
+ # Download a model into the models/ folder (via huggingface_hub's CLI)
+ huggingface-cli download thookham/DeOldify ColorizeStable_gen.pth --local-dir models
124
+ ```
125
+
126
+ ## Technical Details
127
+
128
+ ### Architecture
129
+ - **Generator**: U-Net with ResNet34/101 backbone, spectral normalization, self-attention layers
130
+ - **Critic**: PatchGAN discriminator
131
+ - **Loss**: Perceptual loss (VGG16) + GAN loss
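+ 
+ For intuition, the self-attention layers look roughly like the SAGAN-style block below. This is an illustrative sketch, not the repository's exact implementation:
+ 
+ ```python
+ import torch
+ import torch.nn as nn
+ from torch.nn.utils import spectral_norm
+ 
+ class SelfAttention(nn.Module):
+     """SAGAN-style self-attention over spatial positions."""
+     def __init__(self, channels: int):
+         super().__init__()
+         self.query = spectral_norm(nn.Conv1d(channels, channels // 8, 1))
+         self.key = spectral_norm(nn.Conv1d(channels, channels // 8, 1))
+         self.value = spectral_norm(nn.Conv1d(channels, channels, 1))
+         self.gamma = nn.Parameter(torch.zeros(1))  # starts as the identity map
+ 
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         b, c, h, w = x.shape
+         flat = x.view(b, c, h * w)
+         attn = torch.softmax(
+             self.query(flat).transpose(1, 2) @ self.key(flat), dim=-1)
+         out = self.value(flat) @ attn.transpose(1, 2)
+         return (self.gamma * out).view(b, c, h, w) + x
+ ```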
132
+
133
+ ### NoGAN Training
134
+ A novel training approach that combines:
135
+ 1. Generator pretraining with feature loss
136
+ 2. Critic pretraining on generated images
137
+ 3. Short GAN training (30-60 minutes), stopped at the inflection point
138
+ 4. Optional cycle repeats for more colorful results
139
+
140
+ This eliminates typical GAN artifacts while maintaining realistic colorization.
141
+
142
+ ### Training Data
143
+ - Dataset: ImageNet subsets (1-32% depending on model)
144
+ - Resolution: 192px during training
145
+ - Augmentation: Gaussian noise for video stability
146
+
147
+ ## Model Card
148
+
149
+ ### Model Details
150
+ - **Developed by**: Jason Antic (original), Travis Hookham (modernization)
151
+ - **Model type**: Conditional GAN for image-to-image translation
152
+ - **Language(s)**: N/A (computer vision)
153
+ - **License**: MIT
154
+ - **Parent Model**: Based on the fastai U-Net and the Self-Attention GAN architecture
155
+
156
+ ### Intended Use
157
+ **Primary Use**: Colorizing black and white photographs and videos
158
+ **Out-of-Scope**: Real-time processing, guaranteed historical accuracy
159
+
160
+ ### Limitations
161
+ - Colors may not be historically accurate
162
+ - Performance degrades on very low quality/damaged images
163
+ - Artistic model may require render_factor tuning
164
+ - Video model trades some color vibrancy for consistency
165
+
166
+ ## Related Models & Resources
167
+
168
+ ### Similar Colorization Models on Hugging Face
169
+
170
+ **GAN-based Colorization:**
171
+ - [Hammad712/GAN-Colorization-Model](https://huggingface.co/Hammad712/GAN-Colorization-Model) - GAN model for grayscale to color transformation
172
+ - [jessicanono/filparty_colorization](https://huggingface.co/jessicanono/filparty_colorization) - ResNet-based model for historical photos
173
+
174
+ **Stable Diffusion-based:**
175
+ - [rsortino/ColorizeNet](https://huggingface.co/rsortino/ColorizeNet) - ControlNet adaptation of SD 2.1 for colorization
176
+ - [AlekseyCalvin/ColorizeTruer_KontextFluxVar6_BySAP](https://huggingface.co/AlekseyCalvin/ColorizeTruer_KontextFluxVar6_BySAP) - Advanced Flux-based colorization
177
+
178
+ **Interactive Demos (Spaces):**
179
+ - [aryadytm/Photo-Colorization](https://huggingface.co/spaces/aryadytm/Photo-Colorization)
180
+ - [Shashank009/Black-And-White-Image-Colorization](https://huggingface.co/spaces/Shashank009/Black-And-White-Image-Colorization)
181
+ - [CA611/Image-Colorization](https://huggingface.co/spaces/CA611/Image-Colorization)
182
+
183
+ ### Why Choose DeOldify?
184
+
185
+ DeOldify stands out for:
186
+ - **NoGAN Training**: Unique approach eliminating typical GAN artifacts
187
+ - **Specialized Models**: Three purpose-built models (Artistic, Stable, Video)
188
+ - **Video Support**: Flicker-free temporal consistency
189
+ - **Proven Track Record**: Powers MyHeritage InColor and is widely adopted
190
+ - **ONNX Support**: Browser-ready models for offline use
191
+
192
+ ## Citation
193
+
194
+ If you use these models, please cite:
195
+
196
+ ```bibtex
197
+ @misc{deoldify,
198
+ author = {Antic, Jason},
199
+ title = {DeOldify},
200
+ year = {2019},
201
+ publisher = {GitHub},
202
+ url = {https://github.com/jantic/DeOldify}
203
+ }
204
+ ```
205
+
206
+ ## Links
207
+
208
+ - **GitHub Repository**: https://github.com/thookham/DeOldify
209
+ - **Original DeOldify**: https://github.com/jantic/DeOldify
210
+ - **MyHeritage InColor** (Commercial version): https://www.myheritage.com/incolor
211
+ - **Demo (Browser)**: See the `browser/` folder in the GitHub repo
212
+
213
+ ## License
214
+
215
+ MIT License. See [LICENSE](https://github.com/thookham/DeOldify/blob/master/LICENSE) file.