# Workflow for fine-tuning ModelScope in anime style

Here is a brief description of my process for fine-tuning ModelScope in an anime style.
Most of it may be basic, but I hope it is useful.
There is no guarantee that what is written here is correct or will lead to good results!

## Selection of training data

The goal of my training was to shift the model toward an overall anime style.
Since only the art style needed to override the ModelScope content, I did not need a huge dataset: the total number of videos and images was only a few thousand.
Most of the videos were taken from Tenor.
Many of them were posted as GIFs and MP4s of a single short scene, and it should be possible to automate collection using the API.
https://tenor.com/
I also used some videos with smooth, stable motion, as well as videos of 3D models with toon shading.
Short videos are sufficient, since the training cannot handle very long clips at this time.

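As a sketch of what that automation could look like, here is a minimal Python example against the public Tenor v2 search endpoint. The endpoint and response layout follow Tenor's published API docs, but treat them as assumptions to verify; the query, key, and `__main__` usage are placeholders.

```python
# Sketch of automating clip collection via the Tenor API (v2).
# Endpoint and response keys are assumptions from the public Tenor docs;
# verify them and supply your own API key before running.
import json
import urllib.parse
import urllib.request

SEARCH_ENDPOINT = "https://tenor.googleapis.com/v2/search"

def build_search_url(query: str, api_key: str, limit: int = 50) -> str:
    """Build a Tenor search URL asking for mp4 results."""
    params = {
        "q": query,
        "key": api_key,
        "limit": limit,
        "media_filter": "mp4",
    }
    return SEARCH_ENDPOINT + "?" + urllib.parse.urlencode(params)

def fetch_mp4_urls(query: str, api_key: str, limit: int = 50) -> list[str]:
    """Return the mp4 URLs for one page of results (network access required)."""
    with urllib.request.urlopen(build_search_url(query, api_key, limit)) as resp:
        data = json.load(resp)
    return [r["media_formats"]["mp4"]["url"] for r in data.get("results", [])]

if __name__ == "__main__":
    for url in fetch_mp4_urls("anime walk cycle", "YOUR_API_KEY"):
        print(url)
```

From there, downloading each URL and de-duplicating by content hash is straightforward.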
### Notes on data collection

Blurring and noise are also learned by the model. This is especially noticeable when training at high resolution.
Frame rate also has an effect: if you want to train smooth motion, you need correspondingly smooth data.
Scene switching also has an effect; if not addressed, the character may suddenly transform mid-clip.
For anime training it is difficult to capture details from video sources alone, so images are used for training as well.
Such images can be created with Stable Diffusion.
The fewer the differences between frames, the less likely the training results are to be corrupted, so I avoided animations with too much dynamic motion.
It may also be better to avoid scenes with multiple contexts and choose scenes with simple actions.
I collected data while checking that common emotions and actions were covered.

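One way to handle the scene-switching problem is to locate hard cuts with ffmpeg's scene-change detector and split clips there. The sketch below assumes `ffmpeg` is on PATH, and the 0.4 threshold is just a starting guess to tune per source.

```python
# Sketch: locate hard scene cuts with ffmpeg's scene-change expression,
# so clips can be split before training. Assumes ffmpeg is on PATH;
# the threshold is a guess to tune per source.
import re
import subprocess

def scene_cut_command(path: str, threshold: float = 0.4) -> list[str]:
    """ffmpeg command that logs info for frames past the scene-change threshold."""
    vf = f"select='gt(scene,{threshold})',showinfo"
    return ["ffmpeg", "-hide_banner", "-i", path, "-vf", vf, "-f", "null", "-"]

def scene_cut_times(path: str, threshold: float = 0.4) -> list[float]:
    """Run ffmpeg and parse the pts_time of each detected cut from stderr."""
    out = subprocess.run(scene_cut_command(path, threshold),
                         capture_output=True, text=True).stderr
    return [float(m) for m in re.findall(r"pts_time:([0-9.]+)", out)]
```

The returned timestamps can then be fed back into ffmpeg to cut the clip into single-scene segments.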
## Correcting data before training

### Fixing resolution, blown-out highlights, and noise

It is safest to use a source resolution at least equal to the training resolution.
The aspect ratio should also match the training settings.
Cropping to the target ratio is possible with ffmpeg.
Incidentally, I tried padding to the target ratio with a single color instead of cropping, but it seemed to slow down training.

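For the cropping step, a center-crop to the training aspect ratio followed by a scale can be done in one ffmpeg pass. A minimal sketch, assuming ffmpeg is on PATH (file names are placeholders):

```python
# Sketch: center-crop a clip to the training aspect ratio and scale it
# to the training resolution with a single ffmpeg pass.
# Assumes ffmpeg is on PATH; file names are placeholders.
import subprocess

def crop_scale_command(src: str, dst: str, width: int, height: int) -> list[str]:
    """ffmpeg command: center-crop to the width:height ratio, then scale."""
    ratio = width / height
    # crop defaults to a centered crop when x:y are omitted
    vf = (f"crop='min(iw,ih*{ratio})':'min(ih,iw/{ratio})',"
          f"scale={width}:{height}")
    return ["ffmpeg", "-y", "-i", src, "-vf", vf, dst]

if __name__ == "__main__":
    subprocess.run(crop_scale_command("raw.mp4", "train_ready.mp4", 512, 512),
                   check=True)
```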
### Converting small videos to larger sizes

I used this tool: https://github.com/k4yt3x/video2x
The recommended driver is waifu2x-caffe. It is well suited to animation, giving clear, sharp results, and it also reduces noise a little.
If you cannot improve the image quality along with the resolution, it may be better not to force a higher resolution.

### Number of frames

Since many anime sources have a low frame count, the training results are prone to collapse.
Besides body collapse, the character's appearance will no longer be consistent; less variation between frames seems to improve consistency.
The following tool may be useful for frame interpolation:
https://github.com/google-research/frame-interpolation
If the variation between frames is too large, however, you will not get a clean result.

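A cheap way to flag clips whose frame-to-frame variation is too large for clean interpolation is to average the per-pixel difference between consecutive frames. This is my own illustrative heuristic, not part of any of the tools above; frames here are flat grayscale pixel lists (e.g. from PIL's `Image.getdata()`), and the threshold is arbitrary.

```python
# Sketch: a cheap check for how much consecutive frames differ, to flag
# clips whose motion is too large for clean interpolation or training.
# Frames are flat grayscale pixel lists (values 0-255); the threshold
# is an arbitrary starting point.
def mean_frame_diff(frames: list[list[int]]) -> float:
    """Average per-pixel absolute difference between consecutive frames."""
    if len(frames) < 2:
        return 0.0
    total = 0.0
    for prev, cur in zip(frames, frames[1:]):
        total += sum(abs(a - b) for a, b in zip(prev, cur)) / len(prev)
    return total / (len(frames) - 1)

def too_dynamic(frames: list[list[int]], threshold: float = 30.0) -> bool:
    """Flag a clip whose average frame-to-frame change exceeds the threshold."""
    return mean_frame_diff(frames) > threshold
```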
## Tagging

For anime, WaifuTagger can extract content with good accuracy, so I created a slightly modified script for video and used it for animov512x.
https://github.com/bruefire/WaifuTaggerForVideo
Nevertheless, the BLIP2 preprocessor can also extract general scene content well enough; it may be a good idea to use the two together.
https://github.com/ExponentialML/Video-BLIP2-Preprocessor

## Configuration settings

todo

## Evaluate training results

If any of the sample videos generated during training turn out poorly, I search the dataset JSON for the prompts of that sample. With a training dataset of a few thousand items, you can usually find the source videos, which may help to see where the problem lies.
I deliberately trained all videos with the 'anime' tag.
After training, comparing videos generated with 'anime' in the positive prompt against videos with it in the negative prompt (that is, comparing the fine-tuned behavior with output closer to the original ModelScope) may help improve training.

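The prompt search can be a few lines of Python. Since the exact JSON layout depends on your preprocessor, this sketch walks any nesting of dicts and lists and reports the paths of matching strings; the file name and prompt are placeholders.

```python
# Sketch: find which training items produced a bad sample by searching the
# dataset JSON for a fragment of that sample's prompt. The JSON layout
# depends on your preprocessor, so this walks any nesting of dicts/lists.
import json

def find_prompt(node, needle: str, path: str = "$") -> list[str]:
    """Return JSON paths whose string value contains `needle` (case-insensitive)."""
    hits = []
    if isinstance(node, dict):
        for key, val in node.items():
            hits += find_prompt(val, needle, f"{path}.{key}")
    elif isinstance(node, list):
        for i, val in enumerate(node):
            hits += find_prompt(val, needle, f"{path}[{i}]")
    elif isinstance(node, str) and needle.lower() in node.lower():
        hits.append(path)
    return hits

if __name__ == "__main__":
    with open("train_data.json") as f:  # placeholder path
        data = json.load(f)
    for hit in find_prompt(data, "girl running in the rain"):
        print(hit)
```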
It is difficult to add further training for specific concepts afterwards, even if they are tagged, so I avoided relying on that.
Note that anime has a low frame count to begin with, so over-training tends to freeze the characters.

Perhaps because ModelScope itself was not trained at such a large resolution, training seems to be easier at lower resolutions.
In fact, when training Animov-0.1, I did not need to pay much attention to what is written here to get good results.
If you are fine-tuning ModelScope at larger resolutions, you may need to train incrementally with more data to avoid collapsed results.

That's all.