ChuuniZ committed on
Commit
8870709
·
verified ·
1 Parent(s): b9a67f8

Upload Joy_caption/README.md

  1. Joy_caption/README.md +79 -0
Joy_caption/README.md ADDED
@@ -0,0 +1,79 @@
+ ---
+ license: mit
+ language:
+ - en
+ ---
+ # Image Captioning App
+
+ This is a modified version of [Wi-zz/joy-caption-pre-alpha](https://huggingface.co/Wi-zz/joy-caption-pre-alpha) and [fancyfeast/joy-caption-alpha-two](https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two). Thanks to [dominic1021](https://huggingface.co/dominic1021), [IceHibiki](https://huggingface.co/IceHibiki), [BullseyeMxP](https://huggingface.co/BullseyeMxP), and [Wakeme](https://huggingface.co/Wakeme).
+
+ # Notice: I will contribute these changes to Wi-zz once the code is in shape.
+
+ ## Overview
+
+ This application generates descriptive, natural-language captions for images. It processes single images or entire directories, combining a CLIP vision model with an LLM to produce contextual captions, and it supports captioning NSFW content. This is an extension of the original author's work, intended to improve performance. The original repository is at https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two.
+
+ ## Features
+
+ - Single image and batch processing
+ - Multiple directory support
+ - Custom output directory
+ - Adjustable batch size
+ - Progress tracking
+
+ ## Usage
+
+ | Command | Description |
+ |---------|-------------|
+ | `python app.py image.jpg` | Process a single image |
+ | `python app.py /path/to/directory` | Process all images in a directory |
+ | `python app.py /path/to/dir1 /path/to/dir2` | Process multiple directories |
+ | `python app.py /path/to/dir --output /path/to/output` | Specify output directory |
+ | `python app.py /path/to/dir --bs 8` | Set batch size (default: 4) |
+
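The commands in the usage table imply a command-line interface roughly like the one below. This is a minimal sketch using Python's `argparse`, not the actual contents of `app.py`. The function name `build_parser` and the positional argument name `inputs` are assumptions made for illustration; only `--output`, `--bs`, and the default batch size of 4 come from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the commands in the usage table."""
    parser = argparse.ArgumentParser(
        description="Caption single images or directories of images."
    )
    # One or more image files or directories (name `inputs` is an assumption).
    parser.add_argument("inputs", nargs="+", help="image files or directories")
    # Optional output directory; what happens when it is omitted is not
    # stated in the README, so None is used here as a placeholder.
    parser.add_argument("--output", default=None, help="output directory")
    # Batch size; the usage table gives 4 as the default.
    parser.add_argument("--bs", type=int, default=4, help="batch size")
    return parser
```

Calling `build_parser().parse_args(["/path/to/dir", "--bs", "8"])` would yield a namespace with `inputs` set to a one-element list, `bs` set to 8, and `output` left as `None`.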
+ ## Technical Details
+
+ - **Models**: CLIP (vision), LLM (language), custom ImageAdapter
+ - **Optimization**: CUDA-enabled GPU support
+ - **Error Handling**: Skips problematic images in batch processing
+
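The error-handling behavior described above (skipping problematic images during batch processing) can be illustrated with a small helper. The README does not show how `app.py` implements this, so the following is only a sketch under stated assumptions: the function name `load_skipping_errors` and the `loader` callback are hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def load_skipping_errors(
    paths: Iterable[Path], loader: Callable[[Path], T]
) -> Tuple[List[T], List[Path]]:
    """Load each path with `loader`, recording failures instead of aborting.

    Hypothetical helper: returns the successfully loaded items and the
    list of paths that were skipped.
    """
    loaded: List[T] = []
    skipped: List[Path] = []
    for path in paths:
        try:
            loaded.append(loader(path))
        except Exception as exc:  # broad on purpose: any unreadable file is skipped
            print(f"Skipping {path}: {exc}")
            skipped.append(path)
    return loaded, skipped
```

In a real pipeline, `loader` could be something like Pillow's `Image.open` followed by `convert("RGB")`, so that a corrupt file in a batch is reported and skipped while the remaining images are still captioned.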
+ ## Requirements
+
+ - Python 3.x
+ - PyTorch
+ - Transformers library
+ - PEFT library
+ - CUDA-capable GPU (recommended)
+
+ ## Installation
+
+ Windows
+
+ ```bash
+ git clone https://huggingface.co/John6666/joy-caption-alpha-two-cli-mod
+ cd joy-caption-alpha-two-cli-mod
+ python -m venv venv
+ .\venv\Scripts\activate
+ # Change as per https://pytorch.org/get-started/locally/
+ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
+ pip install -r requirements.txt
+ ```
+
+ Linux
+
+ ```bash
+ git clone https://huggingface.co/John6666/joy-caption-alpha-two-cli-mod
+ cd joy-caption-alpha-two-cli-mod
+ python3 -m venv venv
+ source venv/bin/activate
+ pip3 install torch torchvision torchaudio
+ pip3 install -r requirements.txt
+ ```
+
+ ## Contributing
+
+ Contributions are welcome! Please feel free to submit a Pull Request.
+
+ ## License
+
+ This project is licensed under the [MIT License](LICENSE).