Anshu13 committed on
Commit
61e8692
·
verified ·
1 Parent(s): ce28c25

Update README.md

Files changed (1)
  1. README.md +170 -10
README.md CHANGED
@@ -1,13 +1,173 @@
  ---
- title: Prompt Engine
- emoji: 📉
- colorFrom: indigo
- colorTo: purple
- sdk: gradio
- sdk_version: 6.9.0
- app_file: app.py
- pinned: false
- short_description: Generate prompt easily for you Image, audio and text inputs
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🧠 Prompt Engine
+
+ An AI-based tool that converts **text, image, and audio inputs** into **detailed, structured prompts** for generative image models such as Stable Diffusion, Midjourney, and DALL·E.
+
+ ---
+
+ ## 🚀 Features
+
+ * ✍️ **Text → Prompt**
+ Expands a short idea into a detailed, high-quality prompt.
+
+ * 🖼️ **Image + Text → Prompt**
+ Combines an image with the user's text to generate a descriptive prompt.
+
+ * 🎧 **Audio → Prompt**
+ Transcribes speech to text with Whisper, then generates a refined prompt.
+
+ * 🧠 **Multimodal AI (Janus-Pro-1B)**
+ Uses the Janus-Pro-1B vision-language model to generate prompts.
+
+ * 🎨 **Gradio UI**
+ Interactive web interface that runs in the browser.
+
+ ---
+
+ ## 🧩 Architecture
+
+ ```
+ Input (Text / Image / Audio)
+ ↓
+ Preprocessing Layer
+ (Whisper for audio)
+ ↓
+ Instruction Builder (Prompt Engineering)
+ ↓
+ Janus-Pro-1B Model
+ ↓
+ Post-processing (clean output)
+ ↓
+ Final AI Prompt
+ ```
+
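The flow above can be read as a chain of small functions. The following is a minimal sketch of that chain for the text path only; it is not the code in `app.py`. The names `build_instruction`, `clean_output`, `run_pipeline`, and `echo_model` are hypothetical, the instruction wording is illustrative, and the model is passed in as a plain callable so that Janus-Pro-1B (or a stub, as here) can be plugged in.

```python
import re
from typing import Callable


def build_instruction(user_text: str) -> str:
    """Instruction Builder: wrap the raw idea in an instruction (wording is illustrative)."""
    return (
        "Rewrite the following idea as one detailed image-generation prompt. "
        "Describe subject, setting, lighting, and style.\n"
        f"Idea: {user_text.strip()}"
    )


def clean_output(raw: str) -> str:
    """Post-processing: drop a leading 'Prompt:' label, collapse whitespace, strip quotes."""
    text = re.sub(r"^\s*prompt\s*:\s*", "", raw, flags=re.IGNORECASE)
    text = re.sub(r"\s+", " ", text).strip()
    return text.strip("\"'")


def run_pipeline(user_text: str, model: Callable[[str], str]) -> str:
    """Chain the stages: instruction -> model -> cleaned prompt."""
    instruction = build_instruction(user_text)
    raw = model(instruction)
    return clean_output(raw)


def echo_model(instruction: str) -> str:
    """Stand-in for the real model (assumption): returns messy text built from the idea."""
    idea = instruction.rsplit("Idea: ", 1)[-1]
    return f'Prompt:  "A cinematic scene of a {idea},\n soft light"  '


if __name__ == "__main__":
    print(run_pipeline(" boy in forest ", echo_model))
```

Passing the model as a callable keeps the instruction and clean-up steps testable without loading any weights.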
+ ---
+
+ ## 🛠️ Tech Stack
+
+ * **Python**
+ * **Hugging Face Transformers**
+ * **DeepSeek Janus-Pro-1B**
+ * **OpenAI Whisper (Speech-to-Text)**
+ * **Gradio (UI)**
+ * **PyTorch**
+
+ ---
+
+ ## 📦 Installation
+
+ ### 1. Clone the repository
+
+ ```bash
+ git clone https://github.com/your-username/prompt-generator.git
+ cd prompt-generator
+ ```
+
+ ---
+
+ ### 2. Install dependencies
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ---
+
+ ### 3. Run the application
+
+ ```bash
+ python app.py
+ ```
+
+ ---
+
+ ## 🧪 Usage
+
+ 1. Open the Gradio UI in your browser (Gradio prints the local URL when `app.py` starts).
+ 2. Select an input type:
+
+ * Text
+ * Image + Text
+ * Audio
+ 3. Provide the input.
+ 4. Click **Generate Prompt 🚀**.
+ 5. Get your refined AI prompt.
+
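For readers curious how these steps could map onto Gradio components, here is a rough sketch. It is an assumption about the layout, not a copy of `app.py`: `on_generate` and `build_demo` are hypothetical names, and the callback only reports which input it would process instead of calling the model. Gradio is imported inside `build_demo` so the callback can be exercised without it installed.

```python
def on_generate(input_type, text, image_path, audio_path):
    """Placeholder callback (assumption): reports which input would be processed."""
    if input_type == "Text":
        return f"(prompt for text: {text})"
    if input_type == "Image + Text":
        return f"(prompt for image {image_path} with text: {text})"
    return f"(prompt for audio: {audio_path})"


def build_demo():
    """Build a Blocks UI with one selector, three inputs, a button, and an output box."""
    import gradio as gr  # lazy import: only needed when the UI is actually built

    with gr.Blocks(title="Prompt Engine") as demo:
        input_type = gr.Radio(
            choices=["Text", "Image + Text", "Audio"],
            value="Text",
            label="Input type",
        )
        text = gr.Textbox(label="Text")
        image = gr.Image(type="filepath", label="Image")
        audio = gr.Audio(type="filepath", label="Audio")
        button = gr.Button("Generate Prompt 🚀")
        output = gr.Textbox(label="Refined prompt")
        # Step 4 in the list above: the click event passes all inputs to the callback.
        button.click(
            fn=on_generate,
            inputs=[input_type, text, image, audio],
            outputs=output,
        )
    return demo
```

Calling `build_demo().launch()` would start the local server and print its URL.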
  ---
+
+ ## 🧠 Example
+
+ ### Input:
+
+ ```
+ boy in forest
+ ```
+
+ ### Output:
+
+ ```
+ A cinematic scene of a young boy standing in a dense forest, soft sunlight filtering through tall trees, atmospheric fog, ultra-detailed, 4k, depth of field, masterpiece
+ ```
+
+ ---
+
+ ## 📁 Project Structure
+
+ ```
+ project/
+ │
+ ├── app.py
+ ├── requirements.txt
+ └── README.md
+ ```
+
+ ---
+
+ ## ⚙️ Core Functions
+
+ * `text_to_prompt()`
+ * `image_text_to_prompt()`
+ * `audio_to_prompt()`
+ * `generate_universal_prompt()`
+
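The list above gives only the function names. As a rough illustration of how `generate_universal_prompt()` might route a request to the other three, here is a sketch with stub bodies. The parameter names, return values, and the routing order (audio first, then image, then text) are all assumptions and may not match `app.py`.

```python
from typing import Optional


def text_to_prompt(text: str) -> str:
    # Stub (assumption): the real function would ask the model to expand the text.
    return f"[text] {text}"


def image_text_to_prompt(image_path: str, text: str) -> str:
    # Stub (assumption): the real function would pass the image and text to the model.
    return f"[image+text] {image_path} | {text}"


def audio_to_prompt(audio_path: str) -> str:
    # Stub (assumption): the real function would transcribe the audio first.
    return f"[audio] {audio_path}"


def generate_universal_prompt(
    text: Optional[str] = None,
    image_path: Optional[str] = None,
    audio_path: Optional[str] = None,
) -> str:
    """Pick a handler based on which inputs were supplied (hypothetical routing rule)."""
    if audio_path:
        return audio_to_prompt(audio_path)
    if image_path:
        return image_text_to_prompt(image_path, text or "")
    if text and text.strip():
        return text_to_prompt(text.strip())
    raise ValueError("Provide text, an image, or an audio file.")
```

A single entry point like this would let the UI call one function regardless of the selected input type.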
+ ---
+
+ ## ⚠️ Limitations
+
+ * Runs best on a GPU; inference is slower without one
+ * Video input is not supported yet
+ * Output quality depends on the instruction given to the model
+
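Because the stack uses PyTorch, the GPU note usually comes down to choosing a device at startup. A small sketch, assuming a hypothetical helper named `pick_device` that is not part of the project; it falls back to the CPU when CUDA, or PyTorch itself, is unavailable.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper still works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Running on: {pick_device()}")
```

The returned string can be passed to `model.to(...)` when the model is loaded.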
  ---
 
+ ## 🔮 Future Improvements
+
+ * 🎥 Video input support
+ * 🎨 Style selection (anime, cinematic, realistic)
+ * 📊 Prompt scoring system
+ * ☁️ Deployment on Hugging Face Spaces
+
+ ---
+
+ ## 🤝 Contributing
+
+ Pull requests are welcome!
+ For major changes, please open an issue first.
+
+ ---
+
+ ## 📜 License
+
+ This project is released under the MIT License.
+
+ ---
+
+ ## 👨‍💻 Author
+
+ **Anshu Singh**
+
+ ---
+
+ ## ⭐ If you like this project
+
+ Give it a ⭐ on GitHub!
+