chmielvu committed
Commit 2285363 · verified · 1 parent: 980ab55

Upload folder using huggingface_hub

Files changed (4)
  1. Dockerfile +20 -0
  2. README.md +42 -4
  3. app.py +252 -0
  4. requirements.txt +5 -0
Dockerfile ADDED
@@ -0,0 +1,20 @@
+ FROM python:3.10-slim
+
+ WORKDIR /app
+
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     libopenblas-dev \
+     libgomp1 \
+     && rm -rf /var/lib/apt/lists/*
+
+ RUN pip install --no-cache-dir \
+     https://huggingface.co/Luigi/llama-cpp-python-wheels-hf-spaces-free-cpu/resolve/main/llama_cpp_python-0.3.22-cp310-cp310-linux_x86_64.whl
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY app.py .
+
+ EXPOSE 7860
+
+ CMD ["python", "app.py"]
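
For local verification of the image, a minimal smoke test might look like the sketch below. It assumes the image has been built and started with `docker build -t prompt-generator . && docker run -p 7860:7860 prompt-generator`, and that `requests` is installed on the host; the tag name and the client dependency are illustrative, not part of the Space itself.

```python
# Hypothetical smoke test against a locally running container (port 7860).
# Assumes `pip install requests` on the host; not a Space dependency.
import requests

# /health reports whether the GGUF model has been loaded yet
print(requests.get("http://localhost:7860/health").json())

# /v1/completions mirrors the OpenAI completions shape (see app.py below)
resp = requests.post(
    "http://localhost:7860/v1/completions",
    json={"prompt": "a mysterious castle on", "max_tokens": 64},
    timeout=120,
)
print(resp.json()["choices"][0]["text"])
```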
README.md CHANGED
@@ -1,10 +1,48 @@
 ---
 title: Prompt Generator
- emoji: 💻
- colorFrom: indigo
- colorTo: pink
+ emoji:
+ colorFrom: yellow
+ colorTo: green
 sdk: docker
 pinned: false
+ license: other
+ preload_from_hub:
+ - mradermacher/Promt-generator-GGUF
 ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Prompt Generator (Q4_K_M)
+
+ A 600M-parameter Bloom-based model trained for creative prompt generation. Give it a short concept and it will generate detailed, creative prompts for image generation or creative writing.
+
+ ## Features
+
+ - **Creative Prompt Generation**: Expand short ideas into detailed prompts
+ - **Image Prompt Creator**: Generate prompts for AI image generators
+ - **Completion Model**: Continues your text rather than responding to it
+ - **Lightweight**: Only 600M parameters, runs on CPU
+
+ ## Model Details
+
+ - **Base**: UnfilteredAI/Promt-generator
+ - **Architecture**: Bloom
+ - **GGUF by**: mradermacher/Promt-generator-GGUF
+ - **Quantization**: Q4_K_M (561 MB)
+ - **Type**: Base completion model (not instruct-tuned)
+
+ ## API Endpoint
+
+ - `POST /v1/completions` - Text completions (OpenAI-style, supports streaming)
+
+ ## Usage
+
+ ```bash
+ curl -X POST "https://YOUR_SPACE.hf.space/v1/completions" \
+   -H "Content-Type: application/json" \
+   -d '{"prompt": "a mysterious castle on", "max_tokens": 100}'
+ ```
+
+ ## Tech Stack
+
+ - llama.cpp via the JamePeng fork (Luigi wheel v0.3.22)
+ - Model: Promt-generator (Q4_K_M)
+ - Completion API (not chat - this is a base model)
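
Since the README advertises streaming, a Python counterpart to the curl example may be useful. This is a minimal sketch assuming the `requests` package and a real Space URL in place of the `YOUR_SPACE` placeholder; it parses the server-sent events that `app.py` emits.

```python
# Hypothetical streaming client for /v1/completions (SSE).
# "YOUR_SPACE" is a placeholder, exactly as in the curl example.
import json
import requests

resp = requests.post(
    "https://YOUR_SPACE.hf.space/v1/completions",
    json={"prompt": "a mysterious castle on", "max_tokens": 100, "stream": True},
    stream=True,
)
for line in resp.iter_lines():
    # SSE frames look like `data: {...}`; blank keep-alive lines are skipped
    if not line or not line.startswith(b"data: "):
        continue
    data = line[len(b"data: "):]
    if data == b"[DONE]":  # end-of-stream sentinel emitted by the server
        break
    chunk = json.loads(data)
    print(chunk["choices"][0]["text"], end="", flush=True)
```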
app.py ADDED
@@ -0,0 +1,252 @@
+ import json
+ import threading
+ import time
+ import uuid
+ from functools import lru_cache
+ from typing import Any, Dict, Iterable
+
+ import gradio as gr
+ from fastapi import FastAPI, Request
+ from fastapi.responses import JSONResponse, StreamingResponse
+ from huggingface_hub import hf_hub_download
+ from llama_cpp import Llama
+
+ # Model configuration - hardcoded
+ MODEL_REPO_ID = "mradermacher/Promt-generator-GGUF"
+ MODEL_FILE = "Promt-generator.Q4_K_M.gguf"
+ # No chat format - this is a base completion model, not instruct-tuned
+
+ # llama.cpp settings optimized for HF Spaces free tier
+ N_CTX = 2048
+ N_THREADS = 2
+ N_BATCH = 512
+ USE_MMAP = True
+
+ # A single lock serializes all calls into llama.cpp, which is not thread-safe
+ LOCK = threading.Lock()
+ api = FastAPI()
+
+
+ def _now() -> int:
+     return int(time.time())
+
+
+ def _openai_id(prefix: str) -> str:
+     return f"{prefix}-{uuid.uuid4().hex[:24]}"
+
+
+ def _sse(obj: Any) -> str:
+     return f"data: {json.dumps(obj, ensure_ascii=True)}\n\n"
+
+
+ def _sse_done() -> str:
+     return "data: [DONE]\n\n"
+
+
+ @lru_cache(maxsize=1)
+ def _get_llm_and_path() -> Dict[str, Any]:
+     # lru_cache(maxsize=1) makes this a lazy singleton: the GGUF file is
+     # downloaded and the model loaded once, on first use
+     model_path = hf_hub_download(
+         repo_id=MODEL_REPO_ID, filename=MODEL_FILE, repo_type="model"
+     )
+
+     llm = Llama(
+         model_path=model_path,
+         n_ctx=N_CTX,
+         n_threads=N_THREADS,
+         n_batch=N_BATCH,
+         n_gpu_layers=0,
+         verbose=False,
+         use_mmap=USE_MMAP,
+     )
+     return {"llm": llm, "model_path": model_path}
+
+
+ @api.get("/health")
+ def health() -> Dict[str, Any]:
+     loaded = _get_llm_and_path.cache_info().currsize > 0
+     return {
+         "status": "ok",
+         "backend": "llama.cpp",
+         "loaded": loaded,
+         "model_repo_id": MODEL_REPO_ID,
+         "model_file": MODEL_FILE,
+         "chat_format": None,
+     }
+
+
+ @api.get("/ready")
+ def ready() -> Dict[str, Any]:
+     # Forces the model to load and runs a one-token warmup generation
+     m = _get_llm_and_path()
+     llm: Llama = m["llm"]
+     with LOCK:
+         llm("OK", max_tokens=1, temperature=0.0)
+     return {"status": "ok", "loaded": True}
+
+
+ @api.get("/v1/models")
+ def v1_models() -> Dict[str, Any]:
+     model_name = f"{MODEL_REPO_ID}/{MODEL_FILE}"
+     return {"object": "list", "data": [{"id": model_name, "object": "model"}]}
+
+
+ @api.post("/v1/completions")
+ async def completions(req: Request):
+     """OpenAI-style completions endpoint for this base model."""
+     payload = await req.json()
+     prompt = payload.get("prompt") or ""
+     stream = bool(payload.get("stream") or False)
+     max_tokens = int(payload.get("max_tokens") or 128)
+     temperature = float(payload.get("temperature") or 0.7)
+     top_p = float(payload.get("top_p") or 0.95)
+
+     if not prompt:
+         return JSONResponse(
+             status_code=400,
+             content={"error": {"message": "prompt must be non-empty"}},
+         )
+
+     m = _get_llm_and_path()
+     llm: Llama = m["llm"]
+     created = _now()
+     resp_id = _openai_id("cmpl")
+     model_name = f"{MODEL_REPO_ID}/{MODEL_FILE}"
+
+     if not stream:
+         with LOCK:
+             out = llm(
+                 prompt=prompt,
+                 max_tokens=max_tokens,
+                 temperature=temperature,
+                 top_p=top_p,
+                 stream=False,
+             )
+         return {
+             "id": resp_id,
+             "object": "text_completion",
+             "created": created,
+             "model": model_name,
+             "choices": [
+                 {
+                     "text": out["choices"][0]["text"],
+                     "index": 0,
+                     "finish_reason": out["choices"][0].get("finish_reason", "stop"),
+                 }
+             ],
+         }
+
+     def gen() -> Iterable[str]:
+         # Hold the lock for the whole stream: generation happens lazily
+         # while iterating, so concurrent requests must not interleave
+         with LOCK:
+             it = llm(
+                 prompt=prompt,
+                 max_tokens=max_tokens,
+                 temperature=temperature,
+                 top_p=top_p,
+                 stream=True,
+             )
+             for chunk in it:
+                 yield _sse({
+                     "id": resp_id,
+                     "object": "text_completion",
+                     "created": created,
+                     "model": model_name,
+                     "choices": [
+                         {
+                             "text": chunk["choices"][0].get("text", ""),
+                             "index": 0,
+                             "finish_reason": chunk["choices"][0].get("finish_reason"),
+                         }
+                     ],
+                 })
+         yield _sse_done()
+
+     return StreamingResponse(gen(), media_type="text/event-stream")
+
+
+ def _ui_generate(
+     prompt: str,
+     max_tokens: int,
+     temperature: float,
+     top_p: float,
+ ) -> str:
+     """Generate a text completion for the UI."""
+     m = _get_llm_and_path()
+     llm: Llama = m["llm"]
+     with LOCK:
+         out = llm(
+             prompt=prompt,
+             max_tokens=max_tokens,
+             temperature=temperature,
+             top_p=top_p,
+             stream=False,
+         )
+     return out["choices"][0]["text"]
+
+
+ DESCRIPTION = """
+ ### Prompt Generator (Q4_K_M, CPU)
+
+ A 600M-parameter Bloom-based model trained for creative prompt generation. Give it a short concept or idea, and it will generate detailed, creative prompts for image generation or other creative tasks.
+
+ **Note:** This is a **completion model** (not chat), so it continues your text rather than responding to it.
+
+ **API Endpoint:**
+ - `POST /v1/completions` - Text completions (supports streaming)
+
+ **Best for:** Generating creative prompts, expanding ideas, image prompt creation
+ """
+
+ with gr.Blocks(title="Prompt Generator", theme=gr.themes.Soft()) as demo:
+     gr.Markdown(DESCRIPTION)
+
+     with gr.Row():
+         with gr.Column():
+             input_text = gr.Textbox(
+                 label="Start your prompt",
+                 placeholder="a beautiful sunset over...",
+                 lines=3,
+                 info="Enter a short concept or the beginning of a prompt",
+             )
+             with gr.Row():
+                 max_tokens = gr.Slider(
+                     minimum=32, maximum=512, value=128, step=32, label="Max tokens"
+                 )
+                 temperature = gr.Slider(
+                     minimum=0.1, maximum=2.0, value=0.9, step=0.1, label="Temperature"
+                 )
+                 top_p = gr.Slider(
+                     minimum=0.1, maximum=1.0, value=0.95, step=0.05, label="Top-p"
+                 )
+             generate_btn = gr.Button("Generate", variant="primary")
+
+         with gr.Column():
+             output_text = gr.Textbox(
+                 label="Generated prompt",
+                 lines=8,
+                 interactive=False,
+             )
+
+     examples = gr.Examples(
+         examples=[
+             ["a mysterious forest with"],
+             ["a futuristic city at night"],
+             ["an enchanted garden filled with"],
+             ["a steampunk airship flying over"],
+             ["a cozy coffee shop on a rainy day"],
+         ],
+         inputs=input_text,
+         label="Examples (click to try)",
+     )
+
+     generate_btn.click(
+         fn=_ui_generate,
+         inputs=[input_text, max_tokens, temperature, top_p],
+         outputs=output_text,
+     )
+
+
+ # Mount the Gradio UI onto the FastAPI app so both share port 7860
+ app = gr.mount_gradio_app(api, demo, path="/")
+
+
+ if __name__ == "__main__":
+     import uvicorn
+
+     uvicorn.run(app, host="0.0.0.0", port=7860)
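
Because `completions` mirrors the OpenAI response shape, the official `openai` Python SDK can likely be pointed straight at the Space. A hedged sketch (the SDK is not a dependency of this Space, `YOUR_SPACE` is a placeholder, and the server never reads the `model` field):

```python
# Hypothetical OpenAI-SDK client; assumes `pip install openai`.
# The api_key is arbitrary because the server performs no auth,
# and app.py ignores the `model` field of the payload.
from openai import OpenAI

client = OpenAI(base_url="https://YOUR_SPACE.hf.space/v1", api_key="unused")
out = client.completions.create(
    model="mradermacher/Promt-generator-GGUF/Promt-generator.Q4_K_M.gguf",
    prompt="a mysterious castle on",
    max_tokens=100,
)
print(out.choices[0].text)
```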
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ gradio>=4.0.0
+ fastapi>=0.115.0
+ uvicorn[standard]>=0.30.0
+ huggingface_hub>=0.26.0
+ numpy