lipeiying committed
Commit 81b0750 · verified
1 Parent(s): 9b647e6

Auto-sync from GitHub

Files changed (4)
  1. Dockerfile +27 -0
  2. README.md +20 -6
  3. app.py +485 -0
  4. requirements.txt +25 -0
Dockerfile ADDED
@@ -0,0 +1,27 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ # Install system dependencies for PDF and image processing
+ RUN apt-get update && apt-get install -y \
+     poppler-utils \
+     libgl1-mesa-glx \
+     libglib2.0-0 \
+     libsm6 \
+     libxext6 \
+     libxrender-dev \
+     wget \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements first for cache efficiency
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy app
+ COPY app.py .
+
+ # Expose port
+ EXPOSE 7860
+
+ # Run
+ CMD ["python", "app.py"]
README.md CHANGED
@@ -1,11 +1,25 @@
  ---
- title: GLM OCR
- emoji: ⚡
- colorFrom: yellow
- colorTo: gray
+ title: GLM-OCR API
+ emoji: 🔍
+ colorFrom: blue
+ colorTo: purple
  sdk: docker
  pinned: false
- short_description: OCR model
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # GLM-OCR OpenAI Compatible API
+
+ This Space runs [zai-org/GLM-OCR](https://huggingface.co/zai-org/GLM-OCR) and exposes an OpenAI-compatible REST API.
+
+ ## Usage
+
+ - Base URL: `https://YOUR_USERNAME-glm-ocr-api.hf.space`
+ - API Key: set in Space Secrets as `API_KEY`
+ - Model: `glm-ocr`
+
+ ## Chatbox Config
+
+ 1. Settings → Custom API
+ 2. API URL: `https://YOUR_USERNAME-glm-ocr-api.hf.space`
+ 3. API Key: your secret key
+ 4. Model: `glm-ocr`
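Beyond Chatbox, any OpenAI-compatible client can talk to the Space. The sketch below builds the standard chat-completions request body this API parses (an `image_url` part carrying a base64 data URI plus a `text` part). The PNG bytes, Space URL, and API key are placeholders, not real values:

```python
import base64
import json

# Placeholder image bytes; in practice read them from a real file.
fake_png_bytes = b"\x89PNG\r\n\x1a\n..."
data_uri = "data:image/png;base64," + base64.b64encode(fake_png_bytes).decode()

# OpenAI-style chat-completions payload as parse_messages() in app.py expects it.
payload = {
    "model": "glm-ocr",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": data_uri}},
            {"type": "text", "text": "Text Recognition:"},
        ],
    }],
}

# POST this JSON to https://YOUR_USERNAME-glm-ocr-api.hf.space/v1/chat/completions
# with the header "Authorization: Bearer YOUR_API_KEY".
body = json.dumps(payload)
```

The `text` part becomes the OCR instruction; omit it to fall back to the server's default `"Text Recognition:"` prompt.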
app.py ADDED
@@ -0,0 +1,485 @@
+ #!/usr/bin/env python3
+ # -*- coding: utf-8 -*-
+ """
+ GLM-OCR OpenAI Compatible API Server
+ Free-tier HuggingFace Space deployment
+ Supports direct connection from clients such as Chatbox
+ Author: GLM-OCR Deploy Script
+ """
+
+ import os
+ import io
+ import sys
+ import json
+ import time
+ import base64
+ import traceback
+ import mimetypes
+ import zipfile
+ from pathlib import Path
+ from typing import Optional, List, Union
+ from contextlib import asynccontextmanager
+
+ from fastapi import FastAPI, HTTPException, Depends, Request
+ from fastapi.responses import JSONResponse, StreamingResponse
+ from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
+ from pydantic import BaseModel
+ import uvicorn
+ from PIL import Image
+ import requests
+
+ # ─────────────────────────────── Configuration ─────────────────────────────
+ MODEL_NAME = "zai-org/GLM-OCR"
+ MODEL_ALIAS = "glm-ocr"
+ API_KEY = os.environ.get("API_KEY", "")  # read from HF Space Secrets
+ PORT = int(os.environ.get("PORT", 7860))
+
+ print(f"[STARTUP] GLM-OCR API Server v1.0")
+ print(f"[STARTUP] Model: {MODEL_NAME}")
+ print(f"[STARTUP] Port: {PORT}")
+ print(f"[STARTUP] API Key protection: {'ENABLED' if API_KEY else 'DISABLED (set API_KEY secret!)'}")
+
+ # ─────────────────────────────── Global model ──────────────────────────────
+ _processor = None
+ _model = None
+
+ def load_model():
+     global _processor, _model
+     try:
+         print("[MODEL] Loading transformers...")
+         import torch
+         from transformers import AutoProcessor, AutoModelForImageTextToText
+
+         print("[MODEL] Downloading/Loading AutoProcessor...")
+         _processor = AutoProcessor.from_pretrained(MODEL_NAME)
+
+         print("[MODEL] Downloading/Loading AutoModelForImageTextToText...")
+         _model = AutoModelForImageTextToText.from_pretrained(
+             pretrained_model_name_or_path=MODEL_NAME,
+             torch_dtype="auto",
+             device_map="auto",
+         )
+         device = next(_model.parameters()).device
+         print(f"[MODEL] Model loaded OK on device: {device}")
+     except Exception:
+         print("[MODEL][FATAL] Failed to load model:")
+         traceback.print_exc()
+         sys.exit(1)
+
+ @asynccontextmanager
+ async def lifespan(app: FastAPI):
+     load_model()
+     yield
+
+ # ─────────────────────────────── FastAPI ───────────────────────────────────
+ app = FastAPI(
+     title="GLM-OCR OpenAI Compatible API",
+     version="1.0.0",
+     lifespan=lifespan,
+ )
+ security = HTTPBearer(auto_error=False)
+
+ # ─────────────────────────────── Auth ──────────────────────────────────────
+ def verify_api_key(credentials: Optional[HTTPAuthorizationCredentials] = Depends(security)):
+     if not API_KEY:
+         return True  # skip auth when no secret is configured
+     if credentials is None:
+         raise HTTPException(
+             status_code=401,
+             detail="Missing API Key. Add header: Authorization: Bearer YOUR_API_KEY"
+         )
+     if credentials.credentials != API_KEY:
+         raise HTTPException(status_code=401, detail="Invalid API Key")
+     return True
+
+ # ─────────────────────────────── Pydantic models ───────────────────────────
+ class ImageUrlObj(BaseModel):
+     url: str
+     detail: Optional[str] = "auto"
+
+ class ContentPart(BaseModel):
+     type: str
+     text: Optional[str] = None
+     image_url: Optional[ImageUrlObj] = None
+
+ class Message(BaseModel):
+     role: str
+     content: Union[str, List[ContentPart]]
+
+ class ChatRequest(BaseModel):
+     model: Optional[str] = MODEL_ALIAS
+     messages: List[Message]
+     max_tokens: Optional[int] = 8192
+     temperature: Optional[float] = 0.1
+     stream: Optional[bool] = False
+
+ # ─────────────────────────────── File handling utilities ──────────────────
+
+ def b64_to_image(data_uri: str) -> Image.Image:
+     """base64 data URI → PIL Image"""
+     try:
+         data = data_uri.split(",", 1)[1] if "," in data_uri else data_uri
+         return Image.open(io.BytesIO(base64.b64decode(data))).convert("RGB")
+     except Exception:
+         print("[FILE][ERROR] base64 decode failed:")
+         traceback.print_exc()
+         raise
+
+ def url_to_image(url: str) -> Image.Image:
+     """URL → PIL Image"""
+     try:
+         print(f"[FILE] Downloading image: {url[:80]}")
+         r = requests.get(url, timeout=30, headers={"User-Agent": "GLM-OCR/1.0"})
+         r.raise_for_status()
+         return Image.open(io.BytesIO(r.content)).convert("RGB")
+     except Exception:
+         print("[FILE][ERROR] URL image download failed:")
+         traceback.print_exc()
+         raise
+
+ def pdf_to_images(pdf_bytes: bytes) -> List[Image.Image]:
+     """PDF → List[PIL Image]"""
+     try:
+         from pdf2image import convert_from_bytes
+         imgs = convert_from_bytes(pdf_bytes, dpi=150)
+         print(f"[FILE] PDF converted: {len(imgs)} pages")
+         return imgs
+     except ImportError:
+         print("[FILE][WARN] pdf2image not installed, skipping PDF")
+         return []
+     except Exception:
+         print("[FILE][ERROR] PDF processing failed:")
+         traceback.print_exc()
+         return []
+
+ def docx_to_content(docx_bytes: bytes):
+     """DOCX → (text_str, [PIL Image])"""
+     try:
+         import docx as python_docx
+         doc = python_docx.Document(io.BytesIO(docx_bytes))
+         texts = [p.text for p in doc.paragraphs if p.text.strip()]
+         images = []
+         for rel in doc.part.rels.values():
+             if "image" in rel.reltype:
+                 try:
+                     blob = rel.target_part.blob
+                     images.append(Image.open(io.BytesIO(blob)).convert("RGB"))
+                 except Exception:
+                     pass
+         return "\n".join(texts), images
+     except ImportError:
+         print("[FILE][WARN] python-docx not installed")
+         return "", []
+     except Exception:
+         print("[FILE][ERROR] DOCX processing failed:")
+         traceback.print_exc()
+         return "", []
+
+ def xlsx_to_text(xlsx_bytes: bytes) -> str:
+     """XLSX → plain text table"""
+     try:
+         import openpyxl
+         wb = openpyxl.load_workbook(io.BytesIO(xlsx_bytes), read_only=True)
+         lines = []
+         for name in wb.sheetnames:
+             lines.append(f"=== Sheet: {name} ===")
+             for row in wb[name].iter_rows(values_only=True):
+                 row_str = "\t".join("" if c is None else str(c) for c in row)
+                 if row_str.strip():
+                     lines.append(row_str)
+         return "\n".join(lines)
+     except ImportError:
+         print("[FILE][WARN] openpyxl not installed")
+         return ""
+     except Exception:
+         print("[FILE][ERROR] XLSX processing failed:")
+         traceback.print_exc()
+         return ""
+
+ def pptx_to_text(pptx_bytes: bytes) -> str:
+     """PPTX → plain text"""
+     try:
+         from pptx import Presentation
+         prs = Presentation(io.BytesIO(pptx_bytes))
+         lines = []
+         for i, slide in enumerate(prs.slides, 1):
+             lines.append(f"=== Slide {i} ===")
+             for shape in slide.shapes:
+                 if hasattr(shape, "text") and shape.text.strip():
+                     lines.append(shape.text)
+         return "\n".join(lines)
+     except ImportError:
+         print("[FILE][WARN] python-pptx not installed")
+         return ""
+     except Exception:
+         print("[FILE][ERROR] PPTX processing failed:")
+         traceback.print_exc()
+         return ""
+
+ def zip_to_text(zip_bytes: bytes) -> str:
+     """ZIP → extract text from supported files inside"""
+     try:
+         parts = []
+         with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
+             for name in zf.namelist():
+                 ext = Path(name).suffix.lower()
+                 try:
+                     data = zf.read(name)
+                     if ext in (".txt", ".md", ".csv", ".json", ".xml", ".html", ".htm"):
+                         parts.append(f"[{name}]\n{data.decode('utf-8', errors='replace')}")
+                     elif ext == ".xlsx":
+                         parts.append(f"[{name}]\n{xlsx_to_text(data)}")
+                     elif ext == ".pptx":
+                         parts.append(f"[{name}]\n{pptx_to_text(data)}")
+                     elif ext == ".docx":
+                         text, _ = docx_to_content(data)
+                         parts.append(f"[{name}]\n{text}")
+                 except Exception as e:
+                     print(f"[FILE][WARN] ZIP entry {name} failed: {e}")
+         return "\n\n".join(parts)
+     except Exception:
+         print("[FILE][ERROR] ZIP processing failed:")
+         traceback.print_exc()
+         return ""
+
+ def url_bytes(url: str):
+     """URL → (bytes, ext)"""
+     try:
+         r = requests.get(url, timeout=30, headers={"User-Agent": "GLM-OCR/1.0"})
+         r.raise_for_status()
+         ct = r.headers.get("Content-Type", "")
+         ext = mimetypes.guess_extension(ct.split(";")[0].strip()) or \
+             Path(url.split("?")[0]).suffix.lower()
+         return r.content, ext.lower()
+     except Exception:
+         print(f"[FILE][ERROR] URL download failed: {url}")
+         traceback.print_exc()
+         return None, ""
+
+ # ─────────────────────────────── GLM-OCR inference ─────────────────────────
+
+ def glm_ocr_infer(images: List[Image.Image], prompt: str = "Text Recognition:") -> str:
+     """Run GLM-OCR inference on each image and return the merged text."""
+     import torch
+     if not images:
+         return ""
+     results = []
+     for idx, img in enumerate(images):
+         print(f"[OCR] Inferring image {idx+1}/{len(images)} ...")
+         try:
+             messages = [{
+                 "role": "user",
+                 "content": [
+                     {"type": "image", "image": img},
+                     {"type": "text", "text": prompt},
+                 ],
+             }]
+             inputs = _processor.apply_chat_template(
+                 messages,
+                 tokenize=True,
+                 add_generation_prompt=True,
+                 return_dict=True,
+                 return_tensors="pt",
+             ).to(_model.device)
+             inputs.pop("token_type_ids", None)
+
+             with torch.no_grad():
+                 gen_ids = _model.generate(**inputs, max_new_tokens=8192, do_sample=False)
+
+             output = _processor.decode(
+                 gen_ids[0][inputs["input_ids"].shape[1]:],
+                 skip_special_tokens=True,
+             ).strip()
+             print(f"[OCR] Image {idx+1} done, {len(output)} chars")
+             results.append(output)
+         except Exception:
+             print(f"[OCR][ERROR] Inference failed on image {idx+1}:")
+             traceback.print_exc()
+             results.append("")
+     return "\n\n---\n\n".join(results)
+
+ # ─────────────────────────────── Message parsing ───────────────────────────
+
+ def parse_messages(messages: List[Message]):
+     """Extract the image list and the text prompt from OpenAI-style messages."""
+     images = []
+     text_parts = []
+     ocr_instruction = "Text Recognition:"  # default OCR instruction
+
+     for msg in messages:
+         if msg.role not in ("user", "system"):
+             continue
+         content = msg.content
+         if isinstance(content, str):
+             text_parts.append(content)
+             continue
+         for part in content:
+             if part.type == "text" and part.text:
+                 text_parts.append(part.text)
+             elif part.type == "image_url" and part.image_url:
+                 url_val = part.image_url.url
+                 try:
+                     if url_val.startswith("data:"):
+                         # inline base64 image
+                         images.append(b64_to_image(url_val))
+                     elif any(url_val.lower().endswith(ext) for ext in
+                              (".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".webp")):
+                         images.append(url_to_image(url_val))
+                     else:
+                         # generic URL: download first, then detect the type
+                         data, ext = url_bytes(url_val)
+                         if data:
+                             if ext in (".pdf",):
+                                 imgs = pdf_to_images(data)
+                                 images.extend(imgs)
+                             elif ext in (".docx", ".doc"):
+                                 txt, imgs = docx_to_content(data)
+                                 if txt:
+                                     text_parts.append(txt)
+                                 images.extend(imgs)
+                             elif ext in (".xlsx", ".xls"):
+                                 text_parts.append(xlsx_to_text(data))
+                             elif ext in (".pptx", ".ppt"):
+                                 text_parts.append(pptx_to_text(data))
+                             elif ext in (".zip",):
+                                 text_parts.append(zip_to_text(data))
+                             elif ext in (".txt", ".md", ".csv", ".json", ".xml", ".html", ".htm"):
+                                 text_parts.append(data.decode("utf-8", errors="replace"))
+                             else:
+                                 # fall back to treating the bytes as an image
+                                 try:
+                                     images.append(Image.open(io.BytesIO(data)).convert("RGB"))
+                                 except Exception:
+                                     print(f"[WARN] Unknown file type: {ext}, skipping")
+                 except Exception:
+                     print("[ERROR] Failed to process content part:")
+                     traceback.print_exc()
+
+     combined_text = "\n".join(text_parts).strip()
+     if combined_text:
+         ocr_instruction = combined_text
+     return images, ocr_instruction
+
+ # ─────────────────────────────── API endpoints ─────────────────────────────
+
+ @app.get("/")
+ def root():
+     return {
+         "service": "GLM-OCR OpenAI Compatible API",
+         "model": MODEL_ALIAS,
+         "status": "running",
+         "endpoints": {
+             "models": "GET /v1/models",
+             "chat": "POST /v1/chat/completions",
+         },
+         "chatbox_config": {
+             "api_url": "https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space",
+             "model": MODEL_ALIAS,
+             "note": "Set API_KEY in HF Space Secrets"
+         }
+     }
+
+ @app.get("/v1/models", dependencies=[Depends(verify_api_key)])
+ def list_models():
+     return {
+         "object": "list",
+         "data": [{
+             "id": MODEL_ALIAS,
+             "object": "model",
+             "created": int(time.time()),
+             "owned_by": "zai-org",
+             "permission": [],
+             "root": MODEL_ALIAS,
+         }]
+     }
+
+ @app.post("/v1/chat/completions", dependencies=[Depends(verify_api_key)])
+ async def chat_completions(req: ChatRequest):
+     start_time = time.time()
+     request_id = f"chatcmpl-{int(start_time * 1000)}"
+     print(f"\n[REQUEST] {request_id} | model={req.model} | stream={req.stream}")
+
+     try:
+         images, prompt = parse_messages(req.messages)
+         print(f"[REQUEST] images={len(images)} | prompt_len={len(prompt)}")
+
+         if images:
+             # images present: run OCR
+             result_text = glm_ocr_infer(images, prompt)
+             if not result_text.strip():
+                 result_text = "(OCR returned empty result)"
+         elif prompt.strip():
+             # text only: glm_ocr_infer returns "" without images,
+             # so this falls through to the hint below
+             result_text = glm_ocr_infer([], prompt)
+             if not result_text:
+                 result_text = "Please provide an image or document for OCR processing."
+         else:
+             result_text = "Please send an image or document to process."
+
+         elapsed = time.time() - start_time
+         print(f"[REQUEST] {request_id} done in {elapsed:.1f}s | result_len={len(result_text)}")
+
+         response_obj = {
+             "id": request_id,
+             "object": "chat.completion",
+             "created": int(start_time),
+             "model": MODEL_ALIAS,
+             "choices": [{
+                 "index": 0,
+                 "message": {
+                     "role": "assistant",
+                     "content": result_text,
+                 },
+                 "finish_reason": "stop",
+             }],
+             "usage": {
+                 "prompt_tokens": len(prompt.split()),
+                 "completion_tokens": len(result_text.split()),
+                 "total_tokens": len(prompt.split()) + len(result_text.split()),
+             }
+         }
+
+         if req.stream:
+             # SSE streaming (the full result is emitted as a single chunk)
+             def event_stream():
+                 chunk = {
+                     "id": request_id,
+                     "object": "chat.completion.chunk",
+                     "created": int(start_time),
+                     "model": MODEL_ALIAS,
+                     "choices": [{
+                         "index": 0,
+                         "delta": {"role": "assistant", "content": result_text},
+                         "finish_reason": None,
+                     }]
+                 }
+                 yield f"data: {json.dumps(chunk, ensure_ascii=False)}\n\n"
+                 # send the end-of-stream marker
+                 end_chunk = {
+                     "id": request_id,
+                     "object": "chat.completion.chunk",
+                     "created": int(start_time),
+                     "model": MODEL_ALIAS,
+                     "choices": [{
+                         "index": 0,
+                         "delta": {},
+                         "finish_reason": "stop",
+                     }]
+                 }
+                 yield f"data: {json.dumps(end_chunk)}\n\n"
+                 yield "data: [DONE]\n\n"
+             return StreamingResponse(event_stream(), media_type="text/event-stream")
+
+         return JSONResponse(content=response_obj)
+
+     except HTTPException:
+         raise
+     except Exception:
+         print(f"[REQUEST][ERROR] {request_id} unhandled exception:")
+         traceback.print_exc()
+         raise HTTPException(status_code=500, detail=traceback.format_exc())
+
+ # ─────────────────────────────── Startup ───────────────────────────────────
+ if __name__ == "__main__":
+     uvicorn.run(app, host="0.0.0.0", port=PORT, log_level="info")
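Note that the streaming branch in app.py sends the whole completion as one `chat.completion.chunk`, then an empty delta with `finish_reason: "stop"`, then `data: [DONE]`. A minimal client-side sketch of consuming that wire format (the sample stream below is hand-written to match the server code, not captured output):

```python
import json

def parse_sse_chunks(stream_text: str) -> str:
    """Collect the content deltas from an OpenAI-style SSE stream."""
    pieces = []
    for line in stream_text.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines between events
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        if "content" in delta:
            pieces.append(delta["content"])
    return "".join(pieces)

# Hand-written sample mirroring the server's single-chunk-plus-stop format.
sample = (
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", '
    '"content": "Hello OCR"}, "finish_reason": null}]}\n\n'
    'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}\n\n'
    "data: [DONE]\n\n"
)
print(parse_sse_chunks(sample))  # -> Hello OCR
```

A real client would read the response body incrementally rather than splitting a finished string, but the per-line handling is the same.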
requirements.txt ADDED
@@ -0,0 +1,25 @@
+ # GLM-OCR HuggingFace Space Dependencies
+ # Core ML
+ transformers>=4.51.0
+ torch>=2.1.0
+ accelerate>=0.27.0
+
+ # API Server
+ fastapi>=0.104.0
+ uvicorn[standard]>=0.24.0
+ pydantic>=2.0.0
+ python-multipart>=0.0.6
+
+ # Image processing
+ Pillow>=10.0.0
+
+ # PDF support
+ pdf2image>=1.16.0
+
+ # Office document support
+ python-docx>=1.1.0
+ openpyxl>=3.1.2
+ python-pptx>=0.6.23
+
+ # HTTP client
+ requests>=2.31.0