chriswu25 committed be32bc2 (verified) · 1 Parent(s): e79128b

Update README.md

Files changed (1)
  1. README.md +13 -310
README.md CHANGED
@@ -1,315 +1,18 @@
- <div align="center">
+ ---
+ title: PDFMathTranslate Docker
+ emoji: 📄
+ colorFrom: blue
+ colorTo: indigo
+ sdk: docker
+ pinned: false
+ ---
 
- English | [简体中文](docs/README_zh-CN.md) | [繁體中文](docs/README_zh-TW.md) | [日本語](docs/README_ja-JP.md) | [한국어](docs/README_ko-KR.md)
+ # PDFMathTranslate Docker Space
 
- <img src="./docs/images/banner.png" width="320px" alt="PDF2ZH"/>
+ This Space runs the [PDFMathTranslate](https://github.com/Byaidu/PDFMathTranslate) tool using a Docker container.
 
- <h2 id="title">PDFMathTranslate</h2>
+ It allows translating PDF files, especially those containing mathematical formulas, into Chinese.
 
- <p>
- <!-- PyPI -->
- <a href="https://pypi.org/project/pdf2zh/">
- <img src="https://img.shields.io/pypi/v/pdf2zh"></a>
- <a href="https://pepy.tech/projects/pdf2zh">
- <img src="https://static.pepy.tech/badge/pdf2zh"></a>
- <a href="https://hub.docker.com/repository/docker/byaidu/pdf2zh">
- <img src="https://img.shields.io/docker/pulls/byaidu/pdf2zh"></a>
- <a href="https://gitcode.com/Byaidu/PDFMathTranslate/overview">
- <img src="https://gitcode.com/Byaidu/PDFMathTranslate/star/badge.svg"></a>
- <a href="https://huggingface.co/spaces/reycn/PDFMathTranslate-Docker">
- <img src="https://img.shields.io/badge/%F0%9F%A4%97-Online%20Demo-FF9E0D"></a>
- <a href="https://www.modelscope.cn/studios/AI-ModelScope/PDFMathTranslate">
- <img src="https://img.shields.io/badge/ModelScope-Demo-blue"></a>
- <a href="https://github.com/Byaidu/PDFMathTranslate/pulls">
- <img src="https://img.shields.io/badge/contributions-welcome-green"></a>
- <a href="https://t.me/+Z9_SgnxmsmA5NzBl">
- <img src="https://img.shields.io/badge/Telegram-2CA5E0?style=flat-squeare&logo=telegram&logoColor=white"></a>
- <!-- License -->
- <a href="./LICENSE">
- <img src="https://img.shields.io/github/license/Byaidu/PDFMathTranslate"></a>
- </p>
+ The application is started via the `CMD ["pdf2zh", "-i"]` command within the Dockerfile.
 
- <a href="https://trendshift.io/repositories/12424" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12424" alt="Byaidu%2FPDFMathTranslate | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
-
- </div>
-
- PDF scientific paper translation and bilingual comparison.
-
- - 📊 Preserve formulas, charts, table of contents, and annotations _([preview](#preview))_.
- - 🌐 Support [multiple languages](#language), and diverse [translation services](#services).
- - 🤖 Provides [commandline tool](#usage), [interactive user interface](#gui), and [Docker](#docker)
-
- Feel free to provide feedback in [GitHub Issues](https://github.com/Byaidu/PDFMathTranslate/issues) or [Telegram Group](https://t.me/+Z9_SgnxmsmA5NzBl).
-
- For details on how to contribute, please consult the [Contribution Guide](https://github.com/Byaidu/PDFMathTranslate/wiki/Contribution-Guide---%E8%B4%A1%E7%8C%AE%E6%8C%87%E5%8D%97).
-
- <h2 id="updates">Updates</h2>
-
- - [Mar. 3, 2025] Experimental support for the new backend [BabelDOC](https://github.com/funstory-ai/BabelDOC) WebUI added as an experimental option (by [@awwaawwa](https://github.com/awwaawwa))
- - [Feb. 22 2025] Better release CI and well-packaged windows-amd64 exe (by [@awwaawwa](https://github.com/awwaawwa))
- - [Dec. 24 2024] The translator now supports local models on [Xinference](https://github.com/xorbitsai/inference) _(by [@imClumsyPanda](https://github.com/imClumsyPanda))_
- - [Dec. 19 2024] Non-PDF/A documents are now supported using `-cp` _(by [@reycn](https://github.com/reycn))_
- - [Dec. 13 2024] Additional support for backend by _(by [@YadominJinta](https://github.com/YadominJinta))_
- - [Dec. 10 2024] The translator now supports OpenAI models on Azure _(by [@yidasanqian](https://github.com/yidasanqian))_
-
- <h2 id="preview">Preview</h2>
-
- <div align="center">
- <img src="./docs/images/preview.gif" width="80%"/>
- </div>
-
- <h2 id="demo">Online Service 🌟</h2>
-
- You can try our application out using either of the following demos:
-
- - [Public free service](https://pdf2zh.com/) online without installation _(recommended)_.
- - [Immersive Translate - BabelDOC](https://app.immersivetranslate.com/babel-doc/) 1000 free pages per month. _(recommended)_
- - [Demo hosted on HuggingFace](https://huggingface.co/spaces/reycn/PDFMathTranslate-Docker)
- - [Demo hosted on ModelScope](https://www.modelscope.cn/studios/AI-ModelScope/PDFMathTranslate) without installation.
-
- Note that the computing resources of the demo are limited, so please avoid abusing them.
-
- <h2 id="install">Installation and Usage</h2>
-
- ### Methods
-
- For different use cases, we provide distinct methods to use our program:
-
- <details open>
- <summary>1. UV install</summary>
-
- 1. Python installed (3.10 <= version <= 3.12)
- 2. Install our package:
-
- ```bash
- pip install uv
- uv tool install --python 3.12 pdf2zh
- ```
-
- 3. Execute translation, files generated in [current working directory](https://chatgpt.com/share/6745ed36-9acc-800e-8a90-59204bd13444):
-
- ```bash
- pdf2zh document.pdf
- ```
-
- </details>
-
- <details>
- <summary>2. Windows exe</summary>
-
- 1. Download pdf2zh-version-win64.zip from [release page](https://github.com/Byaidu/PDFMathTranslate/releases)
-
- 2. Unzip and double-click `pdf2zh.exe` to run.
-
- </details>
-
- <details>
- <summary>3. Graphic user interface</summary>
- 1. Python installed (3.10 <= version <= 3.12)
- 2. Install our package:
-
- ```bash
- pip install pdf2zh
- ```
-
- 3. Start using in browser:
-
- ```bash
- pdf2zh -i
- ```
-
- 4. If your browswer has not been started automatically, goto
-
- ```bash
- http://localhost:7860/
- ```
-
- <img src="./docs/images/gui.gif" width="500"/>
-
- See [documentation for GUI](./docs/README_GUI.md) for more details.
-
- </details>
-
- <details>
- <summary>4. Docker</summary>
-
- 1. Pull and run:
-
- ```bash
- docker pull byaidu/pdf2zh
- docker run -d -p 7860:7860 byaidu/pdf2zh
- ```
-
- 2. Open in browser:
-
- ```
- http://localhost:7860/
- ```
-
- For docker deployment on cloud service:
-
- <div>
- <a href="https://www.heroku.com/deploy?template=https://github.com/Byaidu/PDFMathTranslate">
- <img src="https://www.herokucdn.com/deploy/button.svg" alt="Deploy" height="26"></a>
- <a href="https://render.com/deploy">
- <img src="https://render.com/images/deploy-to-render-button.svg" alt="Deploy to Koyeb" height="26"></a>
- <a href="https://zeabur.com/templates/5FQIGX?referralCode=reycn">
- <img src="https://zeabur.com/button.svg" alt="Deploy on Zeabur" height="26"></a>
- <a href="https://app.koyeb.com/deploy?type=git&builder=buildpack&repository=github.com/Byaidu/PDFMathTranslate&branch=main&name=pdf-math-translate">
- <img src="https://www.koyeb.com/static/images/deploy/button.svg" alt="Deploy to Koyeb" height="26"></a>
- </div>
-
- </details>
-
- <details>
- <summary>5. Zotero Plugin</summary>
-
-
- See [Zotero PDF2zh](https://github.com/guaguastandup/zotero-pdf2zh) for more details.
-
- </details>
-
- <details>
- <summary>6. Commandline</summary>
-
- 1. Python installed (3.10 <= version <= 3.12)
- 2. Install our package:
-
- ```bash
- pip install pdf2zh
- ```
-
- 3. Execute translation, files generated in [current working directory](https://chatgpt.com/share/6745ed36-9acc-800e-8a90-59204bd13444):
-
- ```bash
- pdf2zh document.pdf
- ```
-
- </details>
-
- > [!TIP]
- >
- > - If you're using Windows and cannot open the file after downloading, please install [vc_redist.x64.exe](https://aka.ms/vs/17/release/vc_redist.x64.exe) and try again.
- >
- > - If you cannot access Docker Hub, please try the image on [GitHub Container Registry](https://github.com/Byaidu/PDFMathTranslate/pkgs/container/pdfmathtranslate).
- > ```bash
- > docker pull ghcr.io/byaidu/pdfmathtranslate
- > docker run -d -p 7860:7860 ghcr.io/byaidu/pdfmathtranslate
- > ```
-
- ### Unable to install?
-
- The present program needs an AI model(`wybxc/DocLayout-YOLO-DocStructBench-onnx`) before working and some users are not able to download due to network issues. If you have a problem with downloading this model, we provide a workaround using the following environment variable:
-
- ```shell
- set HF_ENDPOINT=https://hf-mirror.com
- ```
-
- For PowerShell user:
-
- ```shell
- $env:HF_ENDPOINT = https://hf-mirror.com
- ```
-
- If the solution does not work to you / you encountered other issues, please refer to [frequently asked questions](https://github.com/Byaidu/PDFMathTranslate/wiki#-faq--%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98).
-
- <h2 id="usage">Advanced Options</h2>
-
- Execute the translation command in the command line to generate the translated document `example-mono.pdf` and the bilingual document `example-dual.pdf` in the current working directory. Use Google as the default translation service. More support translation services can find [HERE](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#services).
-
- <img src="./docs/images/cmd.explained.png" width="580px" alt="cmd"/>
-
- In the following table, we list all advanced options for reference:
-
- | Option | Function | Example |
- | -------------- | ------------------------------------------------------------------------------------------------------------- | ---------------------------------------------- |
- | files | Local files | `pdf2zh ~/local.pdf` |
- | links | Online files | `pdf2zh http://arxiv.org/paper.pdf` |
- | `-i` | [Enter GUI](#gui) | `pdf2zh -i` |
- | `-p` | [Partial document translation](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#partial) | `pdf2zh example.pdf -p 1` |
- | `-li` | [Source language](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#languages) | `pdf2zh example.pdf -li en` |
- | `-lo` | [Target language](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#languages) | `pdf2zh example.pdf -lo zh` |
- | `-s` | [Translation service](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#services) | `pdf2zh example.pdf -s deepl` |
- | `-t` | [Multi-threads](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#threads) | `pdf2zh example.pdf -t 1` |
- | `-o` | Output dir | `pdf2zh example.pdf -o output` |
- | `-f`, `-c` | [Exceptions](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#exceptions) | `pdf2zh example.pdf -f "(MS.*)"` |
- | `-cp` | Compatibility Mode | `pdf2zh example.pdf --compatible` |
- | `--skip-subset-fonts` | [Skip font subset](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#font-subset) | `pdf2zh example.pdf --skip-subset-fonts` |
- | `--ignore-cache` | [Ignore translate cache](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#cache) | `pdf2zh example.pdf --ignore-cache` |
- | `--share` | Public link | `pdf2zh -i --share` |
- | `--authorized` | [Authorization](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#auth) | `pdf2zh -i --authorized users.txt [auth.html]` |
- | `--prompt` | [Custom Prompt](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#prompt) | `pdf2zh --prompt [prompt.txt]` |
- | `--onnx` | [Use Custom DocLayout-YOLO ONNX model] | `pdf2zh --onnx [onnx/model/path]` |
- | `--serverport` | [Use Custom WebUI port] | `pdf2zh --serverport 7860` |
- | `--dir` | [batch translate] | `pdf2zh --dir /path/to/translate/` |
- | `--config` | [configuration file](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#cofig) | `pdf2zh --config /path/to/config/config.json` |
- | `--serverport` | [custom gradio server port] | `pdf2zh --serverport 7860` |
- |`--babeldoc`| Use Experimental backend [BabelDOC](https://funstory-ai.github.io/BabelDOC/) to translate |`pdf2zh --babeldoc` -s openai example.pdf|
-
- For detailed explanations, please refer to our document about [Advanced Usage](./docs/ADVANCED.md) for a full list of each option.
-
- <h2 id="downstream">Secondary Development (APIs)</h2>
-
- The current pdf2zh API is temporarily deprecated. The API will be provided again after [pdf2zh 2.0](https://github.com/Byaidu/PDFMathTranslate/issues/586) is released. For users who need programmatic access, please use the `babeldoc.high_level.async_translate` function of [BabelDOC](https://github.com/funstory-ai/BabelDOC).
-
- This API being temporarily deprecated means: the relevant code will not be removed for now, but no technical support will be provided, and no bug fixes will be made.
- <!-- For downstream applications, please refer to our document about [API Details](./docs/APIS.md) for futher information about:
-
- - [Python API](./docs/APIS.md#api-python), how to use the program in other Python programs
- - [HTTP API](./docs/APIS.md#api-http), how to communicate with a server with the program installed -->
-
- <h2 id="todo">TODOs</h2>
-
- - [ ] Parse layout with DocLayNet based models, [PaddleX](https://github.com/PaddlePaddle/PaddleX/blob/17cc27ac3842e7880ca4aad92358d3ef8555429a/paddlex/repo_apis/PaddleDetection_api/object_det/official_categories.py#L81), [PaperMage](https://github.com/allenai/papermage/blob/9cd4bb48cbedab45d0f7a455711438f1632abebe/README.md?plain=1#L102), [SAM2](https://github.com/facebookresearch/sam2)
-
- - [ ] Fix page rotation, table of contents, format of lists
-
- - [ ] Fix pixel formula in old papers
-
- - [ ] Async retry except KeyboardInterrupt
-
- - [ ] Knuth–Plass algorithm for western languages
-
- - [ ] Support non-PDF/A files
-
- - [ ] Plugins of [Zotero](https://github.com/zotero/zotero) and [Obsidian](https://github.com/obsidianmd/obsidian-releases)
-
- <h2 id="acknowledgement">Acknowledgements</h2>
-
- - [Immersive Translation](https://immersivetranslate.com) sponsors monthly Pro membership redemption codes for active contributors to this project, see details at: [CONTRIBUTOR_REWARD.md](https://github.com/funstory-ai/BabelDOC/blob/main/docs/CONTRIBUTOR_REWARD.md)
-
- - New backend: [BabelDOC](https://github.com/funstory-ai/BabelDOC)
-
- - Document merging: [PyMuPDF](https://github.com/pymupdf/PyMuPDF)
-
- - Document parsing: [Pdfminer.six](https://github.com/pdfminer/pdfminer.six)
-
- - Document extraction: [MinerU](https://github.com/opendatalab/MinerU)
-
- - Document Preview: [Gradio PDF](https://github.com/freddyaboulton/gradio-pdf)
-
- - Multi-threaded translation: [MathTranslate](https://github.com/SUSYUSTC/MathTranslate)
-
- - Layout parsing: [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
-
- - Document standard: [PDF Explained](https://zxyle.github.io/PDF-Explained/), [PDF Cheat Sheets](https://pdfa.org/resource/pdf-cheat-sheets/)
-
- - Multilingual Font: [Go Noto Universal](https://github.com/satbyy/go-noto-universal)
-
- <h2 id="contrib">Contributors</h2>
-
- <a href="https://github.com/Byaidu/PDFMathTranslate/graphs/contributors">
- <img src="https://opencollective.com/PDFMathTranslate/contributors.svg?width=890&button=false" />
- </a>
-
- ![Alt](https://repobeats.axiom.co/api/embed/dfa7583da5332a11468d686fbd29b92320a6a869.svg "Repobeats analytics image")
-
- <h2 id="star_hist">Star History</h2>
-
- <a href="https://star-history.com/#Byaidu/PDFMathTranslate&Date">
- <picture>
- <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Byaidu/PDFMathTranslate&type=Date&theme=dark" />
- <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Byaidu/PDFMathTranslate&type=Date" />
- <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Byaidu/PDFMathTranslate&type=Date"/>
- </picture>
- </a>
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
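
The new README's YAML front matter is what configures the Space: `sdk: docker` tells the Hub to build it from a Dockerfile. The following Python sketch is a hypothetical helper, not part of PDFMathTranslate or any Hugging Face tooling. It reads that flat front matter and resolves the port the Space is served on. It assumes the Spaces configuration reference's default `app_port` of 7860 for Docker Spaces, and that the front matter contains only simple `key: value` pairs.

```python
# Hypothetical helper (not part of PDFMathTranslate or huggingface_hub).
# Assumption: front matter is flat `key: value` pairs between two `---` lines,
# as in the README above; nested YAML, lists, or quoted values need a real YAML parser.


def parse_front_matter(text: str) -> dict:
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no opening delimiter, so no front matter
    config = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the block
            return config
        key, sep, value = line.partition(":")
        if not sep:
            continue  # ignore lines that are not `key: value` pairs
        value = value.strip()
        if value in ("true", "false"):
            config[key.strip()] = value == "true"  # YAML booleans such as `pinned: false`
        else:
            config[key.strip()] = value
    return {}  # an unterminated block is treated as no front matter


def effective_port(config: dict) -> int:
    # Docker Spaces route traffic to `app_port`, which defaults to 7860 when omitted.
    return int(config.get("app_port", 7860))


README = """---
title: PDFMathTranslate Docker
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---

# PDFMathTranslate Docker Space
"""

config = parse_front_matter(README)
print(config["sdk"], config["pinned"], effective_port(config))  # docker False 7860
```

The front matter sets no `app_port`, so the Space relies on that default. This matches the port 7860 used for the `pdf2zh -i` WebUI in the removed README's GUI and Docker instructions. A Dockerfile that served on a different port would need an explicit `app_port` entry.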