fromozu commited on
Commit
1b7d40b
·
verified ·
1 Parent(s): 5f457b7

Upload bilingual_book_maker/README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. bilingual_book_maker/README.md +202 -0
bilingual_book_maker/README.md ADDED
@@ -0,0 +1,202 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ **[中文](./README-CN.md) | English**
2
+ [![litellm](https://img.shields.io/badge/%20%F0%9F%9A%85%20liteLLM-OpenAI%7CAzure%7CAnthropic%7CPalm%7CCohere%7CReplicate%7CHugging%20Face-blue?color=green)](https://github.com/BerriAI/litellm)
3
+
4
+ # bilingual_book_maker
5
+ The bilingual_book_maker is an AI translation tool that uses ChatGPT to assist users in creating multi-language versions of epub/txt/srt files and books. This tool is exclusively designed for translating epub books that have entered the public domain and is not intended for copyrighted works. Before using this tool, please review the project's **[disclaimer](./disclaimer.md)**.
6
+
7
+ ![image](https://user-images.githubusercontent.com/15976103/222317531-a05317c5-4eee-49de-95cd-04063d9539d9.png)
8
+
9
+ ## Supported Models
10
+ gpt-4, gpt-3.5-turbo, claude-2, palm, llama-2, azure-openai, command-nightly, gemini
11
+ For using Non-OpenAI models, use class `liteLLM()` - liteLLM supports all models above.
12
+ Find more info here for using liteLLM: https://github.com/BerriAI/litellm/blob/main/setup.py
13
+
14
+ ## Preparation
15
+
16
+ 1. ChatGPT or OpenAI token [^token]
17
+ 2. epub/txt books
18
+ 3. Environment with internet access or proxy
19
+ 4. Python 3.8+
20
+
21
+ ## Use
22
+
23
+ - `pip install -r requirements.txt` or `pip install -U bbook_maker`(you can use)
24
+ - Use `--openai_key` option to specify OpenAI API key. If you have multiple keys, separate them by commas (xxx,xxx,xxx) to reduce errors caused by API call limits.
25
+ Or, just set environment variable `BBM_OPENAI_API_KEY` instead.
26
+ - A sample book, `test_books/animal_farm.epub`, is provided for testing purposes.
27
+ - The default underlying model is [GPT-3.5-turbo](https://openai.com/blog/introducing-chatgpt-and-whisper-apis), which is used by ChatGPT currently. Use `--model gpt4` to change the underlying model to `GPT4`.
28
+ If using `GPT4`, you can add `--use_context` to add a context paragraph to each passage sent to the model for translation (see below)
29
+ - support DeepL model [DeepL Translator](https://rapidapi.com/splintPRO/api/dpl-translator) need pay to get the token use `--model deepl --deepl_key ${deepl_key}`
30
+ - support DeepL free model `--model deeplfree`
31
+ - support Google [Gemini](https://makersuite.google.com/app/apikey) model `--model gemini --gemini_key ${gemini_key}`
32
+ - Support [Claude](https://console.anthropic.com/docs) model, use `--model claude --claude_key ${claude_key}`
33
+ - Use `--test` option to preview the result if you haven't paid for the service. Note that there is a limit and it may take some time.
34
+ - Set the target language like `--language "Simplified Chinese"`. Default target language is `"Simplified Chinese"`.
35
+ Read available languages by helper message: `python make_book.py --help`
36
+ - Use `--proxy` option to specify proxy server for internet access. Enter a string such as `http://127.0.0.1:7890`.
37
+ - Use `--resume` option to manually resume the process after an interruption.
38
+ - epub is made of html files. By default, we only translate contents in `<p>`.
39
+ Use `--translate-tags` to specify tags need for translation. Use comma to separate multiple tags. For example:
40
+ `--translate-tags h1,h2,h3,p,div`
41
+ - Use `--book_from` option to specify e-reader type (Now only `kobo` is available), and use `--device_path` to specify the mounting point.
42
+ - If you want to change api_base like using Cloudflare Workers, use `--api_base <URL>` to support it.
43
+ **Note: the api url should be '`https://xxxx/v1`'. Quotation marks are required.**
44
+ - Once the translation is complete, a bilingual book named `${book_name}_bilingual.epub` would be generated.
45
+ - If there are any errors or you wish to interrupt the translation by pressing `CTRL+C`. A book named `${book_name}_bilingual_temp.epub` would be generated. You can simply rename it to any desired name.
46
+ - If you want to translate strings in an e-book that aren't labeled with any tags, you can use the `--allow_navigable_strings` parameter. This will add the strings to the translation queue. **Note that it's best to look for e-books that are more standardized if possible.**
47
+ - To tweak the prompt, use the `--prompt` parameter. Valid placeholders for the `user` role template include `{text}` and `{language}`. It supports a few ways to configure the prompt:
48
+ If you don't need to set the `system` role content, you can simply set it up like this: `--prompt "Translate {text} to {language}."` or `--prompt prompt_template_sample.txt` (example of a text file can be found at [./prompt_template_sample.txt](./prompt_template_sample.txt)).
49
+ If you need to set the `system` role content, you can use the following format: `--prompt '{"user":"Translate {text} to {language}", "system": "You are a professional translator."}'` or `--prompt prompt_template_sample.json` (example of a JSON file can be found at [./prompt_template_sample.json](./prompt_template_sample.json)).
50
+ You can also set the `user` and `system` role prompt by setting environment variables: `BBM_CHATGPTAPI_USER_MSG_TEMPLATE` and `BBM_CHATGPTAPI_SYS_MSG`.
51
+ - Use the `--batch_size` parameter to specify the number of lines for batch translation (default is 10, currently only effective for txt files).
52
+ - `--accumulated_num` Wait for how many tokens have been accumulated before starting the translation. gpt3.5 limits the total_token to 4090. For example, if you use --accumulated_num 1600, maybe openai will
53
+ output 2200 tokens and maybe 200 tokens for other messages in the system messages user messages, 1600+2200+200=4000, So you are close to reaching the limit. You have to choose your own
54
+ value, there is no way to know if the limit is reached before sending
55
+ - `--use_context` prompts the GPT4 model to create a one-paragraph summary. If it's the beginning of the translation, it will summarize the entire passage sent (the size depending on `--accumulated_num`), but if it's any proceeding passage, it will amend the summary to include details from the most recent passage, creating a running one-paragraph context payload of the important details of the entire translated work, which improves consistency of flow and tone of each translation.
56
+ - `--translation_style` example: `--translation_style "color: #808080; font-style: italic;"`
57
+ - `--retranslate` `--retranslate "$translated_filepath" "file_name_in_epub" "start_str" "end_str"(optional)`<br>
58
+ Retranslate from start_str to end_str's tag:
59
+ `python3 "make_book.py" --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which' 'This kind of thing is not a good symptom. Obviously'`<br>
60
+ Retranslate start_str's tag:
61
+ `python3 "make_book.py" --book_name "test_books/animal_farm.epub" --retranslate 'test_books/animal_farm_bilingual.epub' 'index_split_002.html' 'in spite of the present book shortage which'`
62
+ ### Examples
63
+
64
+ **Note if use `pip install bbook_maker` all commands can change to `bbook_maker args`**
65
+
66
+ ```shell
67
+ # Test quickly
68
+ python3 make_book.py --book_name test_books/animal_farm.epub --openai_key ${openai_key} --test --language zh-hans
69
+
70
+ # Test quickly for src
71
+ python3 make_book.py --book_name test_books/Lex_Fridman_episode_322.srt --openai_key ${openai_key} --test
72
+
73
+ # Or translate the whole book
74
+ python3 make_book.py --book_name test_books/animal_farm.epub --openai_key ${openai_key} --language zh-hans
75
+
76
+ # Or translate the whole book using Gemini
77
+ python3 make_book.py --book_name test_books/animal_farm.epub --gemini_key ${gemini_key} --model gemini
78
+
79
+ # Set env OPENAI_API_KEY to ignore option --openai_key
80
+ export OPENAI_API_KEY=${your_api_key}
81
+
82
+ # Use the GPT-4 model with context to Japanese
83
+ python3 make_book.py --book_name test_books/animal_farm.epub --model gpt4 --use_context --language ja
84
+
85
+ # Use the DeepL model with Japanese
86
+ python3 make_book.py --book_name test_books/animal_farm.epub --model deepl --deepl_key ${deepl_key} --language ja
87
+
88
+
89
+ # Use the Claude model with Japanese
90
+ python3 make_book.py --book_name test_books/animal_farm.epub --model claude --claude_key ${claude_key} --language ja
91
+
92
+ # Use the CustomAPI model with Japanese
93
+ python3 make_book.py --book_name test_books/animal_farm.epub --model customapi --custom_api ${custom_api} --language ja
94
+
95
+ # Translate contents in <div> and <p>
96
+ python3 make_book.py --book_name test_books/animal_farm.epub --translate-tags div,p
97
+
98
+ # Tweaking the prompt
99
+ python3 make_book.py --book_name test_books/animal_farm.epub --prompt prompt_template_sample.txt
100
+ # or
101
+ python3 make_book.py --book_name test_books/animal_farm.epub --prompt prompt_template_sample.json
102
+ # or
103
+ python3 make_book.py --book_name test_books/animal_farm.epub --prompt "Please translate \`{text}\` to {language}"
104
+
105
+ # Translate books download from Rakuten Kobo on kobo e-reader
106
+ python3 make_book.py --book_from kobo --device_path /tmp/kobo
107
+
108
+ # translate txt file
109
+ python3 make_book.py --book_name test_books/the_little_prince.txt --test --language zh-hans
110
+ # aggregated translation txt file
111
+ python3 make_book.py --book_name test_books/the_little_prince.txt --test --batch_size 20
112
+
113
+ # Using Caiyun model to translate
114
+ # (the api currently only support: simplified chinese <-> english, simplified chinese <-> japanese)
115
+ # the official Caiyun has provided a test token (3975l6lr5pcbvidl6jl2)
116
+ # you can apply your own token by following this tutorial(https://bobtranslate.com/service/translate/caiyun.html)
117
+ python3 make_book.py --model caiyun --caiyun_key 3975l6lr5pcbvidl6jl2 --book_name test_books/animal_farm.epub
118
+
119
+
120
+ # Set env BBM_CAIYUN_API_KEY to ignore option --openai_key
121
+ export BBM_CAIYUN_API_KEY=${your_api_key}
122
+
123
+ ```
124
+
125
+ More understandable example
126
+ ```shell
127
+ python3 make_book.py --book_name 'animal_farm.epub' --openai_key sk-XXXXX --api_base 'https://xxxxx/v1'
128
+
129
+ # Or python3 is not in your PATH
130
+ python make_book.py --book_name 'animal_farm.epub' --openai_key sk-XXXXX --api_base 'https://xxxxx/v1'
131
+ ```
132
+
133
+ Microsoft Azure Endpoints
134
+ ```shell
135
+ python3 make_book.py --book_name 'animal_farm.epub' --openai_key XXXXX --api_base 'https://example-endpoint.openai.azure.com' --deployment_id 'deployment-name'
136
+
137
+ # Or python3 is not in your PATH
138
+ python make_book.py --book_name 'animal_farm.epub' --openai_key XXXXX --api_base 'https://example-endpoint.openai.azure.com' --deployment_id 'deployment-name'
139
+ ```
140
+
141
+ ## Docker
142
+
143
+ You can use [Docker](https://www.docker.com/) if you don't want to deal with setting up the environment.
144
+
145
+ ```shell
146
+ # Build image
147
+ docker build --tag bilingual_book_maker .
148
+
149
+ # Run container
150
+ # "$folder_path" represents the folder where your book file locates. Also, it is where the processed file will be stored.
151
+
152
+ # Windows PowerShell
153
+ $folder_path=your_folder_path # $folder_path="C:\Users\user\mybook\"
154
+ $book_name=your_book_name # $book_name="animal_farm.epub"
155
+ $openai_key=your_api_key # $openai_key="sk-xxx"
156
+ $language=your_language # see utils.py
157
+
158
+ docker run --rm --name bilingual_book_maker --mount type=bind,source=$folder_path,target='/app/test_books' bilingual_book_maker --book_name "/app/test_books/$book_name" --openai_key $openai_key --language $language
159
+
160
+ # Linux
161
+ export folder_path=${your_folder_path}
162
+ export book_name=${your_book_name}
163
+ export openai_key=${your_api_key}
164
+ export language=${your_language}
165
+
166
+ docker run --rm --name bilingual_book_maker --mount type=bind,source=${folder_path},target='/app/test_books' bilingual_book_maker --book_name "/app/test_books/${book_name}" --openai_key ${openai_key} --language "${language}"
167
+ ```
168
+
169
+ For example:
170
+
171
+ ```shell
172
+ # Linux
173
+ docker run --rm --name bilingual_book_maker --mount type=bind,source=/home/user/my_books,target='/app/test_books' bilingual_book_maker --book_name /app/test_books/animal_farm.epub --openai_key sk-XXX --test --test_num 1 --language zh-hant
174
+ ```
175
+
176
+ ## Notes
177
+
178
+ 1. API token from free trial has limit. If you want to speed up the process, consider paying for the service or use multiple OpenAI tokens
179
+ 2. PR is welcome
180
+
181
+ # Thanks
182
+
183
+ - @[yetone](https://github.com/yetone)
184
+
185
+ # Contribution
186
+
187
+ - Any issues or PRs are welcome.
188
+ - TODOs in the issue can also be selected.
189
+ - Please run `black make_book.py`[^black] before submitting the code.
190
+
191
+ # Others better
192
+
193
+ - 书译 iOS -> [AI 全书翻译工具](https://apps.apple.com/cn/app/%E4%B9%A6%E8%AF%91-ai-%E5%85%A8%E4%B9%A6%E7%BF%BB%E8%AF%91%E5%B7%A5%E5%85%B7/id6447665417)
194
+
195
+ ## Appreciation
196
+
197
+ Thank you, that's enough.
198
+
199
+ ![image](https://user-images.githubusercontent.com/15976103/222407199-1ed8930c-13a8-402b-9993-aaac8ee84744.png)
200
+
201
+ [^token]: https://platform.openai.com/account/api-keys
202
+ [^black]: https://github.com/psf/black