Update README.md

README.md CHANGED
@@ -36,8 +36,18 @@ If you want to run it on your own local computer, you will need approximately 8.
 必要なライブラリのインストール
 Installation of required libraries
 ```
-
-
 ```

 サンプルスクリプト
@@ -61,14 +71,13 @@ def trans(my_str):

     # Translation
     generated_ids = model.generate(input_ids=input_ids,
-                                   num_beams=
                                    use_cache=True,
                                    prompt_lookup_num_tokens=10
                                    )
     full_outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
     return full_outputs[0].split("### Answer:\n")[-1].strip()

-
 ret = trans("""
 ### Instructions:
 Translate Japanese to English.
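The `prompt_lookup_num_tokens=10` argument in the `generate` call above turns on prompt lookup decoding: draft tokens are proposed by matching the newest n-gram of the sequence against an earlier occurrence in the prompt, then verified by the model in a single forward pass. This tends to help translation, where the output overlaps the input heavily. A minimal pure-Python sketch of the candidate-matching step (`find_candidate_tokens` is an illustrative stand-in, not a transformers API):

```python
def find_candidate_tokens(input_ids, ngram_size=3, num_pred_tokens=10):
    """Propose draft tokens by matching the trailing n-gram of input_ids
    against an earlier occurrence in the same sequence (prompt lookup).
    Returns [] when no earlier match exists."""
    tail = input_ids[-ngram_size:]
    # Scan earlier positions, preferring the most recent match of the n-gram.
    for start in range(len(input_ids) - ngram_size - 1, -1, -1):
        if input_ids[start:start + ngram_size] == tail:
            continuation = input_ids[start + ngram_size:
                                     start + ngram_size + num_pred_tokens]
            if continuation:
                return continuation
    return []

# [5, 6, 7] occurred earlier, so the tokens that followed it are drafted.
print(find_candidate_tokens([5, 6, 7, 8, 5, 6, 7]))  # -> [8, 5, 6, 7]
```

In real generation the drafted tokens are not trusted as-is: the model scores them in one forward pass and keeps only the prefix it agrees with, so the output is identical to ordinary greedy decoding, just faster.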
@@ -90,16 +99,20 @@ There are two types of instructions: "Translate Japanese to English." and "Trans
 実験的な試みとして、インフォーマルな場面を想定した翻訳を行う際にsubculture文脈指定ができるようになっています。
 As an experiment, subculture context can be specified when translating for informal situations.

 Translate English to Japanese within the context of subculture.


 ## 留意事項 Attention
-**Do not save this adapter merged with the base model**, as there exists a bug that reduces performance when saving this adapter merged with the model.

-

 ### 利用規約 Terms of Use
 基本的にはgemmaと同じライセンスです
 Basically the same license as gemma.

@@ -112,13 +125,14 @@ Our previous model, ALMA-7B-Ja-V2, has over 150K downloads, but we have no idea
 そのため、使用した後は[Googleフォームに感想や今後期待する方向性、気が付いた誤訳の例などを記入](https://forms.gle/Ycr9nWumvGamiNma9)してください。
 So, after you use it, please [fill out the Google form with your impressions, future directions you expect us to take, and examples of mistranslations you have noticed](https://forms.gle/Ycr9nWumvGamiNma9).

-
-We do not collect personal information, so please feel free to fill out the form!

 どんなご意見でも感謝します!
 Any feedback would be appreciated!

 ### 謝辞 Acknowledgment
 Original Base Model
 google/gemma-7b
 https://huggingface.co/google/gemma-7b
@@ -131,7 +145,6 @@ QLoRA Adapter
 webbigdata/C3TR-Adapter
 https://huggingface.co/webbigdata/C3TR-Adapter

-
 This adapter was trained with Unsloth.
 https://github.com/unslothai/unsloth

 必要なライブラリのインストール
 Installation of required libraries
 ```
+# First install PyTorch. Check the official documentation:
+# https://pytorch.org/get-started/locally/#start-locally
+
+# Example for Linux users:
+# pip3 install torch torchvision torchaudio
+
+# Example for Windows users:
+# pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
+
+pip install transformers==4.38.2
+pip install peft==0.9.0
+pip install bitsandbytes==0.42.0
 ```

 サンプルスクリプト

     # Translation
     generated_ids = model.generate(input_ids=input_ids,
+                                   num_beams=1, max_new_tokens=800,
                                    use_cache=True,
                                    prompt_lookup_num_tokens=10
                                    )
     full_outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
     return full_outputs[0].split("### Answer:\n")[-1].strip()

 ret = trans("""
 ### Instructions:
 Translate Japanese to English.
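The `trans` helper above recovers the translation by splitting the decoded text on `### Answer:\n`: everything before that header is just the prompt echoed back, and everything after it is what the model generated. A minimal pure-Python sketch of that convention (`build_prompt` is an illustrative helper assumed here, not part of the sample script; only the `### Instructions:` and `### Answer:` sections shown in this README are used):

```python
def build_prompt(instruction, source_text):
    # Illustrative layout: an instruction section, the text to translate,
    # and a trailing "### Answer:" header for the model to complete.
    return (f"### Instructions:\n{instruction}\n\n"
            f"{source_text}\n\n### Answer:\n")

def extract_answer(decoded_output):
    # Mirrors full_outputs[0].split("### Answer:\n")[-1].strip() above:
    # keep only the text generated after the final Answer header.
    return decoded_output.split("### Answer:\n")[-1].strip()

prompt = build_prompt("Translate Japanese to English.", "こんにちは")
decoded = prompt + "Hello\n"   # stand-in for tokenizer.batch_decode output
print(extract_answer(decoded))  # -> Hello
```

Using `[-1]` rather than `[1]` keeps the extraction correct even if the source text itself happened to contain the header string.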
 実験的な試みとして、インフォーマルな場面を想定した翻訳を行う際にsubculture文脈指定ができるようになっています。
 As an experiment, subculture context can be specified when translating for informal situations.

+e.g.:
 Translate English to Japanese within the context of subculture.


 ## 留意事項 Attention

+このアダプターをモデルとマージして保存すると性能が下がってしまう不具合が存在するため、**ベースモデル(gemma-7b-bnb-4bit)とアダプターをマージして保存しないでください**
+**Do not save this adapter merged with the base model (gemma-7b-bnb-4bit)**, as there exists a bug that reduces performance when saving this adapter merged with the model.

+どうしてもマージしたい場合は必ずPerplexityではなく、翻訳ベンチマークで性能を確認してから使うようにしてください
+If you must merge, be sure to check performance with a translation benchmark, not Perplexity!

 ### 利用規約 Terms of Use
+
 基本的にはgemmaと同じライセンスです
 Basically the same license as gemma.

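Since the context specifier only changes the instruction line of the prompt, both variants can be produced by one small helper. This is a sketch of the convention described above; `make_instruction` and its parameters are assumptions for illustration, not part of the README's script:

```python
def make_instruction(source_lang, target_lang, context=None):
    # Base form:    "Translate Japanese to English."
    # With context: "... within the context of subculture."
    line = f"Translate {source_lang} to {target_lang}"
    if context:
        line += f" within the context of {context}"
    return line + "."

print(make_instruction("Japanese", "English"))
# -> Translate Japanese to English.
print(make_instruction("English", "Japanese", context="subculture"))
# -> Translate English to Japanese within the context of subculture.
```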
 そのため、使用した後は[Googleフォームに感想や今後期待する方向性、気が付いた誤訳の例などを記入](https://forms.gle/Ycr9nWumvGamiNma9)してください。
 So, after you use it, please [fill out the Google form with your impressions, future directions you expect us to take, and examples of mistranslations you have noticed](https://forms.gle/Ycr9nWumvGamiNma9).

+個人情報やメールアドレスは収集しないので、気軽にご記入をお願いします
+We do not collect personal information or email addresses, so please feel free to fill out the form!

 どんなご意見でも感謝します!
 Any feedback would be appreciated!

 ### 謝辞 Acknowledgment
+
 Original Base Model
 google/gemma-7b
 https://huggingface.co/google/gemma-7b
 webbigdata/C3TR-Adapter
 https://huggingface.co/webbigdata/C3TR-Adapter

 This adapter was trained with Unsloth.
 https://github.com/unslothai/unsloth
