---
license: gpl-3.0
datasets:
- Mxode/BiST
language:
- en
- zh
pipeline_tag: translation
library_name: transformers
---
# **NanoTranslator-L**

English | [简体中文](README_zh-CN.md)

## Introduction

This is the **large** model of NanoTranslator. It currently supports **English to Chinese** translation only.

An ONNX version of the model is also available in this repository.

| Size | P.  | Arch. | Act.   | V.  | H.  | I.   | L.  | A.H. | K.H. | Tie  |
| :--: | :-: | :---: | :----: | :-: | :-: | :--: | :-: | :--: | :--: | :--: |
| XL   | 100 | LLaMA | SwiGLU | 16K | 768 | 4096 | 8   | 24   | 8    | True |
| L    | 78  | LLaMA | GeGLU  | 16K | 768 | 4096 | 6   | 24   | 8    | True |
| M2   | 22  | Qwen2 | GeGLU  | 4K  | 432 | 2304 | 6   | 24   | 8    | True |
| M    | 22  | LLaMA | SwiGLU | 8K  | 256 | 1408 | 16  | 16   | 4    | True |
| S    | 9   | LLaMA | SwiGLU | 4K  | 168 | 896  | 16  | 12   | 4    | True |
| XS   | 2   | LLaMA | SwiGLU | 2K  | 96  | 512  | 12  | 12   | 4    | True |

- **P.** - parameters (in millions)
- **Arch.** - base architecture
- **Act.** - activation function
- **V.** - vocab size
- **H.** - hidden size
- **I.** - intermediate size
- **L.** - num layers
- **A.H.** - num attention heads
- **K.H.** - num KV heads
- **Tie** - tie word embeddings

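These hyperparameters can also be read back from the released checkpoint's configuration. A minimal sketch, assuming the repo id `Mxode/NanoTranslator-L` from the usage section below:

```python
from transformers import AutoConfig

# Load the model configuration without downloading the weights.
config = AutoConfig.from_pretrained("Mxode/NanoTranslator-L")

# These fields correspond to the columns of the table above.
print(config.vocab_size)           # V.
print(config.hidden_size)          # H.
print(config.intermediate_size)    # I.
print(config.num_hidden_layers)    # L.
print(config.num_attention_heads)  # A.H.
print(config.num_key_value_heads)  # K.H.
print(config.tie_word_embeddings)  # Tie
```
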
## How to use

The prompt format is as follows:

```
<|im_start|> {English Text} <|endoftext|>
```

### Directly using transformers

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = 'Mxode/NanoTranslator-L'

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

def translate(text: str, model, **kwargs):
    generation_args = dict(
        max_new_tokens = kwargs.pop("max_new_tokens", 512),
        do_sample = kwargs.pop("do_sample", True),
        temperature = kwargs.pop("temperature", 0.55),
        top_p = kwargs.pop("top_p", 0.8),
        top_k = kwargs.pop("top_k", 40),
        **kwargs
    )

    # Wrap the source text in the prompt format described above.
    prompt = "<|im_start|>" + text + "<|endoftext|>"
    model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

    generated_ids = model.generate(model_inputs.input_ids, **generation_args)
    # Strip the prompt tokens so only the generated translation remains.
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response

text = "I love to watch my favorite TV series."

response = translate(text, model, max_new_tokens=64, do_sample=False)
print(response)
```

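To translate several sentences at once, batched generation is usually faster than calling `translate` in a loop. A minimal sketch under the same setup as above; the pad-token fallback is an assumption in case the tokenizer does not define one, and causal LMs need left padding for batched generation:

```python
texts = [
    "I love to watch my favorite TV series.",
    "The weather is nice today.",
]

# Causal LMs must be padded on the left for batched generation.
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:  # assumption: fall back to EOS if no pad token
    tokenizer.pad_token = tokenizer.eos_token

prompts = ["<|im_start|>" + t + "<|endoftext|>" for t in texts]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Drop the (padded) prompt prefix, keeping only the newly generated tokens.
new_tokens = outputs[:, inputs.input_ids.shape[1]:]
translations = tokenizer.batch_decode(new_tokens, skip_special_tokens=True)
print(translations)
```
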
### ONNX

In our measurements, inference with the ONNX model is **2-10 times faster** than running the transformers model directly.

You need to switch to the [onnx branch](https://huggingface.co/Mxode/NanoTranslator-L/tree/onnx) manually and download the model files to a local folder.

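A minimal sketch of one way to do this with `huggingface_hub` (the `local_dir` value below is only an illustration):

```python
from huggingface_hub import snapshot_download

# "revision" selects the branch; this downloads the ONNX files locally.
local_dir = snapshot_download(
    repo_id="Mxode/NanoTranslator-L",
    revision="onnx",
    local_dir="onnx_model",
)
print(local_dir)
```
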
Reference docs:

- [Export to ONNX](https://huggingface.co/docs/transformers/serialization)
- [Inference pipelines with the ONNX Runtime accelerator](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/pipelines)

**Using ORTModelForCausalLM**

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_path = "your/folder/to/onnx_model"

ort_model = ORTModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

text = "I love to watch my favorite TV series."

# Reuses the translate() function defined in the transformers example above.
response = translate(text, ort_model, max_new_tokens=64, do_sample=False)
print(response)
```

**Using pipeline**

```python
from optimum.pipelines import pipeline

model_path = "your/folder/to/onnx_model"
pipe = pipeline("text-generation", model=model_path, accelerator="ort")

text = "I love to watch my favorite TV series."
# Apply the prompt format described above; the pipeline does not add it for you.
prompt = "<|im_start|>" + text + "<|endoftext|>"

# Returns a list of dicts such as [{"generated_text": ...}].
response = pipe(prompt, max_new_tokens=64, do_sample=False)
print(response)
```