Local-Axiom-AI committed on
Commit
cc01ffb
·
verified ·
1 Parent(s): e3005fd

Create README.md

  1. README.md +263 -0
README.md ADDED
@@ -0,0 +1,263 @@
+ ---
+ base_model: Qwen/Qwen2.5-0.5B
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:Qwen/Qwen2.5-0.5B
+ - lora
+ - transformers
+ license: mit
+ language:
+ - en
+ - es
+ ---
+
+ # Model Card for Model ID
+
+ This is a fine-tuned model based on Qwen2.5-0.5B, designed for English-to-Spanish translation.
+
+ ### Model Description
+
+ This model was fine-tuned with LoRA on ~100M English-to-Spanish translations, or about 4B tokens.
+
+ - **Developed by:** Local-Axiom-AI
+ - **Model type:** Translation
+ - **Language(s) (NLP):** English and Spanish
+ - **License:** MIT
+ - **Finetuned from model:** Qwen2.5-0.5B
+
+ ## Uses
+
+ The model is designed for lightweight translation of short paragraphs from English to Spanish in settings where the translation must stay private or run without an internet connection.
+
+ ### Out-of-Scope Use
+
+ The model performs very poorly on language pairs other than English-to-Spanish or Spanish-to-English, and on very long inputs.
+
+ ## Bias, Risks, and Limitations
+
+ It does not work well on text that contains names.
+
+ ### Recommendations
+
+ Keep inputs to a few sentences or a single paragraph of fewer than 512 tokens. To reduce training time, the model was trained with a maximum context of only 512 tokens.
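
The 512-token limit can be checked before text is sent to the model. The sketch below is a hypothetical helper, not part of the original script: `fits_context`, `generation_budget`, and `MAX_CONTEXT_TOKENS` are names introduced here for illustration, and it assumes the limit covers the prompt and the generated tokens together (the card does not say). `count_tokens` is any callable that returns a token count, for example `lambda s: len(tokenizer.tokenize(s))` with the Qwen tokenizer. The generation budget mirrors the `int(src_tokens * 1.3) + 6` heuristic used by the script in this card.

```python
from typing import Callable

MAX_CONTEXT_TOKENS = 512  # training context length stated in this card


def generation_budget(src_tokens: int) -> int:
    """Same heuristic as the card's script: about 1.3x the source length plus slack."""
    return int(src_tokens * 1.3) + 6


def fits_context(prompt: str, text: str, count_tokens: Callable[[str], int],
                 limit: int = MAX_CONTEXT_TOKENS) -> bool:
    """Return True if the prompt tokens plus the generation budget stay within `limit`.

    Assumption: the 512-token limit applies to prompt and output together.
    """
    prompt_tokens = count_tokens(prompt)
    budget = generation_budget(count_tokens(text))
    return prompt_tokens + budget <= limit


if __name__ == "__main__":
    # Whitespace splitting stands in for a real tokenizer in this demo.
    def rough_count(s: str) -> int:
        return len(s.split())

    text = "For those who like contrasts"
    prompt = f"Translate the following English text to Spanish:\n{text}\n\nTranslation:"
    print(fits_context(prompt, text, rough_count))
```

Inputs that fail the check could be split into sentences and translated one at a time, which keeps each request inside the range the model was trained on.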
+
+ ## How to Get Started with the Model
+ ```
+ #!/usr/bin/env python3
+ # -*- coding: utf-8 -*-
+ # Minimal Flask server that exposes the model through a /translate endpoint.
+
+ import argparse
+ import logging
+ import os
+ import sys
+ import torch
+ from flask import Flask, jsonify, request
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ logging.basicConfig(level=logging.INFO)
+ log = logging.getLogger(__name__)
+
+ app = Flask(__name__)
+
+ # Populated lazily by load_model() on the first request.
+ MODEL = None
+ TOKENIZER = None
+ DEVICE = None
+ STOP_ID = None
+
+ def load_model(model_dir: str, base_model_id: str, quantize: bool = False):
+     global MODEL, TOKENIZER, DEVICE, STOP_ID
+
+     DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+     log.info(f"Using device: {DEVICE}")
+
+     # If model_dir holds a LoRA adapter, the peft package must be installed
+     # so that transformers can load it on top of the base model.
+     if quantize:
+         qcfg = BitsAndBytesConfig(
+             load_in_4bit=True,
+             bnb_4bit_quant_type="nf4",
+             bnb_4bit_use_double_quant=True,
+             bnb_4bit_compute_dtype=torch.bfloat16,
+         )
+         # bitsandbytes places the quantized weights on the GPU at load time;
+         # some transformers versions reject .to() on 4-bit models, so skip it.
+         MODEL = AutoModelForCausalLM.from_pretrained(
+             model_dir,
+             quantization_config=qcfg,
+             torch_dtype=torch.bfloat16,
+             trust_remote_code=True,
+         )
+     else:
+         MODEL = AutoModelForCausalLM.from_pretrained(
+             model_dir,
+             torch_dtype=torch.bfloat16,
+             trust_remote_code=True,
+         )
+         MODEL.to(DEVICE)
+
+     MODEL.eval()
+
+     TOKENIZER = AutoTokenizer.from_pretrained(
+         base_model_id,
+         trust_remote_code=True,
+         use_fast=False,
+     )
+
+     TOKENIZER.pad_token = TOKENIZER.eos_token
+
+     # <STOP> marks the end of a translation; add it if the tokenizer lacks it.
+     if "<STOP>" not in TOKENIZER.get_vocab():
+         log.info("Adding <STOP> token to tokenizer")
+         TOKENIZER.add_special_tokens(
+             {"additional_special_tokens": ["<STOP>"]}
+         )
+         MODEL.resize_token_embeddings(len(TOKENIZER))
+
+     STOP_ID = TOKENIZER.convert_tokens_to_ids("<STOP>")
+     log.info(f"<STOP> token id: {STOP_ID}")
+
+     log.info("Model & tokenizer loaded successfully")
+
+ def build_prompt(text: str, source: str, target: str) -> str:
+     if source == "en" and target == "es":
+         return f"Translate the following English text to Spanish:\n{text}\n\nTranslation:"
+     elif source == "es" and target == "en":
+         return f"Translate the following Spanish text to English:\n{text}\n\nTranslation:"
+     else:
+         raise ValueError("Unsupported translation direction")
+
+ @torch.inference_mode()
+ def translate(text: str, source: str, target: str) -> str:
+     prompt = build_prompt(text, source, target)
+
+     inputs = TOKENIZER(prompt, return_tensors="pt").to(DEVICE)
+     prompt_len = inputs["input_ids"].shape[1]
+
+     # Cap the output length relative to the source length.
+     src_tokens = len(TOKENIZER.tokenize(text))
+     max_new = int(src_tokens * 1.3) + 6
+
+     # Greedy decoding; temperature is not used when do_sample=False.
+     output = MODEL.generate(
+         **inputs,
+         max_new_tokens=max_new,
+         do_sample=False,
+         eos_token_id=STOP_ID,
+         pad_token_id=TOKENIZER.eos_token_id,
+         repetition_penalty=1.05,
+     )
+
+     # Decode only the newly generated tokens, then cut at the <STOP> marker.
+     decoded = TOKENIZER.decode(
+         output[0][prompt_len:], skip_special_tokens=False
+     )
+
+     return decoded.split("<STOP>")[0].strip()
+
+ @app.route("/translate", methods=["POST"])
+ def translate_endpoint():
+     data = request.get_json(silent=True)
+     if not data:
+         return jsonify({"error": "Invalid JSON"}), 400
+
+     text = data.get("text")
+     source = data.get("source")
+     target = data.get("target")
+
+     if not all([text, source, target]):
+         return jsonify({"error": "Missing fields"}), 400
+
+     # `args` is the module-level namespace parsed in the __main__ block below.
+     if MODEL is None:
+         try:
+             load_model(
+                 args.model_dir,
+                 args.base_model_id,
+                 args.quantize,
+             )
+         except Exception as e:
+             log.exception("Model load failed")
+             return jsonify({"error": str(e)}), 500
+
+     try:
+         result = translate(text, source, target)
+         return jsonify({"translation": result})
+     except Exception as e:
+         log.exception("Inference failed")
+         return jsonify({"error": str(e)}), 500
+
+ if __name__ == "__main__":
+     parser = argparse.ArgumentParser()
+     parser.add_argument("--model_dir", required=True)
+     parser.add_argument("--base_model_id", default="Qwen/Qwen2.5-0.5B")
+     parser.add_argument("--quantize", action="store_true")
+     parser.add_argument("--port", type=int, default=8011)
+     args = parser.parse_args()
+
+     if not os.path.isdir(args.model_dir):
+         log.error("Invalid model directory")
+         sys.exit(1)
+
+     log.info(f"Starting Translation API on port {args.port}")
+     app.run(host="0.0.0.0", port=args.port, threaded=True)
+ ```
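
Once the server is running, a client can call the `/translate` endpoint with a JSON body containing `text`, `source`, and `target`. The following is a minimal sketch using only the standard library. `build_request` and `request_translation` are hypothetical helper names, and the host `127.0.0.1` is an assumption for a server started locally on the default port 8011.

```python
import json
import urllib.request


def build_request(text: str, source: str = "en", target: str = "es",
                  host: str = "127.0.0.1", port: int = 8011) -> urllib.request.Request:
    """Build the POST request expected by the /translate endpoint."""
    payload = json.dumps({"text": text, "source": source, "target": target})
    return urllib.request.Request(
        f"http://{host}:{port}/translate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_translation(text: str, **kwargs) -> str:
    """Send the request and return the "translation" field of the JSON reply."""
    with urllib.request.urlopen(build_request(text, **kwargs), timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))["translation"]


if __name__ == "__main__":
    req = build_request("For those who like contrasts")
    print(req.full_url, req.data.decode("utf-8"))
    # With the server running:
    # print(request_translation("For those who like contrasts"))
```

The server answers 400 for malformed or incomplete requests and 500 for load or inference failures. `urllib.request.urlopen` raises `urllib.error.HTTPError` in those cases, so a real client would likely wrap the call in a `try` block.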
+ ### Training Data
+
+ Here is an example from the training data: "For those who like contrasts" (English) paired with "Para quien le gusten los contrastes" (Spanish).
+
+ ### Training Procedure
+
+ Standard LoRA fine-tuning.
+
+ #### Training Hyperparameters
+
+ - **Training regime:** FP16, with R=8 and L_A=32
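
To put the two hyperparameters in context: assuming `R` is the LoRA rank and `L_A` is the LoRA alpha (the card does not spell this out), standard LoRA scales the low-rank update by alpha / r, and each adapted weight matrix of shape d_out x d_in adds r * (d_in + d_out) trainable parameters. The sketch below works through that arithmetic. The 896 x 896 projection is only an illustrative shape, chosen to match the hidden size reported for Qwen2.5-0.5B; treat it as an assumption, since the card does not state which modules were adapted.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Standard LoRA scales the low-rank update B @ A by alpha / r."""
    return alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one weight matrix: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


if __name__ == "__main__":
    r, alpha = 8, 32   # R=8 and L_A=32, read as rank and alpha (assumption)
    d = 896            # illustrative square projection size (assumption)
    print(lora_scaling(r, alpha))          # 4.0
    print(lora_params(d, d, r))            # 14336
    print(lora_params(d, d, r) / (d * d))  # fraction of the full matrix that is trained
```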
+
+ #### Speeds, Sizes, Times
+
+ Trained on 4x RTX 4090 GPUs in about 80 hours.
+
+ ## Evaluation
+
+ The model reached a loss of 0.0476 on the test data.
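
If the reported loss is the mean per-token cross-entropy in nats (the usual convention when training causal language models, but an assumption here), it converts to perplexity through exp(loss). A quick check:

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)


if __name__ == "__main__":
    print(round(perplexity(0.0476), 4))  # 1.0488
```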
+
+ #### Testing Data
+
+ 15% of the data was split off before training and held out for testing.
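
A hold-out split of this kind can be sketched as below. This is not the author's code: the function name, the shuffling, and the fixed seed are assumptions made for illustration, and only the 15% fraction comes from the card.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(pairs: Sequence[T], test_fraction: float = 0.15,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `pairs` and split off `test_fraction` of it for testing."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled items form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    data = [(f"en {i}", f"es {i}") for i in range(100)]
    train, test = holdout_split(data)
    print(len(train), len(test))  # 85 15
```

Splitting before training, as the card describes, keeps the test pairs out of the fine-tuning data so the reported loss is measured on unseen examples.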
+
+ #### Metrics
+
+ It was tested on a mix of basic and more challenging translations.
+
+ ### Results
+
+ Quite good for a 0.5B-parameter model.
+
+ #### Summary
+
+ A good model for translation between English and Spanish with minimal VRAM usage.
+
+ ## Environmental Impact
+
+ - **Hardware Type:** 4x RTX 4090
+ - **Hours used:** 80
+ - **Compute Region:** USA
+ - **Carbon Emitted:** 77.36 lb
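
For comparison with tools that report emissions in kilograms, the figures above can be converted with a few lines of arithmetic. The helper names are hypothetical, and treating "hours used" as wall-clock hours across all four GPUs is an assumption.

```python
LB_TO_KG = 0.45359237  # exact definition of the international pound


def lb_to_kg(pounds: float) -> float:
    """Convert pounds to kilograms."""
    return pounds * LB_TO_KG


def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total GPU-hours, assuming every GPU ran for the full wall-clock time."""
    return num_gpus * wall_clock_hours


if __name__ == "__main__":
    carbon_kg = lb_to_kg(77.36)
    total_gpu_hours = gpu_hours(4, 80)
    print(round(carbon_kg, 2))                # 35.09
    print(total_gpu_hours)                    # 320
    print(carbon_kg / total_gpu_hours)        # kg of carbon per GPU-hour
```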
+
+ ### Model Objective
+
+ Its objective is to give more precise translations than other translation methods.
+
+ ### Compute Infrastructure
+
+ Trained on 4x RTX 4090 (24 GB each).
+
+ #### Hardware
+
+ 4x RTX 4090, 512GB VRAM, AMD Epyc
+
+ #### Software
+
+ Python and PyTorch
+
+ ## Model Card Contact
+
+ local.axiom.ai@protonmail.com or local.axiom.ai@gmail.com
+
+ ### Framework versions
+
+ - PEFT 0.18.0