---
base_model: Qwen/Qwen2.5-0.5B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen2.5-0.5B
- lora
- transformers
license: mit
language:
- en
- es
---

# Model Card for LinguaTale-EN-ES

LinguaTale-EN-ES is a fine-tuned model based on Qwen2.5-0.5B, designed for English-to-Spanish translation.


### Model Description

This model was fine-tuned with LoRA on roughly 100M English-to-Spanish translation pairs, or about 4B tokens.

- **Developed by:** Local-Axiom-AI
- **Model type:** Translation
- **Language(s) (NLP):** English and Spanish
- **License:** MIT
- **Finetuned from model:** Qwen2.5-0.5B

## Uses

It is intended for lightweight translation of short paragraphs from English to Spanish or Spanish to English, in settings where translation must stay private or run without internet access.

### Out-of-Scope Use

The model performs very poorly on language pairs other than English-Spanish and on very long inputs.

## Bias, Risks, and Limitations

The model does not perform well on text that contains names.

### Recommendations

Use the model for inputs of a few sentences or a single paragraph, under 512 tokens in length. To reduce training time, it was trained with a maximum context of 512 tokens.
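
One way to respect the 512-token limit is to split longer input into groups of sentences and translate each group separately. The sketch below is only an illustration and is not part of the released code: `chunk_by_token_budget`, its `count_tokens` parameter, and the default budget of 400 (chosen to leave headroom for the prompt template) are all assumptions. The word counter in the demo lines is a stand-in; in practice you would pass something like a function that returns the length of the tokenizer's output for the string.

```python
import re
from typing import Callable, List


def chunk_by_token_budget(
    text: str,
    count_tokens: Callable[[str], int],
    max_tokens: int = 400,  # assumption: headroom below 512 for the prompt template
) -> List[str]:
    """Group sentences into chunks whose token count stays within max_tokens."""
    # Naive split after ., ! or ? followed by whitespace (an assumption;
    # a real sentence segmenter would handle abbreviations better).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def word_count(s: str) -> int:
    """Stand-in counter for the demo only: whitespace-separated words."""
    return len(s.split())


demo_chunks = chunk_by_token_budget(
    "One two three. Four five. Six seven eight nine.", word_count, max_tokens=5
)
```

A single sentence that is longer than the budget is kept as its own chunk rather than cut mid-sentence, so very long sentences still need separate handling.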

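## How to Get Started with the Model

The script below serves the model through a small Flask API with a single `POST /translate` endpoint.

```python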
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import argparse
import logging
import os
import sys
import torch
from flask import Flask, jsonify, request
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

app = Flask(__name__)

MODEL = None
TOKENIZER = None
DEVICE = None
STOP_ID = None

def load_model(model_dir: str, base_model_id: str, quantize: bool = False):
    global MODEL, TOKENIZER, DEVICE, STOP_ID

    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    log.info(f"Using device: {DEVICE}")

    if quantize:
        qcfg = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
        MODEL = AutoModelForCausalLM.from_pretrained(
            model_dir,
            quantization_config=qcfg,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
        )
    else:
        MODEL = AutoModelForCausalLM.from_pretrained(
            model_dir,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
        )

    MODEL.eval().to(DEVICE)

    TOKENIZER = AutoTokenizer.from_pretrained(
        base_model_id,
        trust_remote_code=True,
        use_fast=False,
    )

    TOKENIZER.pad_token = TOKENIZER.eos_token

    if "<STOP>" not in TOKENIZER.get_vocab():
        log.info("Adding <STOP> token to tokenizer")
        TOKENIZER.add_special_tokens(
            {"additional_special_tokens": ["<STOP>"]}
        )
        MODEL.resize_token_embeddings(len(TOKENIZER))

    STOP_ID = TOKENIZER.convert_tokens_to_ids("<STOP>")
    log.info(f"<STOP> token id: {STOP_ID}")

    log.info("Model & tokenizer loaded successfully")

def build_prompt(text: str, source: str, target: str) -> str:
    if source == "en" and target == "es":
        return f"Translate the following English text to Spanish:\n{text}\n\nTranslation:"
    elif source == "es" and target == "en":
        return f"Translate the following Spanish text to English:\n{text}\n\nTranslation:"
    else:
        raise ValueError("Unsupported translation direction")

@torch.inference_mode()
def translate(text: str, source: str, target: str) -> str:
    prompt = build_prompt(text, source, target)

    inputs = TOKENIZER(prompt, return_tensors="pt").to(DEVICE)
    prompt_len = inputs["input_ids"].shape[1]

    src_tokens = len(TOKENIZER.tokenize(text))
    max_new = int(src_tokens * 1.3) + 6

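    # Greedy decoding: with do_sample=False the temperature setting is not
    # used, so it is omitted to avoid a generation-config warning.
    output = MODEL.generate(
        **inputs,
        max_new_tokens=max_new,
        do_sample=False,
        eos_token_id=STOP_ID,
        pad_token_id=TOKENIZER.eos_token_id,
        repetition_penalty=1.05,
    )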

    decoded = TOKENIZER.decode(
        output[0][prompt_len:], skip_special_tokens=False
    )

    return decoded.split("<STOP>")[0].strip()

@app.route("/translate", methods=["POST"])
def translate_endpoint():
    data = request.get_json(silent=True)
    if not data:
        return jsonify({"error": "Invalid JSON"}), 400

    text = data.get("text")
    source = data.get("source")
    target = data.get("target")

    if not all([text, source, target]):
        return jsonify({"error": "Missing fields"}), 400

    if MODEL is None:
        try:
            load_model(
                args.model_dir,
                args.base_model_id,
                args.quantize,
            )
        except Exception as e:
            log.exception("Model load failed")
            return jsonify({"error": str(e)}), 500

    try:
        result = translate(text, source, target)
        return jsonify({"translation": result})
    except Exception as e:
        log.exception("Inference failed")
        return jsonify({"error": str(e)}), 500

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", required=True)
    parser.add_argument("--base_model_id", default="Qwen/Qwen2.5-0.5B")
    parser.add_argument("--quantize", action="store_true")
    parser.add_argument("--port", type=int, default=8011)
    args = parser.parse_args()

    if not os.path.isdir(args.model_dir):
        log.error("Invalid model directory")
        sys.exit(1)

    log.info(f"Starting Translation API on port {args.port}")
    app.run(host="0.0.0.0", port=args.port, threaded=True)
```
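
Once the server above is running, a client can call it with a JSON body containing the `text`, `source`, and `target` fields that the `/translate` route reads. The following is a minimal client sketch using only the standard library. The helper names `build_request` and `translate_remote`, the `localhost` URL, and the 60-second timeout are assumptions for illustration; the port matches the script's default of 8011.

```python
import json
import urllib.request


def build_request(
    text: str,
    source: str,
    target: str,
    url: str = "http://localhost:8011/translate",  # assumption: server on this host
) -> urllib.request.Request:
    """Build a POST request carrying the JSON fields the /translate route reads."""
    # Mirror the two directions accepted by build_prompt in the server script.
    if (source, target) not in {("en", "es"), ("es", "en")}:
        raise ValueError("Unsupported translation direction")
    payload = json.dumps(
        {"text": text, "source": source, "target": target}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate_remote(
    text: str, source: str, target: str, timeout: float = 60.0
) -> str:
    """Send the request and return the "translation" field of the JSON reply."""
    request = build_request(text, source, target)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["translation"]
```

With the server running, `translate_remote("For those who like contrasts", "en", "es")` would return the model's translation as a string. Because the server loads the model lazily on the first request, the first call can take noticeably longer than later ones.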

### Training Data

Here is an example pair from the training data:

- English: For those who like contrasts,
- Spanish: Para quien le gusten los contrastes

### Training Procedure

Standard LoRA fine-tuning.


#### Training Hyperparameters

- **Training regime:** Trained in FP16 with LoRA rank R=8 and LoRA alpha L_A=32
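
To make these two values concrete: in standard LoRA, a frozen weight matrix `W` is adapted as `W + (alpha / r) * B @ A`, where `A` and `B` are the small trainable matrices of rank `r`. The sketch below works through the arithmetic, assuming that `L_A` denotes the LoRA alpha. The card does not say which layers were adapted, so the 896 x 896 shape is purely an illustrative assumption, and both function names are hypothetical.

```python
def lora_scaling(r: int, alpha: float) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one d_out x d_in weight matrix:
    A has shape (r, d_in) and B has shape (d_out, r)."""
    return r * (d_in + d_out)


# Values from this card: rank 8, alpha 32 (assuming L_A means LoRA alpha).
scale = lora_scaling(r=8, alpha=32)

# Illustrative layer shape only; the adapted layers are not listed here.
extra_params = lora_param_count(d_in=896, d_out=896, r=8)
full_params = 896 * 896
```

Under these assumptions the update is scaled by 4.0, and each adapted 896 x 896 matrix gains 14,336 trainable parameters next to its 802,816 frozen ones.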

#### Speeds, Sizes, Times

Trained on 4x RTX 4090 GPUs in about 80 hours.

## Evaluation

The model reached a loss of 0.0476 on the held-out test data.
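
If this loss is the mean per-token cross-entropy in nats (an assumption; the card does not state how it was computed), it can be converted to a perplexity by exponentiating. The function name below is hypothetical.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(mean_cross_entropy)


# Loss reported in this card; the interpretation as nats is an assumption.
test_perplexity = perplexity(0.0476)
```

Under that assumption the test perplexity is roughly 1.05.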

#### Testing Data

15% of the training data was split off before training and used for testing
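
A split of this kind can be sketched as follows. This is not the author's actual preprocessing: the function name, the fixed seed, and the shuffle-then-slice approach are assumptions, shown only to illustrate holding out 15% of the pairs before training.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (English text, Spanish text)


def split_pairs(
    pairs: Sequence[Pair],
    test_fraction: float = 0.15,
    seed: int = 0,  # assumption: fixed seed so the split is reproducible
) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle the pairs and hold out test_fraction of them for testing."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled pairs become the test set; the rest train.
    return shuffled[n_test:], shuffled[:n_test]


# Toy data standing in for the real translation pairs.
toy_pairs = [(f"english {i}", f"español {i}") for i in range(100)]
train_pairs, test_pairs = split_pairs(toy_pairs)
```

Splitting before training, as the card describes, keeps the held-out pairs out of the optimizer's view, so the test loss reflects examples the model has not seen.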

#### Metrics

It was tested on a mix of basic and more challenging translations.

### Results

Quite good for a 0.5B model

#### Summary

A good model for translation between English and Spanish with minimal VRAM usage.

## Environmental Impact

- **Hardware Type:** 4x RTX 4090
- **Hours used:** 80
- **Compute Region:** USA
- **Carbon Emitted:** 77.36 lbs

### Model Objective

Its objective is to give more precise translations than other translation methods

### Compute Infrastructure

Trained on 4x RTX 4090 (24 GB each)

#### Hardware

4x RTX 4090, 512 GB VRAM, AMD EPYC

#### Software

Python and PyTorch

## Model Card Contact

local.axiom.ai@protonmail.com or local.axiom.ai@gmail.com

### Framework versions

- PEFT 0.18.0