---
base_model: Qwen/Qwen2.5-0.5B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen2.5-0.5B
- lora
- transformers
license: mit
language:
- en
- es
---
# Model Card for LinguaTale-EN-ES
LinguaTale-EN-ES is a fine-tuned model based on Qwen2.5-0.5B and designed for English-to-Spanish translation.
### Model Description
This model was fine-tuned with LoRA on roughly 100M English-to-Spanish translation pairs, or about 4B tokens.
- **Developed by:** Local-Axiom-AI
- **Model type:** Translation
- **Language(s) (NLP):** English and Spanish
- **License:** MIT
- **Finetuned from model:** Qwen2.5-0.5B
## Uses
The model is intended for lightweight translation of short paragraphs from English to Spanish or Spanish to English, particularly where the translation must stay private or run without an internet connection.
### Out-of-Scope Use
The model performs very poorly on language pairs other than English and Spanish, and on very long inputs.
## Bias, Risks, and Limitations
The model does not work well on text that contains names.
### Recommendations
Keep inputs to a few sentences or a single paragraph of fewer than 512 tokens. To reduce training time, the model was trained with a maximum context of only 512 tokens.
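For longer inputs, one option is to split the text into sentence-sized chunks and translate each chunk separately. The sketch below is only an illustration and not part of the released code: `chunk_text` and its parameters are hypothetical names, the sentence splitter is a naive regex, and the default counter just counts whitespace-separated words. In practice you would pass a counter built on the model's tokenizer, such as `lambda s: len(tokenizer.tokenize(s))`. The default budget of 400 is an assumption that leaves headroom below 512 for the prompt wrapper.

```python
import re
from typing import Callable, List


def chunk_text(
    text: str,
    max_tokens: int = 400,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    A single sentence longer than the budget is kept as its own
    (oversized) chunk rather than being cut mid-sentence.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be translated on its own and the results joined with spaces.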
## How to Get Started with the Model
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Minimal Flask server that exposes the model through a /translate endpoint."""
import argparse
import logging
import os
import sys

import torch
from flask import Flask, jsonify, request
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)
app = Flask(__name__)

# Module-level state, filled in by load_model().
MODEL = None
TOKENIZER = None
DEVICE = None
STOP_ID = None


def load_model(model_dir: str, base_model_id: str, quantize: bool = False):
    """Load the model and tokenizer, optionally with 4-bit quantization."""
    global MODEL, TOKENIZER, DEVICE, STOP_ID
    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    log.info(f"Using device: {DEVICE}")

    if quantize:
        # 4-bit NF4 quantization via bitsandbytes to reduce VRAM usage.
        qcfg = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
        MODEL = AutoModelForCausalLM.from_pretrained(
            model_dir,
            quantization_config=qcfg,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
        )
    else:
        MODEL = AutoModelForCausalLM.from_pretrained(
            model_dir,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
        )

    MODEL.eval()
    if not quantize:
        # Quantized weights are placed on the GPU at load time, and calling
        # .to() on a bitsandbytes model can raise an error, so only move
        # the unquantized model.
        MODEL.to(DEVICE)

    TOKENIZER = AutoTokenizer.from_pretrained(
        base_model_id,
        trust_remote_code=True,
        use_fast=False,
    )
    TOKENIZER.pad_token = TOKENIZER.eos_token

    # The model ends each translation with a <STOP> token.
    if "<STOP>" not in TOKENIZER.get_vocab():
        log.info("Adding <STOP> token to tokenizer")
        TOKENIZER.add_special_tokens(
            {"additional_special_tokens": ["<STOP>"]}
        )
        MODEL.resize_token_embeddings(len(TOKENIZER))

    STOP_ID = TOKENIZER.convert_tokens_to_ids("<STOP>")
    log.info(f"<STOP> token id: {STOP_ID}")
    log.info("Model & tokenizer loaded successfully")


def build_prompt(text: str, source: str, target: str) -> str:
    """Build the translation prompt for the requested direction."""
    if source == "en" and target == "es":
        return f"Translate the following English text to Spanish:\n{text}\n\nTranslation:"
    elif source == "es" and target == "en":
        return f"Translate the following Spanish text to English:\n{text}\n\nTranslation:"
    else:
        raise ValueError("Unsupported translation direction")


@torch.inference_mode()
def translate(text: str, source: str, target: str) -> str:
    """Translate text and return everything generated before <STOP>."""
    prompt = build_prompt(text, source, target)
    inputs = TOKENIZER(prompt, return_tensors="pt").to(DEVICE)
    prompt_len = inputs["input_ids"].shape[1]

    # Cap the output length relative to the source length.
    src_tokens = len(TOKENIZER.tokenize(text))
    max_new = int(src_tokens * 1.3) + 6

    # Greedy decoding; temperature only applies when sampling, so it is omitted.
    output = MODEL.generate(
        **inputs,
        max_new_tokens=max_new,
        do_sample=False,
        eos_token_id=STOP_ID,
        pad_token_id=TOKENIZER.eos_token_id,
        repetition_penalty=1.05,
    )

    # Decode only the newly generated tokens, not the prompt.
    decoded = TOKENIZER.decode(
        output[0][prompt_len:], skip_special_tokens=False
    )
    return decoded.split("<STOP>")[0].strip()


@app.route("/translate", methods=["POST"])
def translate_endpoint():
    data = request.get_json(silent=True)
    if not data:
        return jsonify({"error": "Invalid JSON"}), 400

    text = data.get("text")
    source = data.get("source")
    target = data.get("target")
    if not all([text, source, target]):
        return jsonify({"error": "Missing fields"}), 400

    # Lazy-load the model on the first request; `args` is set in __main__.
    if MODEL is None:
        try:
            load_model(
                args.model_dir,
                args.base_model_id,
                args.quantize,
            )
        except Exception as e:
            log.exception("Model load failed")
            return jsonify({"error": str(e)}), 500

    try:
        result = translate(text, source, target)
        return jsonify({"translation": result})
    except Exception as e:
        log.exception("Inference failed")
        return jsonify({"error": str(e)}), 500


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", required=True)
    parser.add_argument("--base_model_id", default="Qwen/Qwen2.5-0.5B")
    parser.add_argument("--quantize", action="store_true")
    parser.add_argument("--port", type=int, default=8011)
    args = parser.parse_args()

    if not os.path.isdir(args.model_dir):
        log.error("Invalid model directory")
        sys.exit(1)

    log.info(f"Starting Translation API on port {args.port}")
    app.run(host="0.0.0.0", port=args.port, threaded=True)
```
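Once the server is running, a client can POST JSON with the `text`, `source`, and `target` fields to `/translate`. The standard-library sketch below assumes the server is reachable on localhost at its default port 8011; `build_request` and `translate_remote` are hypothetical helper names, not part of the script above.

```python
import json
import urllib.request

# Assumption: the server runs locally on its default port.
DEFAULT_URL = "http://localhost:8011/translate"


def build_request(text: str, source: str, target: str,
                  url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build the JSON POST request that the /translate endpoint expects."""
    payload = json.dumps({"text": text, "source": source, "target": target})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate_remote(text: str, source: str = "en", target: str = "es",
                     url: str = DEFAULT_URL, timeout: float = 60.0) -> str:
    """Send the text to the server and return the "translation" field."""
    req = build_request(text, source, target, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["translation"]
```

Calling `translate_remote("For those who like contrasts")` returns the server's Spanish output as a string. Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for the endpoint's 400 and 500 responses, so callers may want to catch it.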
### Training Data
An example pair from the training data:
- **English:** For those who like contrasts
- **Spanish:** Para quien le gusten los contrastes
### Training Procedure
Standard LoRA fine-tuning.
#### Training Hyperparameters
- **Training regime:** FP16, with a LoRA rank (R) of 8 and a LoRA alpha (L_A) of 32
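To make these settings concrete, the sketch below works out two quantities that follow from them under the standard LoRA formulation: the scaling factor alpha / r applied to the low-rank update, and the number of trainable parameters r * (d_in + d_out) added per adapted weight matrix. It assumes that `L_A` denotes LoRA alpha. The 896 x 896 matrix shape is purely illustrative, since this card does not say which modules were adapted.

```python
def lora_scaling(r: int, alpha: float) -> float:
    """Scale applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


R, ALPHA = 8, 32  # values from this card, reading L_A as LoRA alpha
print(lora_scaling(R, ALPHA))
print(lora_trainable_params(896, 896, R))  # illustrative square projection
```

With these inputs the scaling factor is 4.0 and the illustrative matrix gains 14336 trainable parameters.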
#### Speeds, Sizes, Times
Trained on 4x RTX 4090 GPUs in about 80 hours.
## Evaluation
The model reached a loss of 0.0476 on the test data.
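If the reported value is the mean per-token cross-entropy in nats (an assumption, since the card does not specify the loss definition), it can be converted to perplexity by exponentiation. The helper name `perplexity` is hypothetical.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)


print(round(perplexity(0.0476), 4))
```

Under that assumption the loss corresponds to a perplexity of roughly 1.05.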
#### Testing Data
15% of the data was held out before training and used for testing.
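A hold-out split of this kind can be sketched as follows. This is an illustration rather than the actual training script: `split_pairs`, the fixed seed, and the toy sentence pairs are all assumptions.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (English, Spanish)


def split_pairs(pairs: Sequence[Pair], test_fraction: float = 0.15,
                seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle the pairs and hold out test_fraction of them for testing."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled pairs become the test set, the rest train.
    return shuffled[n_test:], shuffled[:n_test]


pairs = [(f"sentence {i}", f"frase {i}") for i in range(100)]
train, test = split_pairs(pairs)
print(len(train), len(test))
```

Splitting before training, as described above, keeps the test pairs out of the data the adapter is fitted on.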
#### Metrics
It was tested with some basic and some more challenging translations.
### Results
Translation quality is quite good for a 0.5B-parameter model.
#### Summary
A capable translation model for English and Spanish with minimal VRAM usage.
## Environmental Impact
- **Hardware Type:** 4x RTX 4090
- **Hours used:** 80
- **Compute Region:** USA
- **Carbon Emitted:** 77.36 lbs (about 35.1 kg)
### Model Objective
Its objective is to give more precise translations than other translation methods
### Compute Infrastructure
Trained on 4x RTX 4090 (24 GB each).
#### Hardware
4x RTX 4090 (24 GB VRAM each), 512 GB system RAM, AMD EPYC CPU
#### Software
Python and PyTorch
## Model Card Contact
local.axiom.ai@protonmail.com or local.axiom.ai@gmail.com
### Framework versions
- PEFT 0.18.0