---
base_model: Qwen/Qwen2.5-0.5B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen2.5-0.5B
- lora
- transformers
license: mit
language:
- en
- es
---

# Model Card for LinguaTale-EN-ES

This is a fine-tuned model based on the Qwen2.5-0.5B architecture, designed for English-to-Spanish translation.

### Model Description

This model was fine-tuned with LoRA on roughly 100M EN-to-ES translations, or about 4B tokens.

- **Developed by:** Local-Axiom-AI
- **Model type:** Translation
- **Language(s) (NLP):** English and Spanish
- **License:** MIT
- **Finetuned from model:** Qwen2.5-0.5B

## Uses

The model is designed for situations that require lightweight translation of short paragraphs from English to Spanish or Spanish to English, where the translation has to happen privately or without an internet connection.

### Out-of-Scope Use

The model performs very poorly on anything other than English-to-Spanish or Spanish-to-English translation, and on very long inputs.

## Bias, Risks, and Limitations

The model does not work well on text that involves names.

### Recommendations

Use the model for translations of a few sentences or a single paragraph, under 512 tokens in length. To reduce training time, it was trained with a maximum context of only 512 tokens.

## How to Get Started with the Model

```
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Minimal Flask server that exposes the model through a /translate endpoint."""

import argparse
import logging
import os
import sys

import torch
from flask import Flask, jsonify, request
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

app = Flask(__name__)

# Placeholder stop token (assumption): set this to the end-of-translation
# token the adapter was actually trained with.
STOP_TOKEN = "<STOP>"

MODEL = None
TOKENIZER = None
DEVICE = None
STOP_ID = None


def load_model(model_dir: str, base_model_id: str, quantize: bool = False):
    """Load the model and tokenizer into the module-level globals."""
    global MODEL, TOKENIZER, DEVICE, STOP_ID

    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    log.info(f"Using device: {DEVICE}")

    if quantize:
        # 4-bit NF4 quantization. device_map places the quantized weights,
        # so the model is not moved with .to() afterwards.
        qcfg = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
        MODEL = AutoModelForCausalLM.from_pretrained(
            model_dir,
            quantization_config=qcfg,
            torch_dtype=torch.bfloat16,
            device_map="auto",
            trust_remote_code=True,
        )
    else:
        MODEL = AutoModelForCausalLM.from_pretrained(
            model_dir,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
        )
        MODEL.to(DEVICE)

    MODEL.eval()

    TOKENIZER = AutoTokenizer.from_pretrained(
        base_model_id,
        trust_remote_code=True,
        use_fast=False,
    )
    TOKENIZER.pad_token = TOKENIZER.eos_token

    # Register the stop token if the base tokenizer does not know it yet.
    if STOP_TOKEN not in TOKENIZER.get_vocab():
        log.info("Adding stop token to tokenizer")
        TOKENIZER.add_special_tokens(
            {"additional_special_tokens": [STOP_TOKEN]}
        )
        MODEL.resize_token_embeddings(len(TOKENIZER))

    STOP_ID = TOKENIZER.convert_tokens_to_ids(STOP_TOKEN)
    log.info(f"Stop token id: {STOP_ID}")
    log.info("Model & tokenizer loaded successfully")


def build_prompt(text: str, source: str, target: str) -> str:
    if source == "en" and target == "es":
        return f"Translate the following English text to Spanish:\n{text}\n\nTranslation:"
    elif source == "es" and target == "en":
        return f"Translate the following Spanish text to English:\n{text}\n\nTranslation:"
    else:
        raise ValueError("Unsupported translation direction")


@torch.inference_mode()
def translate(text: str, source: str, target: str) -> str:
    prompt = build_prompt(text, source, target)
    inputs = TOKENIZER(prompt, return_tensors="pt").to(MODEL.device)
    prompt_len = inputs["input_ids"].shape[1]

    # Budget the output at about 1.3x the source length plus a small margin.
    src_tokens = len(TOKENIZER.tokenize(text))
    max_new = int(src_tokens * 1.3) + 6

    # Greedy decoding: temperature is omitted because it only applies
    # when sampling is enabled.
    output = MODEL.generate(
        **inputs,
        max_new_tokens=max_new,
        do_sample=False,
        eos_token_id=STOP_ID,
        pad_token_id=TOKENIZER.eos_token_id,
        repetition_penalty=1.05,
    )

    # Decode only the newly generated tokens and cut at the stop token.
    decoded = TOKENIZER.decode(
        output[0][prompt_len:], skip_special_tokens=False
    )
    return decoded.split(STOP_TOKEN)[0].strip()


@app.route("/translate", methods=["POST"])
def translate_endpoint():
    data = request.get_json(silent=True)
    if not data:
        return jsonify({"error": "Invalid JSON"}), 400

    text = data.get("text")
    source = data.get("source")
    target = data.get("target")
    if not all([text, source, target]):
        return jsonify({"error": "Missing fields"}), 400

    if MODEL is None:
        return jsonify({"error": "Model not loaded"}), 503

    try:
        result = translate(text, source, target)
        return jsonify({"translation": result})
    except Exception as e:
        log.exception("Inference failed")
        return jsonify({"error": str(e)}), 500


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", required=True)
    parser.add_argument("--base_model_id", default="Qwen/Qwen2.5-0.5B")
    parser.add_argument("--quantize", action="store_true")
    parser.add_argument("--port", type=int, default=8011)
    args = parser.parse_args()

    if not os.path.isdir(args.model_dir):
        log.error("Invalid model directory")
        sys.exit(1)

    # Load once at startup so concurrent requests never trigger a second load.
    try:
        load_model(args.model_dir, args.base_model_id, args.quantize)
    except Exception:
        log.exception("Model load failed")
        sys.exit(1)

    log.info(f"Starting Translation API on port {args.port}")
    app.run(host="0.0.0.0", port=args.port, threaded=True)
```

### Training Data

Here is an example from the training data:

For those who like contrasts, Para quien le gusten los contrastes

### Training Procedure

Standard LoRA fine-tuning.

#### Training Hyperparameters

- **Training regime:** FP16, with LoRA rank R=8 and LoRA alpha L_A=32

#### Speeds, Sizes, Times

Trained on 4x RTX 4090 GPUs in about 80 hours.

## Evaluation

The model reached a loss of 0.0476 on the test data.

#### Testing Data

15% of the training data was split off before training and used for testing.

#### Metrics

The model was tested with some basic and some more challenging translations.

### Results

Quite good for a 0.5B model.

#### Summary

A good model for translation between English and Spanish with minimal VRAM usage.

## Environmental Impact

- **Hardware Type:** 4x RTX 4090
- **Hours used:** 80
- **Compute Region:** USA
- **Carbon Emitted:** 77.36 lb

### Model Objective

The objective is to give more precise translations than other translation methods.

### Compute Infrastructure

Trained on 4x RTX 4090 GPUs with 24 GB each.

#### Hardware

4x RTX 4090, 512GB VRAM, AMD Epyc

#### Software

Python and PyTorch

## Model Card Contact

local.axiom.ai@protonmail.com or local.axiom.ai@gmail.com

### Framework versions

- PEFT 0.18.0
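
### Example Client Request

The server script in "How to Get Started with the Model" accepts a JSON body with `text`, `source`, and `target` fields on `POST /translate` and answers with a `translation` field. A minimal client sketch using only the standard library might look like the following. The helper names `build_request` and `request_translation` are hypothetical, and the host `127.0.0.1` is an assumption; the port `8011` is the script's default.

```python
import json
import urllib.request


def build_request(text: str, source: str, target: str,
                  host: str = "127.0.0.1", port: int = 8011) -> urllib.request.Request:
    """Build the POST request for /translate (host is an assumed local address)."""
    payload = json.dumps({"text": text, "source": source, "target": target})
    return urllib.request.Request(
        url=f"http://{host}:{port}/translate",
        data=payload.encode("utf-8"),
        # Flask's get_json() only parses the body when this header is set.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_translation(text: str, source: str = "en", target: str = "es",
                        host: str = "127.0.0.1", port: int = 8011,
                        timeout: float = 60.0) -> str:
    """Send the request and return the 'translation' field of the JSON reply."""
    req = build_request(text, source, target, host, port)
    with urllib.request.urlopen(req, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["translation"]
```

With the server running locally, `request_translation("For those who like contrasts")` would return the Spanish translation as a string. Error responses (HTTP 400 or 500) surface as `urllib.error.HTTPError`, which a caller may want to catch.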