---
base_model: Qwen/Qwen2.5-0.5B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen2.5-0.5B
- lora
- transformers
license: mit
language:
- en
- es
---
# Model Card for LinguaTale-EN-ES
This is a fine-tuned model based on the Qwen2.5-0.5B architecture, designed for English-to-Spanish translation.
### Model Description
This model was fine-tuned with LoRA on roughly 100M English-to-Spanish translations, or about 4B tokens.
- **Developed by:** Local-Axiom-AI
- **Model type:** Translation
- **Language(s) (NLP):** English and Spanish
- **License:** MIT
- **Finetuned from model:** Qwen2.5-0.5B
## Uses
It is designed for situations that require lightweight translation of short paragraphs between English and Spanish, in either direction, and that must run privately or without an internet connection.
### Out-of-Scope Use
The model performs very poorly on language pairs other than English to Spanish or Spanish to English, and on very long inputs.
## Bias, Risks, and Limitations
It does not work well on text that involves names.
### Recommendations
Keep inputs to a few sentences or a single paragraph, under 512 tokens in total. To reduce training time, the model was trained with a maximum context of only 512 tokens.
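One way to stay inside that limit is to split longer input into chunks before translating. The sketch below is illustrative only: `chunk_sentences`, `count_tokens`, and `whitespace_count` are hypothetical names that are not part of this repository, and the whitespace counter is a stand-in for a real tokenizer count such as `len(tokenizer.tokenize(s))`.

```python
from typing import Callable, List


def chunk_sentences(
    sentences: List[str],
    count_tokens: Callable[[str], int],
    budget: int = 512,
) -> List[str]:
    """Greedily pack sentences into chunks whose token count stays within budget.

    A single sentence that is larger than the budget is kept as its own
    (oversized) chunk rather than being cut mid-sentence.
    """
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for sentence in sentences:
        cost = count_tokens(sentence)
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and used + cost > budget:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += cost
    if current:
        chunks.append(" ".join(current))
    return chunks


def whitespace_count(text: str) -> int:
    # Stand-in for a real tokenizer; actual subword counts are usually higher.
    return len(text.split())


if __name__ == "__main__":
    demo = ["One two three.", "Four five.", "Six seven eight nine."]
    print(chunk_sentences(demo, whitespace_count, budget=5))
```

If the 512-token window covers the prompt and the generated translation as well as the source text, a budget well below 512 for the source alone is the safer choice.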
## How to Get Started with the Model
```
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Minimal Flask translation API for LinguaTale-EN-ES."""
import argparse
import logging
import os
import sys

import torch
from flask import Flask, jsonify, request
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)
app = Flask(__name__)

# Populated lazily by load_model() on the first request.
MODEL = None
TOKENIZER = None
DEVICE = None
STOP_ID = None


def load_model(model_dir: str, base_model_id: str, quantize: bool = False):
    global MODEL, TOKENIZER, DEVICE, STOP_ID
    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    log.info(f"Using device: {DEVICE}")
    if quantize:
        # 4-bit NF4 quantization through bitsandbytes.
        qcfg = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
        # Quantized weights are placed on the GPU at load time, and some
        # transformers versions reject .to() on them, so no explicit move here.
        MODEL = AutoModelForCausalLM.from_pretrained(
            model_dir,
            quantization_config=qcfg,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
        )
    else:
        MODEL = AutoModelForCausalLM.from_pretrained(
            model_dir,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
        )
        MODEL.to(DEVICE)
    MODEL.eval()

    TOKENIZER = AutoTokenizer.from_pretrained(
        base_model_id,
        trust_remote_code=True,
        use_fast=False,
    )
    TOKENIZER.pad_token = TOKENIZER.eos_token

    # <STOP> marks the end of a translation; add it if the tokenizer lacks it.
    if "<STOP>" not in TOKENIZER.get_vocab():
        log.info("Adding <STOP> token to tokenizer")
        TOKENIZER.add_special_tokens(
            {"additional_special_tokens": ["<STOP>"]}
        )
        MODEL.resize_token_embeddings(len(TOKENIZER))
    STOP_ID = TOKENIZER.convert_tokens_to_ids("<STOP>")
    log.info(f"<STOP> token id: {STOP_ID}")
    log.info("Model & tokenizer loaded successfully")


def build_prompt(text: str, source: str, target: str) -> str:
    if source == "en" and target == "es":
        return f"Translate the following English text to Spanish:\n{text}\n\nTranslation:"
    elif source == "es" and target == "en":
        return f"Translate the following Spanish text to English:\n{text}\n\nTranslation:"
    else:
        raise ValueError("Unsupported translation direction")


@torch.inference_mode()
def translate(text: str, source: str, target: str) -> str:
    prompt = build_prompt(text, source, target)
    inputs = TOKENIZER(prompt, return_tensors="pt").to(DEVICE)
    prompt_len = inputs["input_ids"].shape[1]

    # Cap the output length relative to the source length.
    src_tokens = len(TOKENIZER.tokenize(text))
    max_new = int(src_tokens * 1.3) + 6

    # Greedy decoding: with do_sample=False no temperature is needed.
    output = MODEL.generate(
        **inputs,
        max_new_tokens=max_new,
        do_sample=False,
        eos_token_id=STOP_ID,
        pad_token_id=TOKENIZER.eos_token_id,
        repetition_penalty=1.05,
    )
    # Keep special tokens so the text can be cut at <STOP>.
    decoded = TOKENIZER.decode(
        output[0][prompt_len:], skip_special_tokens=False
    )
    return decoded.split("<STOP>")[0].strip()


@app.route("/translate", methods=["POST"])
def translate_endpoint():
    data = request.get_json(silent=True)
    if not data:
        return jsonify({"error": "Invalid JSON"}), 400
    text = data.get("text")
    source = data.get("source")
    target = data.get("target")
    if not all([text, source, target]):
        return jsonify({"error": "Missing fields"}), 400

    # Load the model on the first request; args is parsed in __main__ below.
    if MODEL is None:
        try:
            load_model(
                args.model_dir,
                args.base_model_id,
                args.quantize,
            )
        except Exception as e:
            log.exception("Model load failed")
            return jsonify({"error": str(e)}), 500

    try:
        result = translate(text, source, target)
        return jsonify({"translation": result})
    except Exception as e:
        log.exception("Inference failed")
        return jsonify({"error": str(e)}), 500


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", required=True)
    parser.add_argument("--base_model_id", default="Qwen/Qwen2.5-0.5B")
    parser.add_argument("--quantize", action="store_true")
    parser.add_argument("--port", type=int, default=8011)
    args = parser.parse_args()

    if not os.path.isdir(args.model_dir):
        log.error("Invalid model directory")
        sys.exit(1)

    log.info(f"Starting Translation API on port {args.port}")
    app.run(host="0.0.0.0", port=args.port, threaded=True)
```
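Once the server script above is running, the `/translate` endpoint can be called from any HTTP client. The following is a minimal client sketch using only the Python standard library. It assumes the server is reachable on localhost at the script's default port 8011; `build_translate_request` and `translate_remote` are hypothetical helper names introduced here for illustration.

```python
import json
import urllib.request

# Assumption: server started locally with the script's default --port 8011.
DEFAULT_URL = "http://localhost:8011/translate"


def build_translate_request(text, source, target, url=DEFAULT_URL):
    """Build a POST request carrying the JSON fields the endpoint expects."""
    payload = json.dumps(
        {"text": text, "source": source, "target": target}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate_remote(text, source, target, url=DEFAULT_URL, timeout=60):
    """Send the request and return the 'translation' field of the reply.

    400/500 responses from the server surface as urllib.error.HTTPError.
    """
    req = build_translate_request(text, source, target, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["translation"]
```

For example, `translate_remote("For those who like contrasts", "en", "es")` sends one English-to-Spanish request. The first call may be slow, because the server loads the model lazily on the first request.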
### Training Data
Here is an example pair from the training data: "For those who like contrasts" (English) and "Para quien le gusten los contrastes" (Spanish).
### Training Procedure
Standard LoRA fine-tuning.
#### Training Hyperparameters
- **Training regime:** Trained in FP16 with R=8 (LoRA rank) and L_A=32 (LoRA alpha)
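To give a sense of what these two values mean, the sketch below works through the standard LoRA arithmetic for a single weight matrix. It assumes that `L_A` denotes LoRA alpha and that the usual `alpha / rank` scaling is used. The card does not say which modules were adapted, so the 896-wide square projection is purely illustrative (896 is the hidden size reported for Qwen2.5-0.5B), and the function names are hypothetical.

```python
def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns B (d_out x rank) and A (rank x d_in); the base weight is frozen.
    """
    return rank * (d_in + d_out)


def lora_scaling(alpha: float, rank: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


RANK = 8    # R=8 from the card
ALPHA = 32  # assumption: L_A=32 is LoRA alpha
WIDTH = 896  # illustrative layer width, not a statement about targeted modules

if __name__ == "__main__":
    added = lora_extra_params(WIDTH, WIDTH, RANK)
    full = WIDTH * WIDTH
    print(f"LoRA adds {added} parameters next to {full} frozen ones")
    print(f"update scaling: {lora_scaling(ALPHA, RANK)}")
```

Under these assumptions the low-rank update for such a matrix holds 14,336 trainable parameters against 802,816 frozen ones, and is scaled by a factor of 4.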
#### Speeds, Sizes, Times
Trained on 4x RTX 4090 GPUs in about 80 hours.
## Evaluation
The model reached a loss of 0.0476 on the test data.
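If that figure is a mean per-token cross-entropy in nats (an assumption; the card does not state how the loss was computed), it can be turned into a perplexity by exponentiation. A minimal sketch, with `perplexity` as a hypothetical helper name:

```python
import math


def perplexity(mean_loss: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(mean_loss)


if __name__ == "__main__":
    # 0.0476 is the test loss reported in this card.
    print(round(perplexity(0.0476), 4))
```

Under that assumption the reported loss corresponds to a perplexity of about 1.049.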
#### Testing Data
15% of the training data was split off before training and used for testing
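A hold-out split of this kind can be sketched as follows. The card does not describe how the split was actually made, so the shuffling, the fixed seed, and the name `holdout_split` are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (English, Spanish)


def holdout_split(
    pairs: Sequence[Pair],
    test_fraction: float = 0.15,
    seed: int = 0,
) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle the pairs and set aside test_fraction of them for testing."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Placeholder pairs; real data would be English/Spanish sentence pairs.
    data = [(f"en {i}", f"es {i}") for i in range(20)]
    train, test = holdout_split(data)
    print(len(train), len(test))
```

Splitting before training, as the card describes, keeps the test pairs out of the data the adapter is fitted on.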
#### Metrics
It was tested on a mix of basic and more challenging translations.
### Results
Quite good for a 0.5B model
#### Summary
A good model for translation between English and Spanish with minimal VRAM usage.
## Environmental Impact
- **Hardware Type:** 4x RTX 4090
- **Hours used:** 80
- **Compute Region:** USA
- **Carbon Emitted:** 77.36 lbs
### Model Objective
Its objective is to give more precise translations than other translation methods
### Compute Infrastructure
Trained with 4x RTX 4090 (24 GB each)
#### Hardware
4x RTX 4090, 512 GB VRAM, AMD EPYC
#### Software
Python and PyTorch
## Model Card Contact
local.axiom.ai@protonmail.com or local.axiom.ai@gmail.com
### Framework versions
- PEFT 0.18.0