Instructions to use Local-Axiom-AI/LinguaTale-EN-ES with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries
PEFT
How to use Local-Axiom-AI/LinguaTale-EN-ES with PEFT:
```
Task type is invalid.
```

How to use Local-Axiom-AI/LinguaTale-EN-ES with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Local-Axiom-AI/LinguaTale-EN-ES")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Local-Axiom-AI/LinguaTale-EN-ES")
model = AutoModelForCausalLM.from_pretrained("Local-Axiom-AI/LinguaTale-EN-ES")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use Local-Axiom-AI/LinguaTale-EN-ES with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Local-Axiom-AI/LinguaTale-EN-ES"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Local-Axiom-AI/LinguaTale-EN-ES",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Local-Axiom-AI/LinguaTale-EN-ES

SGLang

How to use Local-Axiom-AI/LinguaTale-EN-ES with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Local-Axiom-AI/LinguaTale-EN-ES" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Local-Axiom-AI/LinguaTale-EN-ES",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Local-Axiom-AI/LinguaTale-EN-ES" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Local-Axiom-AI/LinguaTale-EN-ES",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Local-Axiom-AI/LinguaTale-EN-ES with Docker Model Runner:
```
docker model run hf.co/Local-Axiom-AI/LinguaTale-EN-ES
```

Model Card for LinguaTale-EN-ES

This is a finetuned model based on the architecture of Qwen2.5-0.5B that is designed for english to spanish translations

Model Description

This model was finetuned using LoRA on ~100M EN to ES translations or about ~4B tokens

Developed by: Local-Axiom-AI
Model type: Translation
Language(s) (NLP): English and Spanish
License: MIT
Finetuned from model: Qwen2.5-0.5B

Uses

It is designed for situations that require a lightweight translation of small paragraphs from English to Spanish or Spanish to English that has to happen in a private way or way that does not require internet

Out-of-Scope Use

Does very poorly with non English to spanish or Spanish to English translation or with very long translations

Bias, Risks, and Limitations

It does not work well when involving names

Recommendations

Translations of a few sentences or a single paragraph that are less than 512 tokens in length, because to reduce training time it was only trained with a max context of 512 tokens

How to Get Started with the Model

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import argparse
import logging
import os
import sys
import torch
from flask import Flask, jsonify, request
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

app = Flask(__name__)

MODEL = None
TOKENIZER = None
DEVICE = None
STOP_ID = None

def load_model(model_dir: str, base_model_id: str, quantize: bool = False):
    global MODEL, TOKENIZER, DEVICE, STOP_ID

    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    log.info(f"Using device: {DEVICE}")

    if quantize:
        qcfg = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
        MODEL = AutoModelForCausalLM.from_pretrained(
            model_dir,
            quantization_config=qcfg,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
        )
    else:
        MODEL = AutoModelForCausalLM.from_pretrained(
            model_dir,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
        )

    MODEL.eval().to(DEVICE)

    TOKENIZER = AutoTokenizer.from_pretrained(
        base_model_id,
        trust_remote_code=True,
        use_fast=False,
    )

    TOKENIZER.pad_token = TOKENIZER.eos_token

    if "<STOP>" not in TOKENIZER.get_vocab():
        log.info("Adding <STOP> token to tokenizer")
        TOKENIZER.add_special_tokens(
            {"additional_special_tokens": ["<STOP>"]}
        )
        MODEL.resize_token_embeddings(len(TOKENIZER))

    STOP_ID = TOKENIZER.convert_tokens_to_ids("<STOP>")
    log.info(f"<STOP> token id: {STOP_ID}")

    log.info("Model & tokenizer loaded successfully")

def build_prompt(text: str, source: str, target: str) -> str:
    if source == "en" and target == "es":
        return f"Translate the following English text to Spanish:\n{text}\n\nTranslation:"
    elif source == "es" and target == "en":
        return f"Translate the following Spanish text to English:\n{text}\n\nTranslation:"
    else:
        raise ValueError("Unsupported translation direction")

@torch.inference_mode()
def translate(text: str, source: str, target: str) -> str:
    prompt = build_prompt(text, source, target)

    inputs = TOKENIZER(prompt, return_tensors="pt").to(DEVICE)
    prompt_len = inputs["input_ids"].shape[1]

    src_tokens = len(TOKENIZER.tokenize(text))
    max_new = int(src_tokens * 1.3) + 6

    output = MODEL.generate(
        **inputs,
        max_new_tokens=max_new,
        do_sample=False,
        temperature=0.0,
        eos_token_id=STOP_ID,
        pad_token_id=TOKENIZER.eos_token_id,
        repetition_penalty=1.05,
    )

    decoded = TOKENIZER.decode(
        output[0][prompt_len:], skip_special_tokens=False
    )

    return decoded.split("<STOP>")[0].strip()

@app.route("/translate", methods=["POST"])
def translate_endpoint():
    data = request.get_json(silent=True)
    if not data:
        return jsonify({"error": "Invalid JSON"}), 400

    text = data.get("text")
    source = data.get("source")
    target = data.get("target")

    if not all([text, source, target]):
        return jsonify({"error": "Missing fields"}), 400

    if MODEL is None:
        try:
            load_model(
                args.model_dir,
                args.base_model_id,
                args.quantize,
            )
        except Exception as e:
            log.exception("Model load failed")
            return jsonify({"error": str(e)}), 500

    try:
        result = translate(text, source, target)
        return jsonify({"translation": result})
    except Exception as e:
        log.exception("Inference failed")
        return jsonify({"error": str(e)}), 500

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", required=True)
    parser.add_argument("--base_model_id", default="Qwen/Qwen2.5-0.5B")
    parser.add_argument("--quantize", action="store_true")
    parser.add_argument("--port", type=int, default=8011)
    args = parser.parse_args()

    if not os.path.isdir(args.model_dir):
        log.error("Invalid model directory")
        sys.exit(1)

    log.info(f"Starting Translation API on port {args.port}")
    app.run(host="0.0.0.0", port=args.port, threaded=True)

Training Data

Here is an example from the taining data: For those who like contrasts, Para quien le gusten los contrastes

Training Procedure

Normal LoRA finetuning

Training Hyperparameters

Training regime: Trained in FP16 with a R=8 and L_A=32

Speeds, Sizes, Times

Trained with a 4x RTX 4090s in about 80 hours

Evaluation

This model got a loss of 0.0476 on testing data

Testing Data

15% of the training data was split off before training and used for testing

Metrics

It was tested with some basic and more challanging translations

Results

Quite good for a 0.5B model

Summary

A good AI for translation involving English and Spanish with minimal Vram usage

Environmental Impact

Hardware Type: 4x RTX 4090
Hours used: 80
Compute Region: USA
Carbon Emitted: 77.36 Lbs

Model Objective

Its objective is to give more precise translations than other translation methods

Compute Infrastructure

Trained with 4x RTX 4090 24gb

Hardware

4x RTX 4090, 512GB Vram, AMD Epyc

Software

Python and Pytorch

Model Card Contact

local.axiom.ai@protonmail.com or local.axiom.ai@gmail.com

Framework versions

PEFT 0.18.0

Downloads last month: -

Safetensors

Model size

0.5B params

Tensor type

BF16

Model tree for Local-Axiom-AI/LinguaTale-EN-ES

Base model

Qwen/Qwen2.5-0.5B

Adapter

(393)

this model