---
base_model: Qwen/Qwen2.5-0.5B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen2.5-0.5B
- lora
- transformers
license: mit
language:
- en
- es
---

# Model Card for LinguaTale-EN-ES

LinguaTale-EN-ES is a fine-tune of Qwen2.5-0.5B designed for English-to-Spanish translation.


### Model Description

This model was fine-tuned with LoRA on roughly 100M EN-to-ES translation pairs, or about 4B tokens.

- **Developed by:** Local-Axiom-AI
- **Model type:** Translation
- **Language(s) (NLP):** English and Spanish
- **License:** MIT
- **Finetuned from model:** Qwen2.5-0.5B
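
Because the repository is published as a PEFT LoRA adapter on top of `Qwen/Qwen2.5-0.5B`, one plausible way to load it is to load the base model first and attach the adapter. The sketch below is a minimal example under that assumption; the helper name `load_adapter` is hypothetical, and whether the adapter loads cleanly without the extra `<STOP>` token handling used in the getting-started script has not been verified here.

```python
def load_adapter(adapter_id="Local-Axiom-AI/LinguaTale-EN-ES",
                 base_id="Qwen/Qwen2.5-0.5B"):
    """Load the base model and attach the LoRA adapter with PEFT.

    Hypothetical helper: the function name and defaults are illustrative.
    """
    # Imported lazily so this module can be imported without the heavy
    # dependencies installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling `load_adapter()` downloads both the base model and the adapter on first use, so it needs network access unless both are already cached.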

## Uses

The model is intended for lightweight translation of short paragraphs between English and Spanish, in either direction, in settings where translation must stay private or run without internet access.
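
For the no-internet case, a minimal sketch is shown below. It assumes the adapter and the base model have already been downloaded to local directories; the helper name `load_local` and both directory arguments are hypothetical. `HF_HUB_OFFLINE` and `local_files_only` are the standard Hugging Face switches for preventing network calls.

```python
import os

# Must be set before transformers / huggingface_hub are imported,
# because the flag is read at import time.
os.environ["HF_HUB_OFFLINE"] = "1"


def load_local(model_dir, base_model_dir):
    """Load model and tokenizer strictly from local files (hypothetical helper)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # local_files_only=True makes loading fail fast instead of
    # attempting a download when a file is missing.
    model = AutoModelForCausalLM.from_pretrained(model_dir, local_files_only=True)
    tokenizer = AutoTokenizer.from_pretrained(base_model_dir, local_files_only=True)
    return model, tokenizer
```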

### Out-of-Scope Use

The model performs very poorly on language pairs other than English and Spanish, and on very long inputs.

## Bias, Risks, and Limitations

The model does not handle proper names well.

### Recommendations

Keep each translation to a few sentences or a single paragraph, under 512 tokens in length. To reduce training time, the model was trained with a maximum context of only 512 tokens.
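
One way to respect the 512-token limit on longer inputs is to split the text into sentence groups and translate each group separately. The sketch below is an illustration, not part of the released code: `chunk_text` and `whitespace_count` are hypothetical names, and the whitespace counter is only a stand-in for a real tokenizer.

```python
import re


def chunk_text(text, count_tokens, budget=512):
    """Group sentences into chunks whose token count stays within budget.

    A single sentence that exceeds the budget on its own is still
    emitted as one (over-budget) chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > budget:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def whitespace_count(s):
    """Stand-in token counter: counts whitespace-separated words."""
    return len(s.split())


chunks = chunk_text("One two three. Four five six. Seven eight.",
                    whitespace_count, budget=6)
print(chunks)
```

With the real tokenizer, `count_tokens` could be `lambda s: len(tokenizer.tokenize(s))`. Since the prompt template and the generated translation also consume context, a budget well below 512 is probably the safer choice.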

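## How to Get Started with the Model

The script below serves the model through a small Flask API with a single `POST /translate` endpoint. Pass the adapter directory with `--model_dir`; `--quantize` enables 4-bit loading through bitsandbytes.

```python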
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import argparse
import logging
import os
import sys
import torch
from flask import Flask, jsonify, request
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

app = Flask(__name__)

MODEL = None
TOKENIZER = None
DEVICE = None
STOP_ID = None

def load_model(model_dir: str, base_model_id: str, quantize: bool = False):
    global MODEL, TOKENIZER, DEVICE, STOP_ID

    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    log.info(f"Using device: {DEVICE}")

    if quantize:
        qcfg = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
        MODEL = AutoModelForCausalLM.from_pretrained(
            model_dir,
            quantization_config=qcfg,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
        )
    else:
        MODEL = AutoModelForCausalLM.from_pretrained(
            model_dir,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
        )

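    MODEL.eval()
    # bitsandbytes-quantized models are placed on their device at load time
    # and may reject .to(), so only move the unquantized model explicitly.
    if not quantize:
        MODEL.to(DEVICE)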

    TOKENIZER = AutoTokenizer.from_pretrained(
        base_model_id,
        trust_remote_code=True,
        use_fast=False,
    )

    TOKENIZER.pad_token = TOKENIZER.eos_token

    if "<STOP>" not in TOKENIZER.get_vocab():
        log.info("Adding <STOP> token to tokenizer")
        TOKENIZER.add_special_tokens(
            {"additional_special_tokens": ["<STOP>"]}
        )
        MODEL.resize_token_embeddings(len(TOKENIZER))

    STOP_ID = TOKENIZER.convert_tokens_to_ids("<STOP>")
    log.info(f"<STOP> token id: {STOP_ID}")

    log.info("Model & tokenizer loaded successfully")

def build_prompt(text: str, source: str, target: str) -> str:
    if source == "en" and target == "es":
        return f"Translate the following English text to Spanish:\n{text}\n\nTranslation:"
    elif source == "es" and target == "en":
        return f"Translate the following Spanish text to English:\n{text}\n\nTranslation:"
    else:
        raise ValueError("Unsupported translation direction")

@torch.inference_mode()
def translate(text: str, source: str, target: str) -> str:
    prompt = build_prompt(text, source, target)

    inputs = TOKENIZER(prompt, return_tensors="pt").to(DEVICE)
    prompt_len = inputs["input_ids"].shape[1]

    src_tokens = len(TOKENIZER.tokenize(text))
    max_new = int(src_tokens * 1.3) + 6

    output = MODEL.generate(
        **inputs,
        max_new_tokens=max_new,
        do_sample=False,  # greedy decoding; temperature is unused when sampling is off
        eos_token_id=STOP_ID,
        pad_token_id=TOKENIZER.eos_token_id,
        repetition_penalty=1.05,
    )

    decoded = TOKENIZER.decode(
        output[0][prompt_len:], skip_special_tokens=False
    )

    return decoded.split("<STOP>")[0].strip()

@app.route("/translate", methods=["POST"])
def translate_endpoint():
    data = request.get_json(silent=True)
    if not data:
        return jsonify({"error": "Invalid JSON"}), 400

    text = data.get("text")
    source = data.get("source")
    target = data.get("target")

    if not all([text, source, target]):
        return jsonify({"error": "Missing fields"}), 400

    if MODEL is None:
        try:
            load_model(
                args.model_dir,
                args.base_model_id,
                args.quantize,
            )
        except Exception as e:
            log.exception("Model load failed")
            return jsonify({"error": str(e)}), 500

    try:
        result = translate(text, source, target)
        return jsonify({"translation": result})
    except Exception as e:
        log.exception("Inference failed")
        return jsonify({"error": str(e)}), 500

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", required=True)
    parser.add_argument("--base_model_id", default="Qwen/Qwen2.5-0.5B")
    parser.add_argument("--quantize", action="store_true")
    parser.add_argument("--port", type=int, default=8011)
    args = parser.parse_args()

    if not os.path.isdir(args.model_dir):
        log.error("Invalid model directory")
        sys.exit(1)

    log.info(f"Starting Translation API on port {args.port}")
    app.run(host="0.0.0.0", port=args.port, threaded=True)
```
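
Once the server is running, a client only needs to POST a JSON body with `text`, `source`, and `target` to `/translate` and read the `translation` field from the response. The sketch below uses only the standard library; the helper names are hypothetical, and the default URL assumes the script's default port of 8011 on the same machine.

```python
import json
import urllib.request


def build_translate_request(text, source, target,
                            base_url="http://localhost:8011"):
    """Build the POST request expected by the /translate endpoint."""
    payload = json.dumps(
        {"text": text, "source": source, "target": target}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/translate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate_remote(text, source="en", target="es",
                     base_url="http://localhost:8011"):
    """Send the request and return the translated string."""
    req = build_translate_request(text, source, target, base_url)
    # Error responses (400/500) surface as urllib.error.HTTPError.
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))["translation"]
```

For example, `translate_remote("For those who like contrasts", "en", "es")` would return the model's Spanish output as a plain string.
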
### Training Data

Here is an example pair from the training data: "For those who like contrasts," → "Para quien le gusten los contrastes"

### Training Procedure

Standard LoRA fine-tuning.


#### Training Hyperparameters

- **Training regime:** Trained in FP16 with LoRA rank R=8 and LoRA alpha (L_A)=32
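
To make these two numbers concrete, the sketch below works through what they imply under standard LoRA, reading `L_A` as the LoRA alpha (an assumption). The helper names and the 1024x1024 layer size are illustrative only and are not taken from the model's actual configuration.

```python
def lora_scale(alpha, rank):
    """Standard LoRA scaling factor applied to the low-rank update B @ A."""
    return alpha / rank


def lora_extra_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


scale = lora_scale(alpha=32, rank=8)
# Hypothetical square projection; the real layer shapes may differ.
extra = lora_extra_params(d_in=1024, d_out=1024, rank=8)
fraction = extra / (1024 * 1024)

print(scale, extra, f"{fraction:.4%}")
```

With R=8 and alpha=32 the update is scaled by 4, and for a layer of that illustrative size the adapter trains well under 2% of the parameters of the full matrix.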

#### Speeds, Sizes, Times

Trained on 4x RTX 4090 GPUs in about 80 hours

## Evaluation

The model reached a loss of 0.0476 on the test data
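
If that figure is the mean per-token cross-entropy in nats (an assumption; the card does not say how the loss is averaged), it can be converted to a perplexity as a rough sanity check:

```python
import math

test_loss = 0.0476  # reported test loss

# Perplexity is the exponential of the mean cross-entropy (in nats).
perplexity = math.exp(test_loss)

print(f"{perplexity:.4f}")
```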

#### Testing Data

15% of the training data was split off before training and used for testing
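
A held-out split of this kind can be sketched as follows. This is an illustration of the idea, not the script that was actually used: the function name, the seed, and the toy data are all hypothetical.

```python
import random


def split_pairs(pairs, test_fraction=0.15, seed=0):
    """Shuffle translation pairs and hold out a fraction for testing."""
    shuffled = list(pairs)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy stand-in for (English, Spanish) pairs.
pairs = [(f"en {i}", f"es {i}") for i in range(100)]
train, test = split_pairs(pairs)

print(len(train), len(test))
```

Splitting before training, as described above, keeps the test pairs out of the fine-tuning data.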

#### Metrics

It was tested with some basic and some more challenging translations

### Results

Quite good for a 0.5B model

#### Summary

A capable model for translation between English and Spanish with minimal VRAM usage

## Environmental Impact

- **Hardware Type:** 4x RTX 4090
- **Hours used:** 80
- **Compute Region:** USA
- **Carbon Emitted:** 77.36 lbs

### Model Objective

Its objective is to give more precise translations than other translation methods

### Compute Infrastructure

Trained with 4x RTX 4090 24GB

#### Hardware

4x RTX 4090, 512GB VRAM, AMD EPYC

#### Software

Python and PyTorch

## Model Card Contact

local.axiom.ai@protonmail.com or local.axiom.ai@gmail.com

### Framework versions

- PEFT 0.18.0