Local-Axiom-AI committed on Feb 25

Commit cc01ffb (verified) · 1 Parent(s): e3005fd

Create README.md

Files changed (1)

README.md +263 -0

README.md ADDED

	@@ -0,0 +1,263 @@

+---
+base_model: Qwen/Qwen2.5-0.5B
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Qwen/Qwen2.5-0.5B
+- lora
+- transformers
+license: mit
+language:
+- en
+- es
+---
+# Model Card for LinguaTale-EN-ES
+This is a fine-tuned model based on Qwen2.5-0.5B, designed for English-to-Spanish translation.
+### Model Description
+This model was fine-tuned using LoRA on ~100M EN-to-ES translations, or about 4B tokens.
+- **Developed by:** Local-Axiom-AI
+- **Model type:** Translation
+- **Language(s) (NLP):** English and Spanish
+- **License:** MIT
+- **Finetuned from model:** Qwen2.5-0.5B
+## Uses
+It is designed for situations that call for lightweight translation of short paragraphs from English to Spanish, and where the translation must run privately or without an internet connection.
+### Out-of-Scope Use
+The model performs very poorly on translations other than English to Spanish or Spanish to English, and on very long translations.
+## Bias, Risks, and Limitations
+It does not work well on text that involves names.
+### Recommendations
+Use it for translations of a few sentences or a single paragraph, shorter than 512 tokens. To reduce training time, the model was trained with a maximum context of only 512 tokens.
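The 512-token limit above suggests splitting longer input before translating it. The following is a minimal sketch of one way to do that, assuming that splitting at sentence boundaries is acceptable. `chunk_by_token_budget` and `whitespace_token_count` are hypothetical helpers, not part of this repository, and the whitespace counter is only a crude stand-in for the real tokenizer. Since the prompt and the generated translation presumably share the same 512-token window, a budget well below 512 is likely the safer choice.

```python
import re
from typing import Callable, List


def whitespace_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def chunk_by_token_budget(
    text: str,
    max_tokens: int = 512,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    A single sentence longer than the budget is kept as its own chunk
    rather than being cut mid-sentence.
    """
    # Split after ., ! or ? followed by whitespace; drop empty pieces.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

With a tokenizer already loaded, one could pass `count_tokens=lambda s: len(tokenizer.tokenize(s))` in place of the whitespace stand-in, then translate each chunk separately and join the results.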
+## How to Get Started with the Model
+```python
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+import argparse
+import logging
+import os
+import sys
+import torch
+from flask import Flask, jsonify, request
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+logging.basicConfig(level=logging.INFO)
+log = logging.getLogger(__name__)
+app = Flask(__name__)
+MODEL = None
+TOKENIZER = None
+DEVICE = None
+STOP_ID = None
+def load_model(model_dir: str, base_model_id: str, quantize: bool = False):
+    global MODEL, TOKENIZER, DEVICE, STOP_ID
+    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    log.info(f"Using device: {DEVICE}")
+    if quantize:
+        qcfg = BitsAndBytesConfig(
+            load_in_4bit=True,
+            bnb_4bit_quant_type="nf4",
+            bnb_4bit_use_double_quant=True,
+            bnb_4bit_compute_dtype=torch.bfloat16,
+        )
+        MODEL = AutoModelForCausalLM.from_pretrained(
+            model_dir,
+            quantization_config=qcfg,
+            torch_dtype=torch.bfloat16,
+            trust_remote_code=True,
+        )
+    else:
+        MODEL = AutoModelForCausalLM.from_pretrained(
+            model_dir,
+            torch_dtype=torch.bfloat16,
+            trust_remote_code=True,
+        )
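+    MODEL.eval()
+    if not quantize:
+        # bitsandbytes 4-bit models are placed on a device at load time and
+        # may reject .to(), so only move the unquantized model explicitly.
+        MODEL.to(DEVICE)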
+    TOKENIZER = AutoTokenizer.from_pretrained(
+        base_model_id,
+        trust_remote_code=True,
+        use_fast=False,
+    )
+    TOKENIZER.pad_token = TOKENIZER.eos_token
+    if "<STOP>" not in TOKENIZER.get_vocab():
+        log.info("Adding <STOP> token to tokenizer")
+        TOKENIZER.add_special_tokens(
+            {"additional_special_tokens": ["<STOP>"]}
+        )
+        MODEL.resize_token_embeddings(len(TOKENIZER))
+    STOP_ID = TOKENIZER.convert_tokens_to_ids("<STOP>")
+    log.info(f"<STOP> token id: {STOP_ID}")
+    log.info("Model & tokenizer loaded successfully")
+def build_prompt(text: str, source: str, target: str) -> str:
+    if source == "en" and target == "es":
+        return f"Translate the following English text to Spanish:\n{text}\n\nTranslation:"
+    elif source == "es" and target == "en":
+        return f"Translate the following Spanish text to English:\n{text}\n\nTranslation:"
+    else:
+        raise ValueError("Unsupported translation direction")
+@torch.inference_mode()
+def translate(text: str, source: str, target: str) -> str:
+    prompt = build_prompt(text, source, target)
+    inputs = TOKENIZER(prompt, return_tensors="pt").to(DEVICE)
+    prompt_len = inputs["input_ids"].shape[1]
+    src_tokens = len(TOKENIZER.tokenize(text))
+    max_new = int(src_tokens * 1.3) + 6
+    output = MODEL.generate(
+        **inputs,
+        max_new_tokens=max_new,
+        do_sample=False,  # greedy decoding; temperature is unused without sampling
+        eos_token_id=STOP_ID,
+        pad_token_id=TOKENIZER.eos_token_id,
+        repetition_penalty=1.05,
+    )
+    decoded = TOKENIZER.decode(
+        output[0][prompt_len:], skip_special_tokens=False
+    )
+    return decoded.split("<STOP>")[0].strip()
+@app.route("/translate", methods=["POST"])
+def translate_endpoint():
+    data = request.get_json(silent=True)
+    if not data:
+        return jsonify({"error": "Invalid JSON"}), 400
+    text = data.get("text")
+    source = data.get("source")
+    target = data.get("target")
+    if not all([text, source, target]):
+        return jsonify({"error": "Missing fields"}), 400
+    if MODEL is None:
+        try:
+            load_model(
+                args.model_dir,
+                args.base_model_id,
+                args.quantize,
+            )
+        except Exception as e:
+            log.exception("Model load failed")
+            return jsonify({"error": str(e)}), 500
+    try:
+        result = translate(text, source, target)
+        return jsonify({"translation": result})
+    except Exception as e:
+        log.exception("Inference failed")
+        return jsonify({"error": str(e)}), 500
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--model_dir", required=True)
+    parser.add_argument("--base_model_id", default="Qwen/Qwen2.5-0.5B")
+    parser.add_argument("--quantize", action="store_true")
+    parser.add_argument("--port", type=int, default=8011)
+    args = parser.parse_args()
+    if not os.path.isdir(args.model_dir):
+        log.error("Invalid model directory")
+        sys.exit(1)
+    log.info(f"Starting Translation API on port {args.port}")
+    app.run(host="0.0.0.0", port=args.port, threaded=True)
+```
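The script above exposes a single `POST /translate` route that expects a JSON body with `text`, `source`, and `target`, and replies with a `translation` field. A client for it could be sketched as follows, using only the standard library. The helper names `build_translate_request` and `translate_remote` are hypothetical, and the base URL assumes the server runs locally on the script's default port 8011.

```python
import json
import urllib.request


def build_translate_request(text, source="en", target="es",
                            base_url="http://localhost:8011"):
    """Build a POST request for the /translate route of the server above."""
    payload = json.dumps({"text": text, "source": source, "target": target})
    return urllib.request.Request(
        url=f"{base_url}/translate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate_remote(text, source="en", target="es",
                     base_url="http://localhost:8011", timeout=60):
    """Send the request and return the "translation" field of the JSON reply.

    urlopen raises urllib.error.HTTPError for the 400 and 500 responses
    that the server returns on invalid input or failed inference.
    """
    req = build_translate_request(text, source, target, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["translation"]
```

Once the server is running, calling `translate_remote("For those who like contrasts,")` should return the Spanish translation as a string. The first request may take noticeably longer, because the script loads the model lazily on the first call.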
+### Training Data
+Here is an example from the training data: For those who like contrasts, Para quien le gusten los contrastes
+### Training Procedure
+Standard LoRA fine-tuning
+#### Training Hyperparameters
+- **Training regime:** Trained in FP16 with R=8 and L_A=32
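Assuming `R` is the LoRA rank and `L_A` is the LoRA alpha (the card does not spell this out), the two values determine how strongly the adapter update is scaled and how many parameters each adapted matrix adds. In standard LoRA the adapted weight is `W + (alpha / r) * B @ A`, where `A` has shape `r x d_in` and `B` has shape `d_out x r`. The sketch below works through that arithmetic. The 896 x 896 matrix size is only an illustrative assumption, and the card does not say which modules were adapted.

```python
def lora_scaling(rank: int, alpha: float) -> float:
    """Scaling factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    A is rank x d_in and B is d_out x rank, so the total is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


# Values from the model card, under the assumption R = rank and L_A = alpha.
scaling = lora_scaling(rank=8, alpha=32)

# Illustrative square projection; the dimensions are an assumption.
added = lora_param_count(d_in=896, d_out=896, rank=8)
full = 896 * 896
print(f"scaling={scaling}, adapter params={added}, fraction of full matrix={added / full:.4f}")
```

Under these assumptions the update is scaled by 4, and each adapted 896 x 896 matrix gains 14,336 trainable parameters, a small fraction of the 802,816 in the frozen matrix.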
+#### Speeds, Sizes, Times
+Trained on 4x RTX 4090 GPUs in about 80 hours
+## Evaluation
+The model reached a loss of 0.0476 on the test data
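If the reported loss is the mean per-token cross-entropy in nats (an assumption, since the card does not say how the loss was computed or over which tokens), it can be converted to perplexity by exponentiation. The helper name `perplexity_from_loss` is hypothetical.

```python
import math


def perplexity_from_loss(mean_cross_entropy_nats: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy."""
    return math.exp(mean_cross_entropy_nats)


# Loss value taken from the model card.
print(f"perplexity = {perplexity_from_loss(0.0476):.4f}")
```

Under that assumption the perplexity comes out at roughly 1.05.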
+#### Testing Data
+15% of the training data was split off before training and used for testing
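A 15% hold-out of this kind can be sketched as below. This is not the author's actual split code. `holdout_split`, the fixed seed, and the toy sentence pairs are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(
    examples: Sequence[T], test_fraction: float = 0.15, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle indices with a fixed seed and hold out test_fraction for testing."""
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(examples) * test_fraction))
    test_indices = set(indices[:n_test])
    # Keep the original order within the training portion.
    train = [ex for i, ex in enumerate(examples) if i not in test_indices]
    test = [examples[i] for i in indices[:n_test]]
    return train, test


# Toy (English, Spanish) pairs standing in for the real data.
pairs = [(f"sentence {i}", f"frase {i}") for i in range(100)]
train_pairs, test_pairs = holdout_split(pairs)
print(len(train_pairs), len(test_pairs))
```

Splitting before training, as the card describes, keeps the held-out pairs out of the training data. Removing duplicate pairs first helps further, since the same sentence could otherwise land on both sides of the split.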
+#### Metrics
+It was tested with some basic and some more challenging translations
+### Results
+Quite good for a 0.5B model
+#### Summary
+A good model for translation between English and Spanish with minimal VRAM usage
+## Environmental Impact
+- **Hardware Type:** 4x RTX 4090
+- **Hours used:** 80
+- **Compute Region:** USA
+- **Carbon Emitted:** 77.36 lbs
+### Model Objective
+Its objective is to give more precise translations than other translation methods
+### Compute Infrastructure
+Trained with 4x RTX 4090 24GB
+#### Hardware
+4x RTX 4090, 512GB VRAM, AMD EPYC
+#### Software
+Python and PyTorch
+## Model Card Contact
+local.axiom.ai@protonmail.com or local.axiom.ai@gmail.com
+### Framework versions
+- PEFT 0.18.0