---
base_model: Qwen/Qwen2.5-0.5B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen2.5-0.5B
- lora
- transformers
license: mit
language:
- en
- es
---
# Model Card for LinguaTale-EN-ES

This is a LoRA fine-tune of Qwen2.5-0.5B designed for translation between English and Spanish.

### Model Description

This model was fine-tuned with LoRA on roughly 100M English-Spanish translation pairs (about 4B tokens).

- **Developed by:** Local-Axiom-AI
- **Model type:** Translation
- **Language(s) (NLP):** English and Spanish
- **License:** MIT
- **Finetuned from model:** Qwen2.5-0.5B
## Uses

It is designed for lightweight translation of short paragraphs from English to Spanish or Spanish to English, especially where translation must stay private or work without an internet connection.

### Out-of-Scope Use

It performs very poorly on any translation direction other than English to Spanish or Spanish to English, and on very long texts.

## Bias, Risks, and Limitations

It does not work well on text that contains names.

### Recommendations

Translate a few sentences or a single paragraph at a time, keeping each input under 512 tokens: to reduce training time, the model was trained with a maximum context length of 512 tokens.
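One way to follow this recommendation for longer documents is to split the text into sentence-aligned chunks that each fit a token budget, then translate chunk by chunk. The sketch below is hypothetical (the function name and the naive sentence splitter are not part of this repository). It takes any `count_tokens` callable, so you can pass the base tokenizer, e.g. `lambda s: len(tokenizer.tokenize(s))`. If the 512-token training context had to hold the prompt and the translation together (an assumption), a source budget of roughly 200 tokens leaves room for both.

```python
import re
from typing import Callable, List


def chunk_for_translation(text: str, count_tokens: Callable[[str], int], max_tokens: int) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    A single sentence longer than max_tokens is kept as its own chunk
    rather than being split mid-sentence.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent through the same translation call and the results joined with spaces.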
## How to Get Started with the Model

The script below serves the model through a small Flask API with a single `POST /translate` endpoint. Point `--model_dir` at a local copy of the model; pass `--quantize` to load it in 4-bit (requires a CUDA GPU and `bitsandbytes`).

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import argparse
import logging
import os
import sys

import torch
from flask import Flask, jsonify, request
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

app = Flask(__name__)

MODEL = None
TOKENIZER = None
DEVICE = None
STOP_ID = None


def load_model(model_dir: str, base_model_id: str, quantize: bool = False):
    global MODEL, TOKENIZER, DEVICE, STOP_ID
    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    log.info(f"Using device: {DEVICE}")

    if quantize:
        # 4-bit NF4 quantization via bitsandbytes (CUDA only).
        qcfg = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
        # Quantized models are placed on the GPU at load time;
        # calling .to() on a 4-bit model raises an error.
        MODEL = AutoModelForCausalLM.from_pretrained(
            model_dir,
            quantization_config=qcfg,
            torch_dtype=torch.bfloat16,
            device_map="auto",
            trust_remote_code=True,
        )
    else:
        MODEL = AutoModelForCausalLM.from_pretrained(
            model_dir,
            torch_dtype=torch.bfloat16,
            trust_remote_code=True,
        ).to(DEVICE)
    MODEL.eval()

    # The tokenizer comes from the base model; <STOP> marks the end of a translation.
    TOKENIZER = AutoTokenizer.from_pretrained(
        base_model_id,
        trust_remote_code=True,
        use_fast=False,
    )
    TOKENIZER.pad_token = TOKENIZER.eos_token

    if "<STOP>" not in TOKENIZER.get_vocab():
        log.info("Adding <STOP> token to tokenizer")
        TOKENIZER.add_special_tokens({"additional_special_tokens": ["<STOP>"]})
        MODEL.resize_token_embeddings(len(TOKENIZER))

    STOP_ID = TOKENIZER.convert_tokens_to_ids("<STOP>")
    log.info(f"<STOP> token id: {STOP_ID}")
    log.info("Model & tokenizer loaded successfully")


def build_prompt(text: str, source: str, target: str) -> str:
    if source == "en" and target == "es":
        return f"Translate the following English text to Spanish:\n{text}\n\nTranslation:"
    elif source == "es" and target == "en":
        return f"Translate the following Spanish text to English:\n{text}\n\nTranslation:"
    else:
        raise ValueError("Unsupported translation direction")


@torch.inference_mode()
def translate(text: str, source: str, target: str) -> str:
    prompt = build_prompt(text, source, target)
    inputs = TOKENIZER(prompt, return_tensors="pt").to(MODEL.device)
    prompt_len = inputs["input_ids"].shape[1]

    # Budget the output length from the source length.
    src_tokens = len(TOKENIZER.tokenize(text))
    max_new = int(src_tokens * 1.3) + 6

    # Greedy decoding; no temperature, since it is ignored when do_sample=False.
    output = MODEL.generate(
        **inputs,
        max_new_tokens=max_new,
        do_sample=False,
        eos_token_id=STOP_ID,
        pad_token_id=TOKENIZER.eos_token_id,
        repetition_penalty=1.05,
    )
    decoded = TOKENIZER.decode(output[0][prompt_len:], skip_special_tokens=False)
    # Keep only the text before the first <STOP> marker.
    return decoded.split("<STOP>")[0].strip()


@app.route("/translate", methods=["POST"])
def translate_endpoint():
    data = request.get_json(silent=True)
    if not data:
        return jsonify({"error": "Invalid JSON"}), 400

    text = data.get("text")
    source = data.get("source")
    target = data.get("target")
    if not all([text, source, target]):
        return jsonify({"error": "Missing fields"}), 400

    if MODEL is None:
        return jsonify({"error": "Model not loaded"}), 503

    try:
        result = translate(text, source, target)
        return jsonify({"translation": result})
    except Exception as e:
        log.exception("Inference failed")
        return jsonify({"error": str(e)}), 500


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", required=True)
    parser.add_argument("--base_model_id", default="Qwen/Qwen2.5-0.5B")
    parser.add_argument("--quantize", action="store_true")
    parser.add_argument("--port", type=int, default=8011)
    args = parser.parse_args()

    if not os.path.isdir(args.model_dir):
        log.error("Invalid model directory")
        sys.exit(1)

    # Load once at startup so concurrent first requests don't race to load the model.
    load_model(args.model_dir, args.base_model_id, args.quantize)

    log.info(f"Starting Translation API on port {args.port}")
    app.run(host="0.0.0.0", port=args.port, threaded=True)
```
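Once the server script above is running, it can be called from any HTTP client. This is a minimal client sketch using only the Python standard library; it assumes the server runs locally on its default port 8011, and the helper names `build_translate_request` and `translate_remote` are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:8011/translate"  # default --port of the server script


def build_translate_request(text: str, source: str, target: str, url: str = API_URL) -> urllib.request.Request:
    """Build a POST request matching the /translate endpoint's JSON fields."""
    body = json.dumps({"text": text, "source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate_remote(text: str, source: str = "en", target: str = "es", url: str = API_URL) -> str:
    """Send text to the running server and return its "translation" field.

    Error responses (400, 500, 503) raise urllib.error.HTTPError.
    """
    req = build_translate_request(text, source, target, url)
    with urllib.request.urlopen(req, timeout=60) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return payload["translation"]
```

For example, `translate_remote("For those who like contrasts", "en", "es")` returns the model's Spanish translation; swap `source` and `target` to translate from Spanish to English.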
### Training Data

Here is an example pair from the training data:

- **English:** For those who like contrasts
- **Spanish:** Para quien le gusten los contrastes
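The card does not document how pairs like this were turned into training text. The sketch below is a hypothetical reconstruction: the prompt mirrors `build_prompt()` in the inference script, while appending the target plus a `<STOP>` marker (and the single space after `Translation:`) is an assumption inferred from how that script stops generation at `<STOP>`.

```python
def format_training_example(source_text: str, target_text: str, source: str = "en", target: str = "es") -> str:
    """Hypothetical training string: inference prompt + reference translation + <STOP>.

    The exact separator and stop marker used in training are ASSUMPTIONS.
    """
    if (source, target) == ("en", "es"):
        header = "Translate the following English text to Spanish:"
    elif (source, target) == ("es", "en"):
        header = "Translate the following Spanish text to English:"
    else:
        raise ValueError("Unsupported translation direction")
    return f"{header}\n{source_text}\n\nTranslation: {target_text}<STOP>"
```

Training on strings in this shape would teach the model to emit `<STOP>` after the translation, which is what the inference script's `eos_token_id=STOP_ID` relies on.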
### Training Procedure

Standard LoRA fine-tuning.

#### Training Hyperparameters

- **Training regime:** Trained in FP16 with LoRA rank R = 8 and L_A (LoRA alpha) = 32
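For reference, this is how rank and alpha enter a standard LoRA layer: the frozen weight `W` is augmented with a trainable low-rank product `B A`, scaled by `alpha / r`, so R = 8 and alpha = 32 give a scaling factor of 4. This NumPy sketch illustrates the general technique, not the authors' training code; in PEFT the same values correspond to the `r` and `lora_alpha` arguments of `LoraConfig`, and which modules were adapted is not stated in this card.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update: y = W x + (alpha / r) * B A x."""

    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 32.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
        self.A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
        self.B = np.zeros((d_out, r))                    # trainable, zero init
        self.scaling = alpha / r                         # 32 / 8 = 4.0

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return self.W @ x + self.scaling * (self.B @ (self.A @ x))

    def trainable_params(self) -> int:
        # Only A and B are trained: r * (d_in + d_out) parameters per layer.
        return self.A.size + self.B.size
```

Because `B` starts at zero, the adapted layer initially reproduces the base model exactly, and training only has to learn a rank-8 correction per adapted matrix.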
#### Speeds, Sizes, Times

Trained on 4x RTX 4090 GPUs in about 80 hours.

## Evaluation

The model reached a loss of 0.0476 on the test data.
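If the reported test loss is the mean per-token cross-entropy in nats (the usual loss for causal language model training in Transformers; the card does not say), it converts to perplexity as exp(loss). This sketch only shows that arithmetic; it is not an additional evaluation result.

```python
import math


def perplexity_from_loss(mean_ce_loss_nats: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy), with the loss in nats."""
    return math.exp(mean_ce_loss_nats)


ppl = perplexity_from_loss(0.0476)  # reported test loss
```

Under that assumption, the perplexity is about 1.049.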
#### Testing Data

15% of the data was split off before training and used for testing.
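A reproducible 85/15 hold-out split can be sketched as below. The card does not say how the split was made, so the shuffle and the fixed seed are assumptions, and `holdout_split` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(pairs: Sequence[T], test_fraction: float = 0.15, seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle once with a fixed seed, then hold out test_fraction of the items.

    Returns (train, test); the two lists are disjoint and together contain every item.
    """
    items = list(pairs)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]
```

Splitting before training, as the card describes, keeps the test pairs unseen by the adapter.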
#### Metrics

It was tested on a mix of basic and more challenging translations.

### Results

Quite good for a 0.5B-parameter model.

#### Summary

A good English-Spanish translation model with minimal VRAM usage.
## Environmental Impact

- **Hardware Type:** 4x RTX 4090
- **Hours used:** 80
- **Compute Region:** USA
- **Carbon Emitted:** 77.36 lbs
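Emission estimates of this kind are typically energy multiplied by grid carbon intensity. The card does not say how 77.36 lbs was computed; the sketch below assumes each GPU draws its 450 W rated board power for the full 80 hours and ignores CPU, cooling, and power-supply losses.

```python
def training_energy_kwh(num_gpus: int, watts_per_gpu: float, hours: float) -> float:
    """GPU-only energy estimate: GPUs x power draw x time, converted from Wh to kWh."""
    return num_gpus * watts_per_gpu * hours / 1000.0


def co2_lbs(energy_kwh: float, lbs_per_kwh: float) -> float:
    """Emissions = energy x grid carbon intensity (lb CO2 per kWh)."""
    return energy_kwh * lbs_per_kwh


energy = training_energy_kwh(num_gpus=4, watts_per_gpu=450, hours=80)  # 144.0 kWh
```

Under this GPU-only assumption, the reported 77.36 lbs would correspond to a grid intensity of about 0.54 lb CO2 per kWh.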
### Model Objective

Its objective is to provide more precise translations than other translation methods.

### Compute Infrastructure

Trained on 4x RTX 4090 (24 GB each).
#### Hardware

4x RTX 4090 (96 GB VRAM total), 512 GB RAM, AMD EPYC
#### Software

Python and PyTorch

## Model Card Contact

local.axiom.ai@protonmail.com or local.axiom.ai@gmail.com

### Framework versions

- PEFT 0.18.0