Salamandra 7B Instruct - Resident Evil Edition 🧟‍♂️🌿


This model is a fine-tuned version of the BSC-LT/salamandra-7b-instruct foundation model, specifically trained to master the lore, characters, and events of the Resident Evil universe.

The training process was conducted using QLoRA (Quantized Low-Rank Adaptation) techniques to achieve high performance. The resulting adapter has already been merged with the base model, making it ready for plug-and-play use.

📦 Model Details

⚙️ Training Process (Fine-Tuning)

The model was trained using the Hugging Face ecosystem (transformers, peft, trl) with aggressive memory optimizations tailored for consumer hardware (e.g., an RTX 4070 Ti with 12 GB VRAM plus 32 GB of system RAM).

Applied Optimizations:

  • 4-bit Quantization (NF4): via BitsAndBytes with double_quant=True and computations in bfloat16.
  • Smart Memory Offloading: Strict memory management to maximize VRAM usage while offloading optimizer states to system RAM using the paged_adamw_8bit optimizer.
  • Gradient Checkpointing: Enabled to significantly reduce the memory footprint of activations.
  • Optimized Context Length: Maximum context length set to 512 tokens without packing, which is ideal for direct QA pairs and instruction-following tasks.

QLoRA Hyperparameters:

  • LoRA Rank (r): 16
  • LoRA Alpha: 32
  • Dropout: 0.05
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

Training Hyperparameters (SFT):

  • Epochs: 3
  • Batch Size per Device: 1
  • Gradient Accumulation Steps: 16 (Effective Batch Size of 16)
  • Learning Rate: 2e-4
  • Scheduler: Cosine (with 50 warmup steps)
  • Weight Decay: 0.01
  • Max Gradient Norm: 0.3
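Taken together with the paged optimizer and gradient checkpointing mentioned earlier, the SFT hyperparameters above can be expressed as transformers TrainingArguments (a sketch; `output_dir` and `logging_steps` are placeholder values not given in this card):

```python
from transformers import TrainingArguments

# Training arguments reflecting the SFT hyperparameters above.
training_args = TrainingArguments(
    output_dir="./salamandra-re-qlora",  # placeholder path
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,      # effective batch size of 16
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    weight_decay=0.01,
    max_grad_norm=0.3,
    optim="paged_adamw_8bit",            # offloads optimizer states to system RAM
    gradient_checkpointing=True,
    bf16=True,
    logging_steps=10,                    # placeholder value
)
```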

🧩 Model Merging

To prevent video-memory fragmentation and Out-Of-Memory (OOM) errors, the trained LoRA adapter was merged with the original base model entirely in system RAM on the CPU (device_map="cpu").

The weights provided in this repository are the unified model (produced via merge_and_unload()), so you do not need to load the adapter separately. The model is ready for inference.
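A CPU-only merge along these lines might look like the following (a sketch; the adapter and output paths are placeholders, not the actual paths used):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model entirely on CPU to avoid VRAM fragmentation during the merge.
base = AutoModelForCausalLM.from_pretrained(
    "BSC-LT/salamandra-7b-instruct",
    torch_dtype=torch.bfloat16,
    device_map="cpu",
)

# Attach the trained LoRA adapter, fold its weights into the base model,
# and save the unified checkpoint.
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter").merge_and_unload()
merged.save_pretrained("path/to/merged-model")
```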

🚀 How to Use This Model

You can easily test the model using the transformers library. Since it is already merged, it loads like any standard causal language model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_id = "DavidCaraballoBulnes/ResidentEvil-QA"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# 1. System prompt (in Spanish): strict guardrails to prevent hallucinations and enforce canon
system_prompt = (
    "Eres un archivero experto en la historia, los personajes y los virus "
    "del universo oficial de los videojuegos de Resident Evil (creado por Capcom). "
    "Tu misión es dar respuestas precisas, directas y basadas estrictamente en el canon. "
    "Reglas críticas: No inventes nombres de criaturas, no mezcles novelas con los juegos, "
    "y bajo ninguna circunstancia alucines información. Si no conoces la respuesta exacta, "
    "debes responder: 'No tengo información verificada sobre esto en los archivos de Umbrella'."
)

# 2. Prepare the messages using the chat template
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "¿Quién es Oswell E. Spencer?"}
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Use model.device to stay hardware-agnostic (works on CUDA, CPU, or MPS/Mac)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# 3. Generate the response (low temperature encourages factual accuracy)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.2,
    top_p=0.9,
    do_sample=True
)

# 4. Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```

Disclaimer: This model is not affiliated with, endorsed by, or approved by Capcom. All content related to Resident Evil is used solely for professional and research purposes. Copyrights and trademarks belong to their respective owners.
