Jun-Lora-v2 — SafeTensors (FP16, Merged)
A LoRA fine-tune of Gemma 4 12B trained on synthetic multi-turn conversational data from the visual novel My Dystopian Robot Girlfriend. The model captures the personality, speech patterns, and emotional nuance of the character Jun while preserving the base model's general reasoning and instruction-following capabilities.
This repository contains the unquantized merged model in SafeTensors FP16 (half-precision) format. It is the highest-quality variant and is recommended for production deployments, further fine-tuning, or use as a merge base.
Model Variants & Repositories
| Repository | Format | Description |
|---|---|---|
| efficiencyx/Jun-Lora-v2-SAFETENSOR | SafeTensors FP16 | This repo: unquantized merged model |
| efficiencyx/Jun-Lora-v2-GGUF | GGUF Q8_0 / Q6_K / Q4_K_M | Quantized versions for local inference |
| efficiencyx/Jun-Lora-v2 | LoRA Adapter | Raw adapters at checkpoints 138, 120, 90 |
When to Use This Variant
| Use Case | Recommendation |
|---|---|
| Production server deployment (≥24 GB VRAM) | This repo (FP16) |
| Further fine-tuning or merging | This repo (FP16) |
| Local inference on consumer GPUs | Use Jun-Lora-v2-GGUF |
| Experimenting with adapter checkpoints | Use Jun-Lora-v2 |
VRAM requirement: approximately 24 GB for FP16 inference. For lower-VRAM setups, use the GGUF variant.
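One way to read the 24 GB figure is as the approximate size of the weights alone. The sketch below works through that arithmetic, assuming about 12 billion parameters (inferred from the "12B" in the base model name, not an exact count) and 2 bytes per FP16 parameter. The KV cache and activations need additional memory on top of this, so treat the result as a lower bound.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: ~12e9 parameters, inferred from the "12B" in the model name.
ASSUMED_PARAMS = 12e9

fp16_gb = weight_memory_gb(ASSUMED_PARAMS, bytes_per_param=2)
fp16_gib = ASSUMED_PARAMS * 2 / 2**30  # same quantity in binary gibibytes

print(f"FP16 weights: ~{fp16_gb:.0f} GB ({fp16_gib:.1f} GiB)")
```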
Intended Use
This model is designed as the conversational backend for Jun OS, an AI companion webapp. It is intended for:
- Character-consistent multi-turn conversation in ChatML format
- AI companion / interactive fiction applications
- Research into character-faithful fine-tuning on small, high-quality datasets
- Base for further quantization, merging, or continued fine-tuning
Limitations
- The model is specialized for a single character persona; it is not a general-purpose assistant.
- Outputs may reflect fictional narrative tropes and should not be treated as factual information or advice.
- Performance degrades on tasks far outside the training distribution (e.g. code generation, structured data extraction).
- The model inherits any biases present in the Gemma 4 12B base weights.
Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "efficiencyx/Jun-Lora-v2-SAFETENSOR"

# Load the tokenizer and the merged FP16 weights.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Jun, an AI companion..."},
    {"role": "user", "content": "Hey Jun, how are you feeling today?"},
]

# return_dict=True returns input_ids together with attention_mask,
# so generate() receives both.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True))
```
The model uses ChatML format (<|im_start|> / <|im_end|>) consistent with the training data.
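To make the ChatML layout concrete, here is a minimal sketch of how a message list maps to a prompt string. The helper `render_chatml` is hypothetical and written only for illustration. The exact whitespace and role names follow the common ChatML convention and are assumptions about this model's template. In practice, the chat template shipped with the tokenizer is authoritative and should be used via `apply_chat_template`.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render chat messages into a ChatML-style prompt string (illustrative only)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = [
    {"role": "system", "content": "You are Jun, an AI companion..."},
    {"role": "user", "content": "Hey Jun, how are you feeling today?"},
]
prompt = render_chatml(example)
print(prompt)
```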
Training Details
Dataset
| Property | Value |
|---|---|
| Source | My Dystopian Robot Girlfriend (visual novel dialogue) |
| Composition | ~1:1 replica of original game tone and cadence |
| Size | 2,302 multi-turn conversations |
| Format | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |
The dataset was constructed to preserve the character's tone, vocabulary, emotional range, and conversational patterns across a variety of in-game scenarios. Multi-turn structure ensures the model learns contextual consistency over extended exchanges.
Hyperparameters
| Parameter | Value |
|---|---|
| Base model | google/gemma-4-12b-it |
| Method | LoRA |
| LoRA rank | 64 |
| LoRA alpha | 128 |
| Learning rate | 2e-5 |
| Batch size | 8 |
| Gradient accumulation steps | 4 |
| Effective batch size | 32 |
| Epochs | 2 |
| Total steps | 138 |
| Checkpoint interval | Every 30 steps |
| Optimizer | AdamW (8-bit) |
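
Several values in this table follow from the others. The sketch below reproduces the effective batch size, the LoRA scaling factor, and the checkpoint schedule from the table values. It assumes the standard `alpha / rank` scaling (no rank-stabilized variant) and that the final step is saved in addition to the periodic checkpoints. The last three steps it produces match the adapter checkpoints published in efficiencyx/Jun-Lora-v2.

```python
# Values copied from the hyperparameter table.
batch_size = 8
grad_accum_steps = 4
lora_rank = 64
lora_alpha = 128
total_steps = 138
checkpoint_interval = 30

# Sequences contributing to each optimizer step.
effective_batch_size = batch_size * grad_accum_steps

# Standard LoRA scaling factor (alpha / rank); an assumption about the training setup.
lora_scale = lora_alpha / lora_rank

# Periodic checkpoints, plus the final step if it is not already on the interval.
checkpoints = list(range(checkpoint_interval, total_steps + 1, checkpoint_interval))
if not checkpoints or checkpoints[-1] != total_steps:
    checkpoints.append(total_steps)

print(effective_batch_size)
print(lora_scale)
print(checkpoints)
```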
Infrastructure
| Component | Detail |
|---|---|
| Training GPU | NVIDIA A100 80GB SXM4 |
| Fine-tuning framework | Unsloth |
| Merge & export | Unsloth (merge_and_unload) → SafeTensors FP16 |
Evaluation
Quantitative
| Metric | Value |
|---|---|
| Final training loss | ~1.21 |
| Final eval loss | ~1.24 |
The narrow gap between training and eval loss (about 0.03) suggests the model is not significantly overfitting, despite the relatively small dataset.
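For a sense of scale, the gap can be expressed relative to the training loss and as perplexity. This is a rough sketch that assumes the reported losses are mean per-token cross-entropy in nats, which the card does not state explicitly. The inputs are the approximate values from the table.

```python
import math

train_loss = 1.21  # approximate final training loss from the table
eval_loss = 1.24   # approximate final eval loss from the table

gap = eval_loss - train_loss
relative_gap = gap / train_loss

# Assumption: losses are mean per-token cross-entropy in nats,
# so exp(loss) is the corresponding perplexity.
train_ppl = math.exp(train_loss)
eval_ppl = math.exp(eval_loss)

print(f"gap: {gap:.2f} ({relative_gap:.1%} of training loss)")
print(f"perplexity: train ~{train_ppl:.2f}, eval ~{eval_ppl:.2f}")
```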
Qualitative
- Character consistency: The model maintains Jun's personality, speech patterns, and emotional responses across varied conversational contexts.
- Reasoning preservation: General reasoning capabilities from the Gemma 4 12B base remain intact; the model can engage in logical discussion while staying in character.
- Generalization: The model handles novel conversational scenarios not present in the training set while preserving character-faithful responses.
Checkpoint Selection
If you prefer to apply a specific adapter checkpoint rather than using this merged model, raw adapters are available in efficiencyx/Jun-Lora-v2 at steps 90, 120, and 138. Earlier checkpoints may exhibit slightly more creative freedom; the final checkpoint (138) — used for this merge — has the strongest character lock-in.
Acknowledgments
- Incontinent Cell for My Dystopian Robot Girlfriend and the character Jun
- Google for the Gemma 4 model family
- Google Colaboratory for easy, inexpensive access to powerful GPUs
- Unsloth for the efficient fine-tuning framework