Instructions to use efficiencyx/Jun-Lora-v2-SAFETENSOR with libraries and local apps.

Libraries

How to use efficiencyx/Jun-Lora-v2-SAFETENSOR with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

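# The messages contain an image, so this uses the image-text-to-text task, which accepts
# the `text=` keyword below (assumes the merged checkpoint kept the base model's vision components).
pipe = pipeline("image-text-to-text", model="efficiencyx/Jun-Lora-v2-SAFETENSOR")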
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("efficiencyx/Jun-Lora-v2-SAFETENSOR")
model = AutoModelForMultimodalLM.from_pretrained("efficiencyx/Jun-Lora-v2-SAFETENSOR")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use efficiencyx/Jun-Lora-v2-SAFETENSOR with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "efficiencyx/Jun-Lora-v2-SAFETENSOR"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "efficiencyx/Jun-Lora-v2-SAFETENSOR",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
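
The same request can also be issued from Python. The following is a minimal sketch that uses only the standard library; the helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM, and it assumes the server started above is listening on localhost:8000.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_text: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server):
# request = build_chat_request(
#     "http://localhost:8000",
#     "efficiencyx/Jun-Lora-v2-SAFETENSOR",
#     "What is the capital of France?",
# )
# print(send_chat_request(request))
```

Because the base URL is a parameter, the same helper should work against any other OpenAI-compatible server by changing the host and port.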

Use Docker

docker model run hf.co/efficiencyx/Jun-Lora-v2-SAFETENSOR

SGLang

How to use efficiencyx/Jun-Lora-v2-SAFETENSOR with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "efficiencyx/Jun-Lora-v2-SAFETENSOR" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "efficiencyx/Jun-Lora-v2-SAFETENSOR",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "efficiencyx/Jun-Lora-v2-SAFETENSOR" \
        --host 0.0.0.0 \
        --port 30000


Jun-Lora-v2 — SafeTensors (FP16, Merged)

A LoRA fine-tune of Gemma 4 12B trained on synthetic multi-turn conversational data from the visual novel My Dystopian Robot Girlfriend. The model captures the personality, speech patterns, and emotional nuance of the character Jun while preserving the base model's general reasoning and instruction-following capabilities.

This repository contains the merged, unquantized model in SafeTensors FP16 format. It is the highest-quality variant, recommended for production deployments, further fine-tuning, or as a merge base.

Model Variants & Repositories

Repository	Format	Description
`efficiencyx/Jun-Lora-v2-SAFETENSOR`	SafeTensors FP16	This repo: unquantized merged model
`efficiencyx/Jun-Lora-v2-GGUF`	GGUF Q8_0 / Q6_K / Q4_K_M	Quantized versions for local inference
`efficiencyx/Jun-Lora-v2`	LoRA Adapter	Raw adapters at checkpoints 138, 120, 90

When to Use This Variant

Use Case	Recommendation
Production server deployment (≥24 GB VRAM)	This repo (FP16)
Further fine-tuning or merging	This repo (FP16)
Local inference on consumer GPUs	Use `Jun-Lora-v2-GGUF`
Experimenting with adapter checkpoints	Use `Jun-Lora-v2`

VRAM requirement: approximately 24 GB for FP16 inference. For lower-VRAM setups, use the GGUF variant.
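
The 24 GB figure follows from the parameter count, since FP16 stores each weight in two bytes. A rough back-of-the-envelope sketch (the function name is illustrative; the estimate covers weights only and ignores the KV cache, activations, and framework overhead, all of which add to the total):

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


fp16_gb = weight_memory_gb(12e9, 2)  # 12B parameters, 2 bytes each in FP16
print(f"FP16 weights: ~{fp16_gb:.0f} GB")
```

Treat the result as a lower bound: long conversations grow the KV cache, so a card with exactly 24 GB may still run out of memory.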

Intended Use

This model is designed as the conversational backend for Jun OS, an AI companion webapp. It is intended for:

Character-consistent multi-turn conversation in ChatML format
AI companion / interactive fiction applications
Research into character-faithful fine-tuning on small, high-quality datasets
Base for further quantization, merging, or continued fine-tuning

Limitations

The model is specialized for a single character persona; it is not a general-purpose assistant.
Outputs may reflect fictional narrative tropes and should not be treated as factual information or advice.
Performance degrades on tasks far outside the training distribution (e.g. code generation, structured data extraction).
The model inherits any biases present in the Gemma 4 12B base weights.

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "efficiencyx/Jun-Lora-v2-SAFETENSOR"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # load the weights in FP16
    device_map="auto",          # place layers on the available GPU(s)
)

messages = [
    {"role": "system", "content": "You are Jun, an AI companion..."},
    {"role": "user", "content": "Hey Jun, how are you feeling today?"},
]

# return_dict=True returns input_ids together with the attention mask for generate()
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

The model uses ChatML format (<|im_start|> / <|im_end|>) consistent with the training data.
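
For reference, ChatML wraps each turn in `<|im_start|>` and `<|im_end|>` markers. The sketch below renders a message list by hand to show the layout. `render_chatml` is a hypothetical helper for illustration only; in practice the tokenizer's `apply_chat_template` should be treated as authoritative, because the template shipped with this repository may differ in whitespace or special tokens.

```python
def render_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render chat messages in the generic ChatML layout (illustrative only)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system", "content": "You are Jun, an AI companion..."},
    {"role": "user", "content": "Hey Jun, how are you feeling today?"},
])
print(prompt)
```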

Training Details

Dataset

Property	Value
Source	My Dystopian Robot Girlfriend (visual novel dialogue)
Composition	~1:1 replica of original game tone and cadence
Size	2,302 multi-turn conversations
Format	ChatML

The dataset was constructed to preserve the character's tone, vocabulary, emotional range, and conversational patterns across a variety of in-game scenarios. Multi-turn structure ensures the model learns contextual consistency over extended exchanges.

Hyperparameters

Parameter	Value
Base model	`google/gemma-4-12b-it`
Method	LoRA
LoRA rank	64
LoRA alpha	128
Learning rate	2e-5
Batch size	8
Gradient accumulation steps	4
Effective batch size	32
Epochs	2
Total steps	138
Checkpoint interval	Every 30 steps
Optimizer	AdamW (8-bit)
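
The batch and step figures in the table are related by simple arithmetic, worked through in the sketch below. It assumes one optimizer step per effective batch, that a final partial batch in each epoch still counts as a step, standard LoRA scaling (alpha divided by rank), and no sequence packing. The implied training-set size is an inference from those assumptions, not a number reported in this card.

```python
import math

# Values from the hyperparameter and dataset tables
batch_size = 8
grad_accum_steps = 4
epochs = 2
total_steps = 138
dataset_size = 2302
lora_rank, lora_alpha = 64, 128

effective_batch = batch_size * grad_accum_steps  # samples per optimizer step
steps_per_epoch = total_steps // epochs          # optimizer steps in one epoch
lora_scale = lora_alpha / lora_rank              # multiplier applied to the LoRA update

# Largest and smallest training-set sizes that yield exactly steps_per_epoch steps
max_train = steps_per_epoch * effective_batch
min_train = (steps_per_epoch - 1) * effective_batch + 1
assert math.ceil(max_train / effective_batch) == steps_per_epoch
assert math.ceil(min_train / effective_batch) == steps_per_epoch

print(f"effective batch size: {effective_batch}")
print(f"steps per epoch: {steps_per_epoch}")
print(f"LoRA scaling (alpha / rank): {lora_scale}")
print(f"implied training conversations: {min_train}-{max_train} of {dataset_size}")
```

Under the same assumptions, training on all 2,302 conversations would take 72 steps per epoch (144 in total), so the 138-step figure is consistent with a small portion of the data being held out, for example for the eval loss reported in this card.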

Infrastructure

Component	Detail
Training GPU	NVIDIA A100 80GB SXM4
Fine-tuning framework	Unsloth
Merge & export	Unsloth (`merge_and_unload`) → SafeTensors FP16

Evaluation

Quantitative

Metric	Value
Final training loss	~1.21
Final eval loss	~1.24

The narrow gap of about 0.03 between training and eval loss suggests the model is not significantly overfitting the training split, despite the relatively small dataset.
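
To put the two losses on a more intuitive scale, they can be converted to perplexity. The sketch assumes the reported values are mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card. Because the inputs are rounded, the outputs are approximate.

```python
import math

train_loss = 1.21
eval_loss = 1.24

gap = eval_loss - train_loss      # absolute gap in nats
train_ppl = math.exp(train_loss)  # perplexity = exp(cross-entropy)
eval_ppl = math.exp(eval_loss)

print(f"loss gap: {gap:.2f}")
print(f"train perplexity: {train_ppl:.2f}")
print(f"eval perplexity:  {eval_ppl:.2f}")
print(f"relative perplexity increase: {(eval_ppl / train_ppl - 1) * 100:.1f}%")
```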

Qualitative

Character consistency: The model maintains Jun's personality, speech patterns, and emotional responses across varied conversational contexts.
Reasoning preservation: General reasoning capabilities from the Gemma 4 12B base remain intact; the model can engage in logical discussion while staying in character.
Generalization: The model handles novel conversational scenarios not present in the training set while preserving character-faithful responses.

Checkpoint Selection

If you prefer to apply a specific adapter checkpoint rather than using this merged model, raw adapters are available in efficiencyx/Jun-Lora-v2 at steps 90, 120, and 138. Earlier checkpoints may exhibit slightly more creative freedom; the final checkpoint (138) — used for this merge — has the strongest character lock-in.
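
Applying an adapter checkpoint to the base model could look roughly like the following sketch, which uses the PEFT library. It rests on assumptions: the subfolder naming (`checkpoint-<step>`) is hypothetical because the adapter repository's layout is not described here, the helper names are illustrative, and the base model ID is taken from the hyperparameter table. Check the adapter repository's file listing before relying on it.

```python
ADAPTER_REPO = "efficiencyx/Jun-Lora-v2"
BASE_MODEL = "google/gemma-4-12b-it"
AVAILABLE_STEPS = (90, 120, 138)


def checkpoint_subfolder(step: int) -> str:
    """Return the assumed subfolder for a checkpoint (hypothetical naming scheme)."""
    if step not in AVAILABLE_STEPS:
        raise ValueError(f"step must be one of {AVAILABLE_STEPS}, got {step}")
    return f"checkpoint-{step}"


def load_with_adapter(step: int, merge: bool = False):
    """Load the base model and attach the adapter saved at the given step."""
    # Imported lazily so the helper above works without the heavy dependencies
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
    )
    model = PeftModel.from_pretrained(
        base, ADAPTER_REPO, subfolder=checkpoint_subfolder(step)
    )
    # merge_and_unload() folds the LoRA weights into the base layers
    return model.merge_and_unload() if merge else model
```

Calling `load_with_adapter(90)` would then give the earliest published checkpoint for comparison against this merged model, provided the assumed layout holds.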

Acknowledgments

Incontinent Cell for My Dystopian Robot Girlfriend and the character Jun
Google for the Gemma 4 model family
Google Colaboratory for easy, affordable access to powerful GPUs
Unsloth for the efficient fine-tuning framework
