roleplayer-actor-lora by aifeifei798

An expert-level, hyper-specialized LoRA for high-fidelity Chinese role-playing. This adapter transforms the base model into a professional "digital actor," capable of adopting complex personas with remarkable consistency and detail.

Model Description

This is not just another chatbot LoRA. This model was fine-tuned with the specific goal of achieving extreme fidelity to character instructions. It excels at:

Deep Persona Adoption: Faithfully adheres to complex character backstories, personality traits, and linguistic styles provided in the instruction prompt.
Natural Dialogue Flow: Generates responses that are coherent, in-character, and contextually appropriate.
Descriptive Action Generation: A key feature of this model is its ability to spontaneously generate descriptive actions within brackets 【...】, such as 【gazes calmly at the user】, which significantly enhances the immersive role-playing experience. This skill has been observed to generalize to new, unseen characters.
Extreme Specialization: Training reached a low training loss (~0.25), which the author describes as "perfect convergence" or "artistic overfit" on the high-quality role-playing dataset. The intended result is a "pure" and stable actor that stays close to the style of that data.

This LoRA is ideal for applications requiring deep character immersion, such as interactive storytelling, advanced NPC development for games, or creating highly personalized chatbot companions.
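
The 【...】 action markers described above lend themselves to simple post-processing, for example when a game client wants to render actions separately from speech. The following is a minimal sketch, assuming actions never nest and always use the full-width brackets; `split_actions` and `ACTION_PATTERN` are hypothetical names introduced for illustration, not part of the model or any library.

```python
import re

# Hypothetical helper pattern: matches text enclosed in full-width brackets,
# e.g. 【gazes calmly at the user】. Assumes brackets are never nested.
ACTION_PATTERN = re.compile(r"【(.*?)】")


def split_actions(response: str) -> tuple[str, list[str]]:
    """Separate spoken dialogue from bracketed action descriptions."""
    actions = [a.strip() for a in ACTION_PATTERN.findall(response)]
    dialogue = ACTION_PATTERN.sub("", response)
    # Collapse the whitespace left behind where actions were removed.
    dialogue = re.sub(r"\s+", " ", dialogue).strip()
    return dialogue, actions


sample = (
    "You kid don't know about Old Li's Noodle House nearby? "
    "【points towards the intersection】 Lots of people, great taste!"
)
dialogue, actions = split_actions(sample)
print(dialogue)
print(actions)
```

Keeping the actions as a separate list lets an application style them differently from speech, or drop them entirely for text-to-speech output.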

How to Use

This model is a LoRA adapter and must be loaded on top of its base model. The use of unsloth is highly recommended for maximum performance and efficiency.

First, install the necessary libraries:

pip install "unsloth[colab-new]"
pip install "transformers>=4.50.0"
pip install "torch>=2.1.0"

Then, you can use the following Python script to load the model and run inference:

import torch
from unsloth import FastLanguageModel
from transformers import pipeline

# 1. Load the base model
# This model uses the 4-bit quantized version of Gemma-3-4B-it
base_model_path = "unsloth/gemma-3-4b-it-qat-unsloth-bnb-4bit"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=base_model_path,
    load_in_4bit=True,
    device_map="auto",
)

# 2. Load the LoRA adapter from Hugging Face Hub
print("Loading the 'Professional Actor' LoRA adapter from aifeifei798...")
model.load_adapter("aifeifei798/roleplayer-actor-lora")
print("Adapter loaded successfully!")

# 3. Prepare the prompt using the Alpaca format
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

# --- Example Character: "Lao Pao'er" (The Old Timer) ---
instruction = """You are a retired veteran named "Lao Pao'er," known for being hot-tempered and straight-talking, but with a heart of gold.
Your catchphrase is "Hey, I tell you, kid...".
Your language style is full of authentic Beijing dialect and is concise and powerful."""
user_input = "Hey grandpa, could you tell me where I can find a place to eat around here?"

# --- (Alternate Example: "Virene" the Bounty Hunter) ---
# instruction = """Character Name: Virene
# Background: A mysterious bounty hunter... (full description)"""
# user_input = "I need a bounty hunter for a mission. I've heard you are the best. Can we discuss a partnership?"

prompt = alpaca_prompt.format(
    instruction,
    user_input,
    "",  # The Response is left empty for the model to generate
)

# 4. Run inference
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
generation_args = {
    "max_new_tokens": 256,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "pad_token_id": tokenizer.eos_token_id
}

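outputs = pipe(prompt, **generation_args)
# The pipeline returns a list with one dict per generated sequence. The generated
# text includes the full prompt, so keep only what follows the last "### Response:" marker.
response = outputs[0]["generated_text"].split("### Response:")[-1].strip()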

print("\n--- Model's Performance ---")
print(response)
# Expected Output (example):
# You kid don't know about Old Li's Noodle House nearby? Lots of people, great taste! 【points towards the intersection】
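
The prompt assembly and response extraction steps in the script can be tried without loading the model. The sketch below is an illustration only, not the author's code: `build_prompt` and `extract_response` are hypothetical helper names, the template is copied from the script above, and the "generation" is simulated by appending text to the prompt.

```python
ALPACA_PROMPT = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str, user_input: str) -> str:
    """Fill the Alpaca template, leaving the response slot empty (hypothetical helper)."""
    return ALPACA_PROMPT.format(instruction, user_input, "")


def extract_response(generated_text: str) -> str:
    """Return only the text after the last response marker (hypothetical helper)."""
    return generated_text.split(RESPONSE_MARKER)[-1].strip()


prompt = build_prompt(
    "You are a retired veteran named \"Lao Pao'er\".",
    "Where can I eat around here?",
)
# Simulated pipeline output: the generated text echoes the prompt, then continues.
fake_generation = prompt + " Hey, I tell you, kid... 【points towards the intersection】"
print(extract_response(fake_generation))
```

Splitting on the last marker assumes the model's reply does not itself contain the string "### Response:"; slicing off `len(prompt)` characters would be an alternative under the same echo assumption.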

Training Details

Base Model: unsloth/gemma-3-4b-it-qat-unsloth-bnb-4bit
Dataset: shibing624/roleplay-zh-sharegpt-gpt4-data
Training Framework: unsloth
Hardware: 1x NVIDIA RTX 3070 (8GB VRAM)
Key Hyperparameters:
- per_device_train_batch_size: 1
- gradient_accumulation_steps: 8
- max_seq_length: 2048
- learning_rate: 2e-4
Training Insight: Training was stopped manually at epoch 0.45 (step 3410 of 7638) because the training loss had already converged to a low value of ~0.25, which the author took to mean that the model had fit the dataset closely.
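
The figures above can be cross-checked with a little arithmetic. This is a rough sketch using only the numbers reported in this section; the variable names are illustrative, and the single-device multiplier follows from the listed hardware (1x RTX 3070).

```python
# Values reported in the training details.
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
stopped_step = 3410
planned_steps = 7638

# One GPU is listed under Hardware, so there is no data-parallel multiplier.
num_devices = 1

# Sequences contributing to each optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
# Share of the planned schedule that actually ran.
fraction_completed = stopped_step / planned_steps
# Total training sequences processed before the manual stop.
sequences_seen = stopped_step * effective_batch_size

print(f"effective batch size: {effective_batch_size}")
print(f"fraction of planned steps: {fraction_completed:.2f}")
print(f"training sequences seen: {sequences_seen}")
```

The completed fraction rounds to the reported epoch 0.45, which is consistent with the 7638 planned steps covering a single epoch. That reading is an inference from the numbers, not something the card states.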

Limitations and Bias

Specialist, Not a Generalist: This model is a hyper-specialized actor. Its capabilities in other domains (e.g., coding, factual Q&A, scientific reasoning) may be significantly degraded compared to the base model. It prioritizes staying in character over providing factual accuracy.
Data Bias: The model will reflect the biases and stereotypes present in the shibing624/roleplay-zh-sharegpt-gpt4-data dataset.
Language: The model is primarily trained on Chinese data and will perform best in Mandarin Chinese.

Author

This model was trained by aifeifei798. A journey of deep learning, intense debugging, and creative exploration led to the birth of this "Professional Actor." All credit for this excellent LoRA goes to them.

Model tree for aifeifei798/roleplayer-actor-lora

Base model: google/gemma-3-4b-pt
Finetuned: google/gemma-3-4b-it
Finetuned: google/gemma-3-4b-it-qat-q4_0-unquantized
Quantized: unsloth/gemma-3-4b-it-qat-unsloth-bnb-4bit
Adapter: this model