Instructions to use ReXeeD/Luminus-1.5B-Roleplay with libraries, inference providers, notebooks, and local apps.

Libraries

How to use ReXeeD/Luminus-1.5B-Roleplay with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ReXeeD/Luminus-1.5B-Roleplay")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ReXeeD/Luminus-1.5B-Roleplay")
model = AutoModelForCausalLM.from_pretrained("ReXeeD/Luminus-1.5B-Roleplay")
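The Transformers snippet above only loads the weights. The tokenizer_config.json in this commit does not appear to include a `chat_template`, so `tokenizer.apply_chat_template` may not work out of the box. The sketch below assembles a prompt by hand, assuming the model expects Qwen's ChatML layout (`<|im_start|>role ... <|im_end|>`). The README's references to `<|im_end|>` suggest this, but the exact training format is not documented. `build_chatml_prompt` and the character name are hypothetical.

```python
# Minimal ChatML prompt builder (assumption: Qwen-style <|im_start|>/<|im_end|> turns).
def build_chatml_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    """Render a system prompt plus (role, content) turns, then open an assistant turn."""
    parts = [f"<|im_start|>system\n{system}<|im_end|>\n"]
    for role, content in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    # Leave the assistant turn open so the model writes the next reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    prompt = build_chatml_prompt(
        "You are roleplaying as Mira.",  # hypothetical character
        [("user", "The tavern door creaks open.")],
    )
    print(prompt)
```

The returned string can be tokenized with `tokenizer(prompt, return_tensors="pt")` and passed to `model.generate`.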


vLLM

How to use ReXeeD/Luminus-1.5B-Roleplay with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ReXeeD/Luminus-1.5B-Roleplay"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ReXeeD/Luminus-1.5B-Roleplay",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
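The curl call above sends a bare prompt with generic sampling values. A standard-library Python equivalent is sketched below, with the values from the README's "Optimal Inference Settings" substituted in. Two assumptions: `repetition_penalty` and `stop_token_ids` are vLLM-specific extensions to the OpenAI request body, and 151645 is taken to be the id of `<|im_end|>` in this Qwen2.5-derived vocabulary (check with `tokenizer.convert_tokens_to_ids("<|im_end|>")`). The helper names are hypothetical.

```python
import json
import urllib.request

# Sampling values from the README's "Optimal Inference Settings".
IM_END_ID = 151645  # assumed id of <|im_end|>; confirm with tokenizer.convert_tokens_to_ids


def build_completion_payload(prompt: str, model: str = "ReXeeD/Luminus-1.5B-Roleplay") -> dict:
    """Request body for an OpenAI-compatible /v1/completions endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": 350,
        "temperature": 0.65,
        "top_p": 0.9,
        "repetition_penalty": 1.1,      # vLLM extension, not part of the OpenAI spec
        "stop_token_ids": [IM_END_ID],  # vLLM extension: stop on the <|im_end|> token
    }


def complete(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to a running vLLM server and return the generated text."""
    body = json.dumps(build_completion_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["choices"][0]["text"]
```

For example, `complete("Once upon a time,")` should work against the `vllm serve` process started above.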

Use Docker

docker model run hf.co/ReXeeD/Luminus-1.5B-Roleplay

SGLang

How to use ReXeeD/Luminus-1.5B-Roleplay with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ReXeeD/Luminus-1.5B-Roleplay" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ReXeeD/Luminus-1.5B-Roleplay",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ReXeeD/Luminus-1.5B-Roleplay" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ReXeeD/Luminus-1.5B-Roleplay",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Unsloth Studio

How to use ReXeeD/Luminus-1.5B-Roleplay with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ReXeeD/Luminus-1.5B-Roleplay to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for ReXeeD/Luminus-1.5B-Roleplay to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for ReXeeD/Luminus-1.5B-Roleplay to start chatting

Load model with FastModel

# Install Unsloth from pip:
pip install unsloth
# Then load the model in Python:
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="ReXeeD/Luminus-1.5B-Roleplay",
    max_seq_length=2048,
)

Docker Model Runner
How to use ReXeeD/Luminus-1.5B-Roleplay with Docker Model Runner:
```
docker model run hf.co/ReXeeD/Luminus-1.5B-Roleplay
```

ReXeeD committed on Apr 15

Commit eb816ce (verified) · 1 parent: 5c8ff4b

Upload folder using huggingface_hub

Files changed (6)

.gitattributes +1 -0
README.md +94 -0
config.json +63 -0
model.safetensors +3 -0
tokenizer.json +3 -0
tokenizer_config.json +16 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,94 @@

+---
+language:
+- en
+license: apache-2.0
+pipeline_tag: text-generation
+tags:
+- roleplay
+- chat
+- unsloth
+- dpo
+- qwen
+- llama-cpp
+library_name: transformers
+base_model: Qwen/Qwen2.5-1.5B
+---
+# Luminus-1.5B-128K: Advanced Small-Parameter Roleplay Model
+Luminus-1.5B-128K is a highly optimized 1.5B-parameter model designed to deliver the immersive roleplay quality, character consistency, and long-context understanding typically found in larger 3B–4B models.
+By layering research-backed techniques such as **Chain-of-Thought (CoT) Distillation**, **Instruction-Following Difficulty (IFD) Filtering**, and **Direct Preference Optimization (DPO)** over a custom roleplay dataset, this model closes the reasoning gap for immersive storytelling, making it well suited to running on modest local hardware.
+## Core Innovations
+- **CoT Reasoning Traces:** Trained on data formatted with `<think>` blocks, teaching the model *why* a character responds a certain way before it outputs the final dialogue.
+- **DPO Preference Alignment:** Aligned using carefully curated Chosen/Rejected pairs to explicitly prefer deep, immersive, sensory-rich responses over bland AI-assistant-like text.
+- **Expanded Context Size:** Utilizes YaRN RoPE scaling to push the standard context limit up to **128K tokens**, enabling very long roleplaying sessions without losing character consistency.
+- **Top-Tier Data Quality:** Used IFD (Instruction-Following Difficulty) scoring to algorithmically filter out the weakest 30% of the training data, ensuring the model only learns from the most challenging, high-quality exchanges.
+## Training Details
+- **Base Model:** [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B)
+- **Context Length:** 128K tokens at inference (trained at 8K with a YaRN RoPE factor of 16.0)
+- **Hardware:** Trained on Kaggle using T4 GPUs.
+- **Training Pipeline Stages:**
+    1. **Data Generation & Filtering (Stage 1 & 1A):** Initial data was generated by a large teacher model (Qwen 3.5 32B / gpt-oss-120b) acting as a Narrative Architect. Traces include both internal-psychology `<think>` blocks and external roleplay responses. The dataset was then strictly filtered with IFD scoring to keep only the top 70% of the data by quality.
+    2. **Supervised Fine-Tuning (Stage 2):** Unsloth SFT layered over the base model. Trained at an 8192 sequence length with an effective batch size of 16 on a carefully balanced mix (approx. 60% standard RP, 40% CoT examples).
    This stage used long-form responses and thinking data so that the model could learn how to roleplay.
+    3. **DPO Alignment (Stage 3A & 3B):** Direct Preference Optimization contrasting short, immersive RP responses (chosen) against uninspired, bland outputs (rejected) to instill a stylistic preference. Parameters were tuned for maximum stability (`beta=0.1`, `lr=5e-5`, 1 epoch).
+    4. **Supervised Fine-Tuning (Stage 3C):** A second Unsloth SFT pass over the Stage 3B model, teaching it to respond according to the situation and the user's message, so that it replies briefly or at length depending on what the user writes.
+## Installation, Usage, and Tips
+The model is exported in `.safetensors` format. You can natively load it using Transformers or Unsloth, or convert it to GGUF if you wish to run it in standard frontends like LM Studio or Kobold.
+### Recommended System Prompt
+This model is heavily trained to think before speaking. Using the following system prompt yields the best results and ensures the model accurately formats its `<think>` blocks before responding:
+```text
+You are a realistic, character-driven roleplay engine. You are roleplaying as {{char}}. Write strictly in third-person limited perspective.
+CORE RULES:
+- BOUNDARIES: NEVER speak, think, or generate actions for {{user}}.
+- HISTORY & CONTEXT: Your reactions must logically follow past messages. Stay strictly in the present moment.
+- PACING & DIALOGUE: Keep it slow-burn and grounded. Keep dialogue concise.
+- FORMATTING: You must strictly follow the thought process format below, followed by a short roleplay response, and then STOP IMMEDIATELY. Output the <|im_end|> token.
+Format your response EXACTLY like this:
+<think>
+1. INTENT: [User's intent in 1 sentence]
+2. STATE: [Character's emotional state in 1 sentence]
+3. PLAN: I will write 1 to 2 action sentences and 1 dialogue sentence, then STOP if user message is small else if he is asking something detailed reply in more detail.
+</think>
+*Grounded action and environmental description.*
+"Natural dialogue."
+```
+### Optimal Inference Settings
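+For the best text generation, a mild repetition penalty and a stopping criterion that intercepts `<|im_end|>` are strongly recommended.
+```python
+from transformers import StoppingCriteria, StoppingCriteriaList
+
+class StopOnImEnd(StoppingCriteria):  # halts once the last generated token is <|im_end|>
+    def __init__(self, stop_id):
+        self.stop_id = stop_id
+    def __call__(self, input_ids, scores, **kwargs):
+        return input_ids[:, -1] == self.stop_id  # one bool per sequence in the batch
+
+stopping_criteria = StoppingCriteriaList(
+    [StopOnImEnd(tokenizer.convert_tokens_to_ids("<|im_end|>"))]
+)
+generation_kwargs = dict(
+    # ... your inputs ...
+    max_new_tokens=350,       # Leash to prevent runaway generations
+    temperature=0.65,         # Keep it grounded
+    repetition_penalty=1.1,   # Punish loops like "A pause. A pause."
+    top_p=0.9,
+    do_sample=True,
+    pad_token_id=tokenizer.eos_token_id,
+    stopping_criteria=stopping_criteria,  # CRITICAL: stops generation on <|im_end|>
+)
+```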
+## Risks and Limitations
+- **Complex Multi-Character Plots:** While the model punches above its weight class (1.5B), exceptionally nuanced multi-character tracking or massive world-building tasks may still strain its parameter budget compared to an 8B+ model.
+- **Inherited Base Model Biases:** Output behavior is still tethered to the foundational weights of Qwen2.5-1.5B.
+- **Thinking Tags Extraction:** Ensure your frontend properly hides `<think>` ... `</think>` tags if you only wish to see the character's final verbal/action responses.
+## Responsible Usage
+This model is focused extensively on fictional roleplay and creative writing. It is NOT intended to provide factual advice, perform real-world or real-time analysis, or generate non-fictional transcripts. Please use it responsibly, staying within the ethical and licensing obligations of the model's lineage.
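The README asks frontends to hide `<think>` blocks and to stop at `<|im_end|>`. If a frontend receives the raw generated text, a minimal post-processing sketch could look like the following. `visible_reply` is a hypothetical helper, and it assumes the model emits well-formed `<think>...</think>` pairs before the reply.

```python
import re

# Non-greedy match of a complete <think>...</think> block, spanning newlines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def visible_reply(raw: str, stop: str = "<|im_end|>") -> str:
    """Cut generated text at the stop marker, then drop reasoning blocks."""
    # Anything after <|im_end|> belongs to a runaway continuation.
    raw = raw.split(stop, 1)[0]
    reply = _THINK_BLOCK.sub("", raw)
    # An unterminated <think> (e.g. generation hit max_new_tokens) is hidden entirely.
    if "<think>" in reply:
        reply = reply.split("<think>", 1)[0]
    return reply.strip()
```

An unterminated `<think>` (for example when `max_new_tokens=350` cuts the reasoning short) is hidden rather than shown, since its contents were never meant for the reader.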

config.json ADDED

	@@ -0,0 +1,63 @@

+{
+    "architectures": [
+        "Qwen2ForCausalLM"
+    ],
+    "attention_dropout": 0.0,
+    "bos_token_id": null,
+    "torch_dtype": "float16",
+    "eos_token_id": 151643,
+    "hidden_act": "silu",
+    "hidden_size": 1536,
+    "initializer_range": 0.02,
+    "intermediate_size": 8960,
+    "layer_types": [
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention",
+        "full_attention"
+    ],
+    "max_position_embeddings": 32768,
+    "max_window_layers": 28,
+    "model_type": "qwen2",
+    "num_attention_heads": 12,
+    "num_hidden_layers": 28,
+    "num_key_value_heads": 2,
+    "pad_token_id": 151665,
+    "rms_norm_eps": 1e-06,
+    "rope_parameters": {
+        "rope_theta": 1000000.0,
+        "rope_type": "default"
+    },
+    "sliding_window": null,
+    "tie_word_embeddings": true,
+    "unsloth_fixed": true,
+    "unsloth_version": "2026.4.4",
+    "use_cache": false,
+    "use_mrope": false,
+    "use_sliding_window": false,
+    "vocab_size": 151666
+}
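The config above uses grouped-query attention: 12 query heads (`num_attention_heads`) share 2 key/value heads (`num_key_value_heads`), which keeps the KV cache 6× smaller than with one KV head per query head. The sketch below estimates cache size at a few context lengths, assuming a 2-byte (fp16/bf16), unquantized cache. Note that this config.json lists `rope_type: "default"` with `max_position_embeddings: 32768` and no YaRN entry. The README's 8,192 × 16 = 131,072-token figure would therefore presumably require YaRN scaling to be configured separately at load time.

```python
# Values copied from the config.json above.
HIDDEN_SIZE = 1536
NUM_ATTENTION_HEADS = 12
NUM_KEY_VALUE_HEADS = 2
NUM_HIDDEN_LAYERS = 28

HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS  # 128
BYTES_PER_VALUE = 2  # assumption: fp16/bf16 KV cache, no quantization


def kv_cache_bytes(num_tokens: int) -> int:
    """Keys + values, for every layer and KV head, across num_tokens positions."""
    per_token = 2 * NUM_HIDDEN_LAYERS * NUM_KEY_VALUE_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens


if __name__ == "__main__":
    # Training length, config limit, and YaRN target from the README.
    for tokens in (8_192, 32_768, 8_192 * 16):
        print(f"{tokens:>7} tokens -> {kv_cache_bytes(tokens) / 2**30:.2f} GiB")
```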

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:65228bb1f1bd1a82326aa7557b4758b025bced9abfdb86221d185093c7227c58
+size 3087467144
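The LFS pointer's `size` field offers a quick consistency check against config.json. The sketch below counts parameters for a Qwen2 decoder under several assumptions: biases on the q/k/v projections only, tied input/output embeddings (`tie_word_embeddings: true`), and float16 storage at 2 bytes per weight (`torch_dtype: "float16"`). Both routes land at roughly 1.54B parameters. The small remainder (about 0.03%) is not broken down here, though the safetensors JSON header accounts for some of it.

```python
# Architecture values from config.json; file size from the LFS pointer above.
VOCAB, HIDDEN, INTERMEDIATE, LAYERS = 151_666, 1_536, 8_960, 28
HEADS, KV_HEADS = 12, 2
SAFETENSORS_BYTES = 3_087_467_144


def qwen2_param_count() -> int:
    """Approximate Qwen2 parameter count (assumes q/k/v biases, tied embeddings)."""
    head_dim = HIDDEN // HEADS
    kv_dim = KV_HEADS * head_dim
    attn = (
        HIDDEN * HIDDEN + HIDDEN          # q_proj weight + bias
        + 2 * (HIDDEN * kv_dim + kv_dim)  # k_proj and v_proj weights + biases
        + HIDDEN * HIDDEN                 # o_proj weight (no bias)
    )
    mlp = 3 * HIDDEN * INTERMEDIATE       # gate_proj, up_proj, down_proj
    norms = 2 * HIDDEN                    # two RMSNorm weight vectors per layer
    embed = VOCAB * HIDDEN                # shared with lm_head (tie_word_embeddings)
    return LAYERS * (attn + mlp + norms) + embed + HIDDEN  # + final RMSNorm


if __name__ == "__main__":
    from_config = qwen2_param_count()
    from_file = SAFETENSORS_BYTES // 2  # 2 bytes per float16 weight; header ignored
    print(f"config: {from_config / 1e9:.2f}B, file: {from_file / 1e9:.2f}B")
```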

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bd5948af71b4f56cf697f7580814c7ce8b80595ef985544efcacf716126a2e31
+size 11422356

tokenizer_config.json ADDED

	@@ -0,0 +1,16 @@

+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|endoftext|>",
+  "errors": "replace",
+  "extra_special_tokens": [],
+  "is_local": true,
+  "model_max_length": 32768,
+  "pad_token": "<|PAD_TOKEN|>",
+  "padding_side": "left",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
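The tokenizer is configured with `padding_side: "left"` and `<|PAD_TOKEN|>` as its pad token (id 151665 per config.json's `pad_token_id`). For batched generation with a decoder-only model, left padding keeps every prompt's last real token in the final position, where `generate` appends new tokens. The sketch below shows the resulting ids and attention mask for a toy batch. In practice `tokenizer(texts, padding=True)` produces this directly; `left_pad` is a hypothetical illustration.

```python
PAD_ID = 151665  # id of <|PAD_TOKEN|>, per config.json's pad_token_id


def left_pad(batch: list[list[int]], pad_id: int = PAD_ID) -> tuple[list[list[int]], list[list[int]]]:
    """Left-pad token-id sequences and build the matching attention masks."""
    width = max(len(seq) for seq in batch)
    ids, mask = [], []
    for seq in batch:
        pad = width - len(seq)
        ids.append([pad_id] * pad + seq)        # padding goes before the real tokens
        mask.append([0] * pad + [1] * len(seq))  # 0 = ignore, 1 = attend
    return ids, mask
```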