ReXeeD committed on
Commit eb816ce · verified · 1 Parent(s): 5c8ff4b

Upload folder using huggingface_hub

Files changed (6)
  1. .gitattributes +1 -0
  2. README.md +94 -0
  3. config.json +63 -0
  4. model.safetensors +3 -0
  5. tokenizer.json +3 -0
  6. tokenizer_config.json +16 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,94 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ pipeline_tag: text-generation
+ tags:
+ - roleplay
+ - chat
+ - unsloth
+ - dpo
+ - qwen
+ - llama-cpp
+
+ library_name: transformers
+ base_model: Qwen/Qwen2.5-1.5B
+ ---
+
+ # Luminus-1.5B-128K: Advanced Small-Parameter Roleplay Model
+
+ Luminus-1.5B-128K is a 1.5B-parameter model optimized to deliver the immersive roleplay quality, character consistency, and long-context understanding typically found in larger 3B–4B models.
+
+ By layering research-backed techniques such as **Chain-of-Thought (CoT) Distillation**, **Instruction-Following Difficulty (IFD) Filtering**, and **Direct Preference Optimization (DPO)** over a custom roleplay dataset, the model aims to close the reasoning gap for immersive storytelling while remaining well suited to modest local hardware.
+
+ ## Core Innovations
+
+ - **CoT Reasoning Traces:** Trained on data formatted with `<think>` blocks, teaching the model *why* a character responds a certain way before it outputs the final dialogue.
+ - **DPO Preference Alignment:** Aligned using curated Chosen/Rejected pairs to explicitly prefer deep, immersive, sensory-rich responses over bland, AI-assistant-like text.
+ - **Expanded Context Size:** Uses YaRN RoPE scaling to extend the context limit up to **128K tokens**, enabling very long roleplay sessions without losing character consistency.
+ - **Top-Tier Data Quality:** IFD (Instruction-Following Difficulty) scoring was used to filter out the weakest 30% of the training data, so the model learns only from the most challenging, high-quality exchanges.
+
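The IFD filtering step can be sketched roughly as follows. This is a minimal illustration, not the author's pipeline: it assumes the common definition of IFD as the ratio of the answer's average loss given the instruction to its average loss without the instruction. The helper names `ifd_score` and `keep_top_fraction` and all loss values are hypothetical.

```python
from typing import List, Tuple


def ifd_score(loss_with_instruction: float, loss_without_instruction: float) -> float:
    """IFD = conditioned answer loss / unconditioned answer loss (assumed definition)."""
    if loss_without_instruction <= 0:
        raise ValueError("unconditioned loss must be positive")
    return loss_with_instruction / loss_without_instruction


def keep_top_fraction(samples: List[Tuple[str, float]], fraction: float = 0.7) -> List[Tuple[str, float]]:
    """Keep the highest-scoring `fraction` of (sample_id, ifd) pairs."""
    ranked = sorted(samples, key=lambda pair: pair[1], reverse=True)
    n_keep = round(len(ranked) * fraction)
    return ranked[:n_keep]


# Hypothetical per-sample losses: (id, loss given instruction, loss without instruction).
raw = [
    ("a", 1.2, 2.0), ("b", 0.4, 2.0), ("c", 1.8, 2.0), ("d", 0.9, 1.0), ("e", 0.2, 1.0),
    ("f", 1.0, 2.0), ("g", 0.6, 1.0), ("h", 1.5, 2.0), ("i", 0.3, 1.0), ("j", 0.7, 1.0),
]
scored = [(sample_id, ifd_score(cond, uncond)) for sample_id, cond, uncond in raw]
kept = keep_top_fraction(scored, fraction=0.7)  # drops the lowest-scoring 30%
kept_ids = [sample_id for sample_id, _ in kept]
print(kept_ids)
```

In practice the two losses would come from forward passes of a scoring model over each sample; that part is omitted here because the card does not say which model was used.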
+ ## Training Details
+
+ - **Base Model:** [Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B)
+ - **Context Length:** 128,000 tokens at inference (trained at 8k with YaRN RoPE factor 16.0)
+ - **Hardware:** Trained on Kaggle using T4 GPUs.
+ - **Training Pipeline Stages:**
+ 1. **Data Generation & Filtering (Stage 1 & 1A):** Initial data was generated using a large teacher model (Qwen 3.5 32B / gpt-oss-120b) acting as a Narrative Architect. Traces include both internal psychology `<think>` blocks and external roleplay responses. The dataset was then filtered with IFD scoring to keep only the top 70% of samples by quality.
+ 2. **Supervised Fine-Tuning (Stage 2):** Unsloth SFT layered over the base model. Trained at an 8192 sequence length with an effective batch size of 16 on a carefully balanced mix (approx. 60% standard RP, 40% CoT examples).
+ This stage used long-response data and thinking data so that the model could learn how to roleplay.
+ 3. **DPO Alignment (Stage 3A & 3B):** Direct Preference Optimization contrasting short, immersive RP responses (chosen) with uninspired/bland output (rejected) to instill stylistic preference. Parameters tuned for maximum stability (`beta=0.1`, `lr=5e-5`, over 1 epoch).
+ 4. **Supervised Fine-Tuning (Stage 3C):** A second Unsloth SFT pass on the Stage 3B model, teaching it to respond according to the situation and the user's message, so that it replies at greater or shorter length as the message warrants.
+
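The context-length figure follows from simple arithmetic: the RoPE scaling factor multiplies the trained window. A minimal sketch, assuming "8k" means the 8,192-token SFT sequence length stated in Stage 2; `extended_context` is a hypothetical helper, not a library function.

```python
def extended_context(trained_length: int, rope_factor: float) -> int:
    """Nominal context window after RoPE scaling: trained length times the scaling factor."""
    return int(trained_length * rope_factor)


window = extended_context(8192, 16.0)
print(window)          # 131072
print(window // 1024)  # 128, i.e. the "128K" in the model name
```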
+
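For reference, the DPO objective for a single Chosen/Rejected pair can be written out directly. This sketch uses the standard DPO loss with the `beta=0.1` quoted in Stage 3A & 3B. The log-probability values are made-up numbers for illustration, and `dpo_loss` is a hypothetical helper rather than part of Unsloth or any training library.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one preference pair, given sequence log-probabilities."""
    # Implicit rewards: how much more likely the policy makes each response than the reference does.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)) rewritten as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))


# Policy identical to the reference: no preference learned yet, loss is log(2).
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Policy raises the chosen response and lowers the rejected one: loss falls.
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
print(baseline, improved)
```

A small `beta` such as 0.1 keeps the margin term gentle, which is consistent with the stated goal of tuning for stability.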
+ ## Installation, Usage, and Tips
+
+ The model is exported in `.safetensors` format. You can load it directly with Transformers or Unsloth, or convert it to GGUF to run it in frontends such as LM Studio or Kobold.
+
+ ### Recommended System Prompt
+ This model is heavily trained to think before speaking. Using the following system prompt yields the best results and ensures the model accurately formats its `<think>` blocks before responding:
+
+ ```text
+ You are a realistic, character-driven roleplay engine. You are roleplaying as {{char}}. Write strictly in third-person limited perspective.
+
+ CORE RULES:
+ - BOUNDARIES: NEVER speak, think, or generate actions for {{user}}.
+ - HISTORY & CONTEXT: Your reactions must logically follow past messages. Stay strictly in the present moment.
+ - PACING & DIALOGUE: Keep it slow-burn and grounded. Keep dialogue concise.
+ - FORMATTING: You must strictly follow the thought process format below, followed by a short roleplay response, and then STOP IMMEDIATELY. Output the <|im_end|> token.
+
+ Format your response EXACTLY like this:
+ <think>
+ 1. INTENT: [User's intent in 1 sentence]
+ 2. STATE: [Character's emotional state in 1 sentence]
+ 3. PLAN: I will write 1 to 2 action sentences and 1 dialogue sentence, then STOP if user message is small else if he is asking something detailed reply in more detail.
+ </think>
+ *Grounded action and environmental description.*
+ "Natural dialogue."
+ ```
+
+ ### Optimal Inference Settings
+ For the best text generation, it is strongly recommended to use a mild repetition penalty and a stopping criterion that intercepts `<|im_end|>`.
+
+ ```python
+ # `tokenizer` and `stopping_criteria` must already be defined by the caller.
+ generation_kwargs = dict(
+     # ... your inputs ...
+     max_new_tokens=350,       # Leash to prevent runaway generations
+     temperature=0.65,         # Keep it grounded
+     repetition_penalty=1.1,   # Punish loops like "A pause. A pause."
+     top_p=0.9,
+     do_sample=True,
+     pad_token_id=tokenizer.eos_token_id,
+     stopping_criteria=stopping_criteria,  # CRITICAL: halts generation at <|im_end|>
+ )
+ ```
+
+ ## Risks and Limitations
+
+ - **Complex Multi-Character Plots:** While the model punches well above its weight class (1.5B), exceptionally nuanced multi-character tracking or massive world-building tasks might still strain its parameter limits compared to an 8B+ model.
+ - **Inherited Base Model Biases:** Output behavior is still tethered to the foundational weights of Qwen2.5-1.5B.
+ - **Thinking Tags Extraction:** Ensure your frontend properly hides `<think>` ... `</think>` tags if you only wish to see the character's final verbal/action responses.
+
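If a frontend does not hide the reasoning block on its own, the raw completion can be post-processed. The sketch below is one possible approach, assuming the output follows the `<think>` ... `</think>` format from the recommended system prompt. `split_reply` and the sample text are hypothetical, and an unclosed `<think>` tag is deliberately left unhandled.

```python
import re
from typing import Tuple

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
STOP_TOKEN = "<|im_end|>"


def split_reply(raw: str) -> Tuple[str, str]:
    """Return (thoughts, visible_reply) extracted from a raw completion string."""
    # Discard anything generated after the stop token, if it leaked into the text.
    text = raw.split(STOP_TOKEN, 1)[0]
    # Collect the contents of every think block, then remove the blocks from the reply.
    thoughts = "\n".join(match.strip() for match in THINK_RE.findall(text))
    visible = THINK_RE.sub("", text).strip()
    return thoughts, visible


raw_output = (
    "<think>\n1. INTENT: Greet.\n2. STATE: Wary.\n</think>\n"
    "*She glances up from the map.*\n\"You're late.\"<|im_end|>ignored"
)
thoughts, visible = split_reply(raw_output)
print(visible)
```

Truncating at `<|im_end|>` in post-processing is only a fallback; stopping generation at that token, as recommended under Optimal Inference Settings, avoids producing the extra text in the first place.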
+ ## Responsible Usage
+
+ This model is focused on fictional roleplay and creative writing. It is NOT intended to provide factual advice, perform real-time analysis of real-world events, or generate non-fictional transcripts. Please use it responsibly and within the ethical and licensing obligations of the model's lineage.
config.json ADDED
@@ -0,0 +1,63 @@
+ {
+   "architectures": [
+     "Qwen2ForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": null,
+   "torch_dtype": "float16",
+   "eos_token_id": 151643,
+   "hidden_act": "silu",
+   "hidden_size": 1536,
+   "initializer_range": 0.02,
+   "intermediate_size": 8960,
+   "layer_types": [
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention"
+   ],
+   "max_position_embeddings": 32768,
+   "max_window_layers": 28,
+   "model_type": "qwen2",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 28,
+   "num_key_value_heads": 2,
+   "pad_token_id": 151665,
+   "rms_norm_eps": 1e-06,
+   "rope_parameters": {
+     "rope_theta": 1000000.0,
+     "rope_type": "default"
+   },
+   "sliding_window": null,
+   "tie_word_embeddings": true,
+   "unsloth_fixed": true,
+   "unsloth_version": "2026.4.4",
+   "use_cache": false,
+   "use_mrope": false,
+   "use_sliding_window": false,
+   "vocab_size": 151666
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65228bb1f1bd1a82326aa7557b4758b025bced9abfdb86221d185093c7227c58
+ size 3087467144
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bd5948af71b4f56cf697f7580814c7ce8b80595ef985544efcacf716126a2e31
+ size 11422356
tokenizer_config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "add_prefix_space": false,
+   "backend": "tokenizers",
+   "bos_token": null,
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "extra_special_tokens": [],
+   "is_local": true,
+   "model_max_length": 32768,
+   "pad_token": "<|PAD_TOKEN|>",
+   "padding_side": "left",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }