aimeri committed · commit da89015 (verified) · parent: fc7c357

Create README.md

Files changed (1):
  1. README.md (+127, -0)

README.md ADDED
@@ -0,0 +1,127 @@
---
license: mit
base_model: Qwen/Qwen3-14B-Base
tags:
- cpt
- continued-pretraining
- roleplay
- creative-writing
- character-cards
- fiction
datasets:
- nyuuzyou/fandom
- gryffindor-ISWS/dbpedia_abstracts_fictional_characters_with_img
language:
- en
pipeline_tag: text-generation
---

# SpoomplesMaxx Base

A continued pre-training (CPT) checkpoint of [Qwen/Qwen3-14B-Base](https://huggingface.co/Qwen/Qwen3-14B-Base), further trained on creative writing and roleplay data.

## Model Description

This model is part of the SpoomplesMaxx training pipeline: **CPT → SFT → DPO**.

The CPT stage teaches the model:
- Character understanding and portrayal
- Creative fiction writing patterns
- Fandom/wiki-style lore knowledge
- Dialogue patterns for roleplay

## Training Data

### Phase 1: Core Knowledge

This checkpoint was trained on data focused on character knowledge and lore:

| Dataset | Source | Samples | Description |
|---------|--------|---------|-------------|
| **Private Dataset** | Private | ~100k (50,000 sampled) | SillyTavern-style character cards with personality, scenario, and example dialogue, plus fanfiction, essays about media and characters, short novels, and high-quality roleplay data |
| [nyuuzyou/fandom](https://huggingface.co/datasets/nyuuzyou/fandom) | HuggingFace | 50,000 (sampled) | Fandom wiki articles with character/world lore |
| [gryffindor-ISWS/dbpedia_abstracts_fictional_characters_with_img](https://huggingface.co/datasets/gryffindor-ISWS/dbpedia_abstracts_fictional_characters_with_img) | HuggingFace | 50,000 (sampled) | DBpedia abstracts of fictional characters |

**Total training samples:** ~46k

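The table above can be read as a recipe: take roughly 50,000 examples from each public source and mix them with the private character-card corpus. The sketch below shows one way to approximate the public part of that mix with the Hugging Face `datasets` library; it is illustrative only, and the split names, unified `text` column, sampling seed, and the placeholder file standing in for the private corpus are all assumptions rather than the actual preprocessing code.

```python
# Sketch only: approximate the Phase 1 data mix described in the table above.
# Split/column names, the seed, and the placeholder path are assumptions.
from datasets import load_dataset, concatenate_datasets

def take_sample(repo_id: str, n: int = 50_000, text_column: str = "text"):
    ds = load_dataset(repo_id, split="train").shuffle(seed=42)
    ds = ds.select(range(min(n, len(ds))))
    # Normalize to a single "text" column so the sources can be concatenated.
    return ds.map(lambda ex: {"text": ex[text_column]}, remove_columns=ds.column_names)

fandom = take_sample("nyuuzyou/fandom")
dbpedia = take_sample("gryffindor-ISWS/dbpedia_abstracts_fictional_characters_with_img")

# Placeholder for the private SillyTavern-style corpus (not publicly released);
# assumed here to be a local JSONL export with a single "text" field. The 50x
# "priority repeat" of character cards noted in the configuration table below
# would be applied by repeating that subset before concatenation.
cards = load_dataset("json", data_files="private_character_cards.jsonl", split="train")

phase1_mix = concatenate_datasets([cards, fandom, dbpedia]).shuffle(seed=42)
print(phase1_mix)
```
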
## Training Configuration

| Parameter | Value |
|-----------|-------|
| Base Model | `Qwen/Qwen3-14B-Base` |
| Training Phase | Phase 1 (Core Knowledge) |
| Steps | 1000 (of 3000 planned) |
| Batch Size | 1 |
| Gradient Accumulation | 16 |
| **Effective Batch Size** | **16** |
| Learning Rate | 1e-5 |
| LR Scheduler | Cosine |
| Warmup Ratio | 5% |
| Max Sequence Length | 8192 |
| Precision | BF16 |
| Optimizer | 8-bit Paged AdamW |
| Gradient Checkpointing | ✓ |
| Priority Repeat | 50× (character cards) |

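To make the table concrete, the sketch below maps it onto a standard `transformers.TrainingArguments` object. This is not the original training script: the output directory and logging/saving cadence are assumptions, and the 8192-token maximum sequence length is enforced when the corpus is tokenized and packed rather than through these arguments.

```python
# Sketch: the configuration table expressed as transformers.TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="spoomplesmaxx-cpt-phase1",  # hypothetical output directory
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,         # effective batch size 16
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,                      # 5% warmup
    max_steps=1000,                         # 1000 of the 3000 planned steps
    bf16=True,
    optim="paged_adamw_8bit",               # 8-bit Paged AdamW (needs bitsandbytes)
    gradient_checkpointing=True,
    logging_steps=10,                       # assumption
    save_steps=500,                         # assumption
)
```
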
### Hardware

- **GPU:** 1× NVIDIA A800
- **Training Time:** ~6 hours for 1000 steps

## Intended Use

This model is intended as a creative base model for further fine-tuning.

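For anyone building on this checkpoint, the sketch below shows one possible starting point for further fine-tuning with LoRA adapters via `peft`. The rank, alpha, dropout, and target modules are illustrative assumptions, not a recommended recipe.

```python
# Sketch: continue fine-tuning this checkpoint with LoRA adapters.
# Rank, alpha, dropout, and target modules are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "aimeri/SpoomplesMaxx-CPT-3-Base",
    device_map="auto",
)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# Train with your SFT dataset and trainer of choice.
```
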
### Not Recommended For

- Production deployment (use the final model after the full CPT → SFT → DPO pipeline)
- Direct chat/instruction following (this is a base model continuation, not instruction-tuned)

## Limitations

- **No instruction tuning:** This model continues raw text; it does not follow chat or instruction formats
- **Private data bias:** Heavy weighting toward private character cards may introduce specific character patterns
- **NSFW content:** Training data includes creative fiction that may contain mature themes. No safety filtering was applied at this stage.

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "aimeri/SpoomplesMaxx-CPT-3-Base",
    dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("aimeri/SpoomplesMaxx-CPT-3-Base")

# CPT models continue text, not chat
prompt = "The castle stood silent against the darkening sky, its towers reaching toward clouds that promised rain. Inside,"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

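The same completion-style usage can also be written with the `text-generation` pipeline; this is an equivalent convenience sketch, not a different interface requirement.

```python
# Equivalent sketch using the text-generation pipeline.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="aimeri/SpoomplesMaxx-CPT-3-Base",
    device_map="auto",
)
prompt = "The castle stood silent against the darkening sky, its towers reaching toward clouds that promised rain. Inside,"
result = generator(prompt, max_new_tokens=200, do_sample=True, temperature=0.8)
print(result[0]["generated_text"])
```
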
## Citation

If you use this model, please cite the base model and datasets:

```bibtex
@misc{qwen3-14b-base,
  title={Qwen3-14B-Base},
  author={Qwen Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/Qwen/Qwen3-14B-Base}
}
```

## Acknowledgments

- [Qwen Team](https://huggingface.co/Qwen) for the excellent base model
- [nyuuzyou](https://huggingface.co/nyuuzyou) for the Fandom wiki dataset
- [Archive of Our Own](https://archiveofourown.org/) for creative fiction

---