DataSnake committed on
Commit 7cf2b0a · verified · 1 Parent(s): 2f4bb7a

Update README.md

Files changed (1)
  1. README.md +47 -13
README.md CHANGED
@@ -10,30 +10,64 @@ tags:
  - nvfp4
  - tensorrt-llm
  model_size: 12B
  ---

  ![image/jpeg](muse.jpg)

  # Muse-12B

- Quantized NVFP4 weights of the [Muse-12B](https://huggingface.co/LatitudeGames/Muse-12B) model.

  Quantized with TensorRT-Model-Optimizer 0.37.0

- Calibrated using the [distilled-roleplay](https://huggingface.co/datasets/agentlans/distilled-roleplay) dataset, tagged in the same ChatML format used to train the Wayfarer and Muse models in the first place. This was accomplished by adding the following code to `SUPPORTED_DATASET_CONFIG` inside dataset_utils.py:

  ```
- "distilled-roleplay": {
- "config": {
- "path": "agentlans/distilled-roleplay",
- "split": ["train"],
- },
- "preprocess": lambda sample: "".join(
- f"<|im_start|>{ {'system':'system','human':'user','gpt':'assistant'}[turn['from']] }\n"
- f"{turn['value'].strip()}<|im_end|>\n"
- for turn in sample["conversations"]
- ),
  },
  ```

- Tested on TensorRT-LLM on a RTX 5060 Ti.
 
  - nvfp4
  - tensorrt-llm
  model_size: 12B
+ datasets:
+ - agentlans/distilled-roleplay
+ pipeline_tag: text-generation
  ---

  ![image/jpeg](muse.jpg)

  # Muse-12B

+ Quantized NVFP4 weights of the [Muse-12B](https://huggingface.co/LatitudeGames/Muse-12B) model, for use with NVIDIA Blackwell GPUs.
+
+ ## Quantization details

  Quantized with TensorRT-Model-Optimizer 0.37.0

+ Calibrated using the [distilled-roleplay](https://huggingface.co/datasets/agentlans/distilled-roleplay) dataset, tagged in the same ChatML format that was used to train the Wayfarer and Muse models. This was accomplished by adding the following code to the start of `hf_ptq.py`:

  ```
+ from modelopt.torch.utils import dataset_utils
+
+ dataset_utils.SUPPORTED_DATASET_CONFIG["distilled-roleplay"] = {
+ "config": {
+ "path": "agentlans/distilled-roleplay",
+ "split": ["train"],
  },
+ "preprocess": lambda sample: "".join(
+ f"<|im_start|>{ {'system':'system','human':'user','gpt':'assistant'}[turn['from']] }\n"
+ f"{turn['value'].strip()}<|im_end|>\n"
+ for turn in sample["conversations"]
+ ),
+ }
+ ```
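
To see what the calibration text looks like, the sketch below applies the same formatting as the `preprocess` lambda to a made-up ShareGPT-style record. The sample conversation and the names `ROLE_MAP`, `preprocess`, and `calibration_text` are assumptions for illustration; only the field names `conversations`, `from`, and `value` come from the code above.

```python
# Same role mapping and formatting as the preprocess lambda registered above.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def preprocess(sample):
    """Render one ShareGPT-style record as a single ChatML string."""
    return "".join(
        f"<|im_start|>{ROLE_MAP[turn['from']]}\n"
        f"{turn['value'].strip()}<|im_end|>\n"
        for turn in sample["conversations"]
    )


# Hypothetical record, invented for illustration (not taken from the dataset).
sample = {
    "conversations": [
        {"from": "system", "value": "You are a storyteller. "},
        {"from": "human", "value": "> You open the door."},
        {"from": "gpt", "value": "The hinges creak."},
    ]
}

calibration_text = preprocess(sample)
print(calibration_text)
```

Each turn becomes one `<|im_start|>role ... <|im_end|>` block, and surrounding whitespace in `value` is stripped before the closing tag is appended.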
+
+ ## Inference
+
+ Tested on an RTX 5060 Ti 16GB with TensorRT-LLM, vLLM, and SGLang. Of the three, I found vLLM to be the best. TensorRT-LLM couldn't handle as large a context window as the other two, and SGLang had fewer sampling options available.
+
+ Recommended generation settings (a mix of the recommendations from the Muse-12B model card and the [AI Dungeon Model Guide](https://help.aidungeon.com/ai-models-and-their-differences)):
+ - Temperature: 1.0
+ - Top K: 250
+ - Top P: 1
+ - Min P: 0.025
+ - Repetition Penalty: 1.05
+ - Presence Penalty: 0.25
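
As a rough sketch, the settings above could be packed into a request body for a completions-style HTTP endpoint. This assumes a server (such as vLLM's OpenAI-compatible one) that accepts `top_k`, `min_p`, and `repetition_penalty` as extra sampling fields; the helper name `build_completion_request`, the model name, and the `max_tokens` default are placeholders, not values from this model card. Nothing is sent over the network here.

```python
import json

# Recommended sampling settings from the list above.
SAMPLING = {
    "temperature": 1.0,
    "top_k": 250,
    "top_p": 1.0,
    "min_p": 0.025,
    "repetition_penalty": 1.05,
    "presence_penalty": 0.25,
}


def build_completion_request(prompt, model="my-muse-12b-nvfp4", max_tokens=256):
    """Build a JSON-serializable body for a /v1/completions-style endpoint.

    The model name and max_tokens default are placeholders (assumptions).
    """
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    body.update(SAMPLING)
    return body


request_body = build_completion_request(
    "<|im_start|>user\n> You peer into the darkness.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
print(json.dumps(request_body, indent=2))
```

Check the documentation of whichever server you use for the exact field names it supports, since not every backend exposes every sampler.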
+
+ ## Prompt Format
+
+ As mentioned above, the calibration data was tagged with the same ChatML format that was used to fine-tune Latitude's 12B models:
  ```
+ <|im_start|>system
+ You're a masterful storyteller and gamemaster. Write in second person present tense (You are), crafting vivid, engaging narratives with authority and confidence.<|im_end|>
+ <|im_start|>user
+ > You peer into the darkness.<|im_end|>
+ <|im_start|>assistant
+ You have been eaten by a grue.<|im_end|>
+ ```
+ As such, I would recommend using that format for inference.
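
A minimal sketch of building such a prompt for inference follows. The helper `build_prompt` is hypothetical, and ending the prompt with an open assistant turn is the usual ChatML convention, assumed here rather than stated on this model card.

```python
def build_prompt(messages):
    """Format (role, text) pairs as ChatML and leave an assistant turn open."""
    parts = [
        f"<|im_start|>{role}\n{text.strip()}<|im_end|>\n"
        for role, text in messages
    ]
    # Open (unterminated) assistant turn for the model to complete.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_prompt(
    [
        ("system", "You're a masterful storyteller and gamemaster."),
        ("user", "> You peer into the darkness."),
    ]
)
print(prompt)
```

Generation would then typically be stopped at `<|im_end|>`, so the reply closes the turn in the same format the calibration data used.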
+
+ ## Credits

+ Muse-12B was made by [Latitude Games](https://huggingface.co/LatitudeGames) with help from [Gryphe Padar](https://huggingface.co/Gryphe).