---
license: apache-2.0
language:
- en
base_model:
- LatitudeGames/Muse-12B
tags:
- text adventure
- roleplay
- nvfp4
- tensorrt-llm
model_size: 12B
datasets:
- agentlans/distilled-roleplay
pipeline_tag: text-generation
---

![image/jpeg](muse.jpg)

# Muse-12B-NVFP4

Quantized NVFP4 weights of the [Muse-12B](https://huggingface.co/LatitudeGames/Muse-12B) model, for use with NVIDIA Blackwell GPUs.

## Quantization details

Quantized with TensorRT-Model-Optimizer 0.37.0.

Calibrated using the [distilled-roleplay](https://huggingface.co/datasets/agentlans/distilled-roleplay) dataset, tagged in the same ChatML format that was used to train the Wayfarer and Muse models. This was accomplished by adding the following code to the start of `hf_ptq.py`:

```
from modelopt.torch.utils import dataset_utils

# Register the calibration dataset under the name "distilled-roleplay".
dataset_utils.SUPPORTED_DATASET_CONFIG["distilled-roleplay"] = {
    "config": {
        "path": "agentlans/distilled-roleplay",
        "split": ["train"],
    },
    # Render each conversation as ChatML, mapping the dataset's speaker
    # labels (system/human/gpt) to ChatML roles (system/user/assistant).
    "preprocess": lambda sample: "".join(
        f"<|im_start|>{ {'system':'system','human':'user','gpt':'assistant'}[turn['from']] }\n"
        f"{turn['value'].strip()}<|im_end|>\n"
        for turn in sample["conversations"]
    ),
}
```

## Inference

Tested on an RTX 5060 Ti 16GB with TensorRT-LLM, vLLM, SGLang, and Aphrodite Engine.

Recommended generation settings (a mix of the recommendations on the Muse-12B model card and in the [AI Dungeon Model Guide](https://help.aidungeon.com/ai-models-and-their-differences)):

- Temperature: 1.0
- Top K: 250
- Top P: 1
- Min P: 0.025
- Repetition Penalty: 1.05
- Presence Penalty: 0.25

## Prompt Format

As mentioned above, the calibration data was tagged with the same ChatML format that was used to finetune Latitude's 12B models:

```
<|im_start|>system
You're a masterful storyteller and gamemaster.
Write in second person present tense (You are), crafting vivid, engaging narratives with authority and confidence.<|im_end|>
<|im_start|>user
> You peer into the darkness.<|im_end|>
<|im_start|>assistant
You have been eaten by a grue.<|im_end|>
```

I therefore recommend using the same format for inference.

## Credits

Muse-12B was made by [Latitude Games](https://huggingface.co/LatitudeGames) with help from [Gryphe Padar](https://huggingface.co/Gryphe).
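
## Example: Building a Prompt

For engines that take a raw prompt string rather than a list of chat messages, the ChatML layout from the Prompt Format section can be assembled by hand. The following is a minimal sketch only: `build_chatml_prompt` and its `add_generation_prompt` parameter are hypothetical names chosen for illustration, not part of TensorRT-LLM, vLLM, or any other library. It assumes messages arrive as dictionaries with `role` and `content` keys, and it mirrors the calibration preprocessing by stripping each turn and ending it with `<|im_end|>` and a newline.

```python
# Hypothetical helper (not a library API): renders a conversation in the
# ChatML layout shown in the Prompt Format section.
def build_chatml_prompt(messages, add_generation_prompt=True):
    parts = []
    for message in messages:
        # Each turn: opening tag with the role, the stripped text, closing tag.
        parts.append(
            f"<|im_start|>{message['role']}\n"
            f"{message['content'].strip()}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You're a masterful storyteller and gamemaster."},
    {"role": "user", "content": "> You peer into the darkness."},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

The resulting string ends with an open `<|im_start|>assistant` turn, so the model's completion becomes the next piece of narration. When the inference engine applies the tokenizer's chat template itself, a helper like this is likely unnecessary.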