---
license: apache-2.0
language:
- en
base_model:
- LatitudeGames/Muse-12B
tags:
- text adventure
- roleplay
- nvfp4
- tensorrt-llm
model_size: 12B
datasets:
- agentlans/distilled-roleplay
pipeline_tag: text-generation
---

# Muse-12B-NVFP4

Quantized NVFP4 weights of the [Muse-12B](https://huggingface.co/LatitudeGames/Muse-12B) model, for use with NVIDIA Blackwell GPUs.

## Quantization details

Quantized with TensorRT-Model-Optimizer 0.37.0.

Calibrated using the [distilled-roleplay](https://huggingface.co/datasets/agentlans/distilled-roleplay) dataset, tagged in the same ChatML format originally used to train the Wayfarer and Muse models. This was accomplished by adding the following code to the start of `hf_ptq.py`:

```python
from modelopt.torch.utils import dataset_utils

# Map the dataset's ShareGPT-style roles to ChatML roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

dataset_utils.SUPPORTED_DATASET_CONFIG["distilled-roleplay"] = {
    "config": {
        "path": "agentlans/distilled-roleplay",
        "split": ["train"],
    },
    # Join each conversation's turns into one ChatML-tagged calibration string.
    "preprocess": lambda sample: "".join(
        f"<|im_start|>{ROLE_MAP[turn['from']]}\n"
        f"{turn['value'].strip()}<|im_end|>\n"
        for turn in sample["conversations"]
    ),
}
```
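For illustration, here is the same preprocessing applied outside `hf_ptq.py` to a hypothetical record in the dataset's ShareGPT-style layout (the sample text below is made up; the role mapping and tags match the snippet above):

```python
# Standalone sketch of the preprocessing step; `sample` is a made-up
# stand-in for one record of the distilled-roleplay dataset.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def to_chatml(sample: dict) -> str:
    """Render one ShareGPT-style conversation as a ChatML calibration string."""
    return "".join(
        f"<|im_start|>{ROLE_MAP[turn['from']]}\n"
        f"{turn['value'].strip()}<|im_end|>\n"
        for turn in sample["conversations"]
    )

sample = {
    "conversations": [
        {"from": "human", "value": "> Look around."},
        {"from": "gpt", "value": "You are standing in an open field."},
    ]
}
print(to_chatml(sample))
# <|im_start|>user
# > Look around.<|im_end|>
# <|im_start|>assistant
# You are standing in an open field.<|im_end|>
```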

## Inference

Tested on an RTX 5060 Ti 16GB with TensorRT-LLM, vLLM, SGLang, and Aphrodite Engine.

Recommended generation settings (a mix of the recommendations on the Muse-12B model card and the [AI Dungeon Model Guide](https://help.aidungeon.com/ai-models-and-their-differences)):

- Temperature: 1.0
- Top K: 250
- Top P: 1
- Min P: 0.025
- Repetition Penalty: 1.05
- Presence Penalty: 0.25
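For reference, these settings map onto an OpenAI-compatible request roughly as follows. This is a sketch: the model name is a placeholder for whatever you served the weights as, and `top_k`, `min_p`, and `repetition_penalty` are engine-specific extensions (accepted by vLLM, SGLang, and Aphrodite) rather than standard OpenAI fields.

```python
# Sketch: the recommended sampling settings as a request payload for an
# OpenAI-compatible server. "Muse-12B-NVFP4" is a placeholder model name.
payload = {
    "model": "Muse-12B-NVFP4",
    "temperature": 1.0,
    "top_p": 1.0,
    "presence_penalty": 0.25,
    # Engine-specific sampling extensions (not standard OpenAI fields):
    "top_k": 250,
    "min_p": 0.025,
    "repetition_penalty": 1.05,
}
```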

## Prompt Format

As mentioned above, the calibration data was tagged with the same ChatML format used to finetune Latitude's 12B models:

```
<|im_start|>system
You're a masterful storyteller and gamemaster. Write in second person present tense (You are), crafting vivid, engaging narratives with authority and confidence.<|im_end|>
<|im_start|>user
> You peer into the darkness.<|im_end|>
<|im_start|>assistant
You have been eaten by a grue.<|im_end|>
```

As such, I recommend using the same format for inference.
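If your client assembles the prompt string itself rather than relying on a server-side chat template, a minimal helper might look like this (a sketch; `build_prompt` is a hypothetical name, not part of any library):

```python
def build_prompt(messages: list[dict]) -> str:
    """Wrap OpenAI-style messages in the ChatML tags shown above,
    ending with an open assistant turn for the model to complete."""
    body = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    return body + "<|im_start|>assistant\n"

prompt = build_prompt([
    {"role": "system", "content": "You're a masterful storyteller and gamemaster."},
    {"role": "user", "content": "> You peer into the darkness."},
])
```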

## Credits

Muse-12B was made by [Latitude Games](https://huggingface.co/LatitudeGames) with help from [Gryphe Padar](https://huggingface.co/Gryphe).