---
license: apache-2.0
language:
- en
base_model:
- LatitudeGames/Muse-12B
tags:
- text adventure
- roleplay
- nvfp4
- tensorrt-llm
model_size: 12B
datasets:
- agentlans/distilled-roleplay
pipeline_tag: text-generation
---

# Muse-12B-NVFP4

Quantized NVFP4 weights of the [Muse-12B](https://huggingface.co/LatitudeGames/Muse-12B) model, for use with NVIDIA Blackwell GPUs.

## Quantization details

Quantized with TensorRT-Model-Optimizer 0.37.0.

Calibrated using the [distilled-roleplay](https://huggingface.co/datasets/agentlans/distilled-roleplay) dataset, tagged in the same ChatML format originally used to train the Wayfarer and Muse models. This was accomplished by adding the following code to the start of `hf_ptq.py`:

```python
from modelopt.torch.utils import dataset_utils

# Map the dataset's ShareGPT-style roles to ChatML roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

dataset_utils.SUPPORTED_DATASET_CONFIG["distilled-roleplay"] = {
    "config": {
        "path": "agentlans/distilled-roleplay",
        "split": ["train"],
    },
    # Join each conversation's turns into one ChatML-tagged calibration string.
    "preprocess": lambda sample: "".join(
        f"<|im_start|>{ROLE_MAP[turn['from']]}\n"
        f"{turn['value'].strip()}<|im_end|>\n"
        for turn in sample["conversations"]
    ),
}
```
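For illustration, here is the same preprocessing applied outside `hf_ptq.py` to a hypothetical record in the dataset's ShareGPT-style layout (the sample text below is made up; the role mapping and tags match the snippet above):

```python
# Standalone sketch of the preprocessing step; `sample` is a made-up
# stand-in for one record of the distilled-roleplay dataset.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def to_chatml(sample: dict) -> str:
    """Render one ShareGPT-style conversation as a ChatML calibration string."""
    return "".join(
        f"<|im_start|>{ROLE_MAP[turn['from']]}\n"
        f"{turn['value'].strip()}<|im_end|>\n"
        for turn in sample["conversations"]
    )

sample = {
    "conversations": [
        {"from": "human", "value": "> Look around."},
        {"from": "gpt", "value": "You are standing in an open field."},
    ]
}
print(to_chatml(sample))
# <|im_start|>user
# > Look around.<|im_end|>
# <|im_start|>assistant
# You are standing in an open field.<|im_end|>
```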

## Inference

Tested on an RTX 5060 Ti 16GB with TensorRT-LLM, vLLM, SGLang, and Aphrodite Engine.

Recommended generation settings (a mix of the recommendations on the Muse-12B model card and the [AI Dungeon Model Guide](https://help.aidungeon.com/ai-models-and-their-differences)):

- Temperature: 1.0
- Top K: 250
- Top P: 1
- Min P: 0.025
- Repetition Penalty: 1.05
- Presence Penalty: 0.25
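For reference, these settings map onto an OpenAI-compatible request roughly as follows. This is a sketch: the model name is a placeholder for whatever you served the weights as, and `top_k`, `min_p`, and `repetition_penalty` are engine-specific extensions (accepted by vLLM, SGLang, and Aphrodite) rather than standard OpenAI fields.

```python
# Sketch: the recommended sampling settings as a request payload for an
# OpenAI-compatible server. "Muse-12B-NVFP4" is a placeholder model name.
payload = {
    "model": "Muse-12B-NVFP4",
    "temperature": 1.0,
    "top_p": 1.0,
    "presence_penalty": 0.25,
    # Engine-specific sampling extensions (not standard OpenAI fields):
    "top_k": 250,
    "min_p": 0.025,
    "repetition_penalty": 1.05,
}
```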

## Prompt Format

As mentioned above, the calibration data was tagged with the same ChatML format used to finetune Latitude's 12B models:

```
<|im_start|>system
You're a masterful storyteller and gamemaster. Write in second person present tense (You are), crafting vivid, engaging narratives with authority and confidence.<|im_end|>
<|im_start|>user
> You peer into the darkness.<|im_end|>
<|im_start|>assistant
You have been eaten by a grue.<|im_end|>
```

As such, I recommend using the same format for inference.
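If your client assembles the prompt string itself rather than relying on a server-side chat template, a minimal helper might look like this (a sketch; `build_prompt` is a hypothetical name, not part of any library):

```python
def build_prompt(messages: list[dict]) -> str:
    """Wrap OpenAI-style messages in the ChatML tags shown above,
    ending with an open assistant turn for the model to complete."""
    body = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    return body + "<|im_start|>assistant\n"

prompt = build_prompt([
    {"role": "system", "content": "You're a masterful storyteller and gamemaster."},
    {"role": "user", "content": "> You peer into the darkness."},
])
```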

## Credits

Muse-12B was made by [Latitude Games](https://huggingface.co/LatitudeGames) with help from [Gryphe Padar](https://huggingface.co/Gryphe).