

license: apache-2.0
base_model: ibm-granite/granite-4.1-8b-base
base_model_relation: finetune
datasets:
  - aimeri/st-characters-alpaca
language:
  - en
library_name: transformers
pipeline_tag: text-generation
tags:
  - sillytavern
  - character-cards
  - character-card-generation
  - roleplay
  - granite
  - granite-4.1
  - unsloth
  - trl
  - sft
  - lora
  - conversational

SpoomplesMaxx Card Maker V1

A fine-tune of ibm-granite/granite-4.1-8b-base that turns a short, open-ended prompt into a complete SillyTavern character card. Give it a concept — an archetype, a name and a few constraints, or just a one-liner — and it generates a full V2/V3-style card (description, personality, scenario, first message, example messages, and sometimes a lorebook).

Model Details

Developed by: aimeri
Base model: ibm-granite/granite-4.1-8b-base (Apache 2.0)
Language: English
Fine-tuned from a base (not instruct) checkpoint, so the output is the card itself, with no assistant-style preamble, disclaimers, or refusals.
License: Apache 2.0

Uses

Direct Use

Generating SillyTavern-compatible character cards on demand from a natural-language request. The intended workflow is "describe a character, get a card," with the card output piped through a structural validator before import.

Out-of-Scope Use

This is a single-turn card generator, not a roleplay or chat model — the assistant turn is a static card definition, not a conversation. It is not intended for multi-turn roleplay, as a general-purpose assistant, or for factual question answering.

How to Get Started

The model was trained without a system prompt, so the cleanest usage is user-only. Use the chat template and sampling settings below.

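import torch
from transformers import AutoModelForCausalLM, AutoTokenizer  # transformers >= 5.0

model_id = "aimeri/spoomplesmaxx-cardmaker-v1"
tok = AutoTokenizer.from_pretrained(model_id)
# Load the merged weights in bf16; device_map="auto" places them on the available devices.
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16, device_map="auto")

# User-only conversation: the model was trained without a system prompt.
messages = [
    {"role": "user", "content": "Create a character card for a grumpy lighthouse keeper."},
]
# Render the chat template, tokenize, and move the tensors to the model's device.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

# Sample with the recommended settings; the token budget is generous because cards can be long.
out = model.generate(
    **inputs,
    max_new_tokens=8192,
    do_sample=True,
    temperature=1.0,
    top_k=64,
    top_p=0.95,
    repetition_penalty=1.1,
)
# Decode only the newly generated tokens (everything after the prompt).
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))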

Cards that include a character_book can be long; if generation cuts off mid-card, raise max_new_tokens. The merged 16-bit weights also serve directly under vLLM (vllm serve aimeri/spoomplesmaxx-cardmaker-v1), again with no system message.
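As a rough sketch of the user-only request shape against a local vLLM server, the snippet below builds an OpenAI-compatible chat payload with no system message. The endpoint URL (vLLM's default port 8000) and the helper names `build_payload` and `request_card` are assumptions for illustration; the sampling values mirror the ones used above.

```python
import json
import urllib.request

MODEL_ID = "aimeri/spoomplesmaxx-cardmaker-v1"
# Assumed endpoint: the default host/port of vLLM's OpenAI-compatible server.
URL = "http://localhost:8000/v1/chat/completions"


def build_payload(prompt: str, max_tokens: int = 8192) -> dict:
    """Build a user-only chat request (no system message)."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 1.0,
        "top_p": 0.95,
        # top_k and repetition_penalty are vLLM extensions to the OpenAI schema.
        "top_k": 64,
        "repetition_penalty": 1.1,
    }


def request_card(prompt: str, url: str = URL) -> str:
    """POST the payload to the server and return the generated card text."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

With the server running, `request_card("Create a character card for a grumpy lighthouse keeper.")` would return the card text. If cards come back truncated, raise `max_tokens`.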

Training Details

Procedure

LoRA fine-tune with Unsloth + TRL SFTTrainer, using the official Granite 4.1 chat template. Loss was computed on the assistant (card) completion only via train_on_responses_only.
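To illustrate what completion-only loss means in practice, here is a minimal sketch of the masking idea. It is not Unsloth's implementation: the helper `mask_prompt_labels` and the token IDs are hypothetical, and a real pipeline finds the response boundary from the chat template markers rather than a hand-supplied index.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_prompt_labels(input_ids: list[int], response_start: int) -> list[int]:
    """Copy input_ids into labels, hiding every token before response_start."""
    if not 0 <= response_start <= len(input_ids):
        raise ValueError("response_start must lie within the sequence")
    return [IGNORE_INDEX] * response_start + list(input_ids[response_start:])


# Toy example: 4 prompt tokens followed by 3 card tokens (made-up IDs).
tokens = [11, 12, 13, 14, 21, 22, 23]
labels = mask_prompt_labels(tokens, response_start=4)
print(labels)  # [-100, -100, -100, -100, 21, 22, 23]
```

Only the card tokens contribute to the gradient, so the model learns to write cards rather than to reproduce user prompts.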

LoRA configuration

| Setting | Value |
| --- | --- |
| Rank `r` | 16 |
| `lora_alpha` | 22 |
| `lora_dropout` | 0 |
| Target modules | all-linear |
| Rank-stabilized LoRA | enabled |
| Bias | none |
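Rank-stabilized LoRA changes how the adapter update is scaled, which helps explain the `lora_alpha` value of 22. The short calculation below compares the two scaling rules for the values in the table; the function name `lora_scale` is just for illustration.

```python
import math


def lora_scale(lora_alpha: float, r: int, rank_stabilized: bool) -> float:
    """Multiplier applied to the low-rank update B @ A."""
    if rank_stabilized:
        return lora_alpha / math.sqrt(r)  # rsLoRA: alpha / sqrt(r)
    return lora_alpha / r  # classic LoRA: alpha / r


print(lora_scale(22, 16, rank_stabilized=False))  # 1.375
print(lora_scale(22, 16, rank_stabilized=True))   # 5.5
```

With rank-stabilized scaling enabled, the effective multiplier for this configuration is 5.5 rather than 1.375.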

Training hyperparameters

| Setting | Value |
| --- | --- |
| Epochs | 2 (848 optimizer steps) |
| Per-device batch size | 1 |
| Gradient accumulation | 8 (effective batch size 8) |
| Max sequence length | 8192 |
| Optimizer | adamw_8bit (β₁ = 0.9, β₂ = 0.999, ε = 1e-8) |
| Learning rate | 1e-4, cosine schedule |
| Warmup steps | 25 |
| Weight decay | 0.001 |
| Max grad norm | 1.0 |
| Precision | bf16 |
| Seed | 1985 |
| Frameworks | Unsloth 2026.6.1, Transformers 5.5.0, TRL, PEFT, PyTorch 2.10 |
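For a feel of how the learning rate evolves over the run, the sketch below reproduces a linear-warmup, cosine-decay curve from the table's values. It assumes the standard half-cosine form that decays to zero at the final step; the exact scheduler used in training is not stated here, so treat the shape as approximate.

```python
import math

PEAK_LR = 1e-4
WARMUP_STEPS = 25
TOTAL_STEPS = 848


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 25, 100, 400, 848):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```

Under this assumption the rate peaks at step 25 and is still close to its peak at step 100, where most of the evaluation-loss improvement has already happened.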

Results

Evaluation loss on the 5% held-out split fell from 2.234 at the base checkpoint to 1.641 at the final model over the two epochs. About 89% of that drop came in the first 100 steps, with only slow improvement afterward:

| Checkpoint | Eval loss |
| --- | --- |
| Base (step 0, `eval_on_start`) | 2.234 |
| Step 100 | 1.704 |
| Step 400 | 1.656 |
| Final (step 848) | 1.641 |

Final mean training loss was ~1.57. Total wall-clock training time was ~4.6 hours.
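The losses above are easier to compare when converted to perplexity. The sketch below does that and shows how much of the total drop each checkpoint had already achieved. It assumes the reported eval loss is a mean per-token cross-entropy in nats, which is the usual convention but is not stated explicitly here.

```python
import math

# Eval losses copied from the results table, keyed by optimizer step.
eval_loss = {0: 2.234, 100: 1.704, 400: 1.656, 848: 1.641}


def perplexity(loss: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(loss)


def share_of_drop(step: int) -> float:
    """Fraction of the total loss reduction already achieved at this step."""
    total = eval_loss[0] - eval_loss[848]
    return (eval_loss[0] - eval_loss[step]) / total


for step, loss in eval_loss.items():
    print(f"step {step:>3}: loss {loss:.3f}  ppl {perplexity(loss):.2f}  "
          f"share of drop {share_of_drop(step):.0%}")
```

Under that assumption, perplexity falls from roughly 9.3 to roughly 5.2, with close to nine tenths of the loss reduction in place by step 100.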

Evaluation

Quality was judged primarily behaviorally rather than by a single metric — eval loss is a weak proxy for card quality on a held-out set this small (~178 rows). A fixed prompt battery probed the behaviors that matter for this task:

Structure & completeness — clean, parseable cards with all expected fields on easy archetypes.
Constraint adherence — exact name / age / occupation, and a character's voice actually showing up in first_mes and mes_example rather than drifting generic.
Sparse invention — building a full, internally consistent card from a near-empty prompt.
First-message craft — second-person address to {{user}}, scene-setting, action formatting, in-voice dialogue, and a natural hand-off.
Register — antagonist/villain cards produced in-character, with no disclaimers, moralizing, or assistant-voice leakage. This is the main reason the model was trained from a base rather than an instruct checkpoint.

Bias, Risks, and Limitations

Mature content. This model was trained on a mix of safe-for-work (SFW) and not-safe-for-work (NSFW) cards, and it may generate objectionable content. Please use discretion when generating new cards.
Structural validity is not guaranteed. Output is generated text, not schema-validated card JSON. Run it through a parser/validator before importing into SillyTavern.
Card conventions. Output uses {{user}} / {{char}} macros and assumes a SillyTavern runtime.
Single-turn only. This generates a card, not a conversation; it is not itself a roleplay partner.
Inherited bias. The model carries the biases of both the base model and the curated card sources, including their genre, aesthetic, and demographic skew. "High quality" reflects a subjective curation judgment.
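Since structural validity is not guaranteed, a small pre-import check can catch truncated or incomplete cards. The sketch below is one possible validator, not an official tool. It assumes the model emits the card as a JSON object, that fields sit either at the top level or under a `data` object, and that the required field names are `name` plus the fields this README mentions; the function name `validate_card` is hypothetical.

```python
import json

# Assumed field list: adjust it to the card spec version you target.
REQUIRED_FIELDS = (
    "name", "description", "personality", "scenario", "first_mes", "mes_example",
)


def validate_card(raw: str) -> list[str]:
    """Return the problems found in generated text; an empty list means it passed."""
    try:
        card = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(card, dict):
        return ["top-level value is not an object"]
    # Accept fields at the top level or nested under "data" (assumption).
    fields = card["data"] if isinstance(card.get("data"), dict) else card
    problems = []
    for key in REQUIRED_FIELDS:
        value = fields.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {key}")
    book = fields.get("character_book")
    if book is not None and not isinstance(book, dict):
        problems.append("character_book is present but not an object")
    return problems
```

A card cut off by `max_new_tokens` typically fails the JSON parse, which is a cue to regenerate with a larger token budget before importing into SillyTavern.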

Citation

If you use this model, please reference this repository and the base model.