aimeri committed 4 days ago

Commit 9daed53 (verified) · 1 Parent(s): 42c67a0

Update README.md

Files changed (1)

README.md +139 -13

README.md CHANGED

@@ -1,21 +1,147 @@
----
-base_model: ibm-granite/granite-4.1-8b-base
-tags:
-- text-generation-inference
-- transformers
-- unsloth
-- granite
 license: apache-2.0
 language:
 - en
 ---
-# Uploaded finetuned  model
-- **Developed by:** aimeri
-- **License:** apache-2.0
-- **Finetuned from model :** ibm-granite/granite-4.1-8b-base
-This granite model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

+---
 license: apache-2.0
+base_model: ibm-granite/granite-4.1-8b-base
+base_model_relation: finetune
+datasets:
+- aimeri/st-characters-alpaca
 language:
 - en
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- sillytavern
+- character-cards
+- character-card-generation
+- roleplay
+- granite
+- granite-4.1
+- unsloth
+- trl
+- sft
+- lora
+- conversational
 ---
+# SpoomplesMaxx Card Maker V1
+A fine-tune of [`ibm-granite/granite-4.1-8b-base`](https://huggingface.co/ibm-granite/granite-4.1-8b-base) that turns a short, open-ended prompt into a complete [SillyTavern](https://github.com/SillyTavern/SillyTavern) character card. Give it a concept — an archetype, a name and a few constraints, or just a one-liner — and it generates a full V2/V3-style card (description, personality, scenario, first message, example messages, and sometimes a lorebook).
+## Model Details
+- **Developed by:** [aimeri](https://huggingface.co/aimeri)
+- **Base model:** [`ibm-granite/granite-4.1-8b-base`](https://huggingface.co/ibm-granite/granite-4.1-8b-base) (Apache 2.0)
+- **Language:** English
+- **Checkpoint type:** fine-tuned from a base (not instruct) checkpoint, so the output is the card itself, with no assistant-style preamble, disclaimers, or refusals.
+- **License:** Apache 2.0
+## Uses
+### Direct Use
+Generating SillyTavern-compatible character cards on demand from a natural-language request. The intended workflow is "describe a character, get a card," with the card output piped through a structural validator before import.
+### Out-of-Scope Use
+This is a single-turn card *generator*, not a roleplay or chat model — the assistant turn is a static card definition, not a conversation. It is not intended for multi-turn roleplay, as a general-purpose assistant, or for factual question answering.
+## How to Get Started
+The model was trained **without a system prompt**, so the cleanest usage is user-only. Use the chat template and sampling settings below.
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer  # transformers >= 5.0
+model_id = "aimeri/spoomplesmaxx-cardmaker-v1"
+tok = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16, device_map="auto")
+messages = [
+    {"role": "user", "content": "Create a character card for a grumpy lighthouse keeper."},
+]
+inputs = tok.apply_chat_template(
+    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
+).to(model.device)
+out = model.generate(
+    **inputs,
+    max_new_tokens=8192,
+    do_sample=True,
+    temperature=1.0,
+    top_k=64,
+    top_p=0.95,
+    repetition_penalty=1.1,
+)
+print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
+```
+Cards that include a `character_book` can be long; if generation cuts off mid-card, raise `max_new_tokens`. The merged 16-bit weights can also be served directly with vLLM (`vllm serve aimeri/spoomplesmaxx-cardmaker-v1`), again with no system message.
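As a rough sketch of the vLLM route mentioned above, the request below targets the OpenAI-compatible endpoint that `vllm serve` exposes. The URL (vLLM's default port 8000), the `build_card_request` and `generate_card` helper names, and the `max_tokens` value are assumptions for illustration. The message list is user-only, matching the no-system-prompt training setup, and the sampling values mirror the Transformers example.

```python
import json
import urllib.request

MODEL_ID = "aimeri/spoomplesmaxx-cardmaker-v1"
# Assumption: vLLM's default host and port; adjust to your deployment.
VLLM_URL = "http://localhost:8000/v1/chat/completions"


def build_card_request(prompt: str, max_tokens: int = 8192) -> dict:
    """Build a user-only chat request (hypothetical helper, no system message)."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 1.0,
        "top_p": 0.95,
        # top_k and repetition_penalty are vLLM extensions to the OpenAI schema.
        "top_k": 64,
        "repetition_penalty": 1.1,
    }


def generate_card(prompt: str) -> str:
    """POST the request to a running vLLM server and return the generated text."""
    body = json.dumps(build_card_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        VLLM_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

With a server running, `generate_card("Create a character card for a grumpy lighthouse keeper.")` returns the card text as a string.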
+## Training Details
+### Procedure
+LoRA fine-tune with [Unsloth](https://github.com/unslothai/unsloth) + TRL `SFTTrainer`, using the official Granite 4.1 chat template. Loss was computed on the assistant (card) completion only via `train_on_responses_only`.
+**LoRA configuration**
+| Setting | Value |
+|---|---|
+| Rank `r` | 16 |
+| `lora_alpha` | 22 |
+| `lora_dropout` | 0 |
+| Target modules | all-linear |
+| Rank-stabilized LoRA | enabled |
+| Bias | none |
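To make the table concrete: with rank-stabilized LoRA, the adapter update is scaled by `lora_alpha / sqrt(r)` instead of the conventional `lora_alpha / r`. A small sketch of that arithmetic for the values above (the function name is illustrative, not part of the training code):

```python
import math


def lora_scaling(r: int, lora_alpha: float, rank_stabilized: bool) -> float:
    """Multiplier applied to the low-rank update B @ A."""
    if rank_stabilized:
        return lora_alpha / math.sqrt(r)  # rsLoRA
    return lora_alpha / r  # standard LoRA


print(lora_scaling(16, 22, rank_stabilized=True))   # 5.5
print(lora_scaling(16, 22, rank_stabilized=False))  # 1.375
```

So the rank-stabilized setting gives the adapter four times the effective scale that the same `lora_alpha` would have under standard LoRA at rank 16.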
+**Training hyperparameters**
+| Setting | Value |
+|---|---|
+| Epochs | 2 (848 optimizer steps) |
+| Per-device batch size | 1 |
+| Gradient accumulation | 8 (effective batch size 8) |
+| Max sequence length | 8192 |
+| Optimizer | adamw_8bit (β₁ 0.9, β₂ 0.999, ε 1e-8) |
+| Learning rate | 1e-4, cosine schedule |
+| Warmup steps | 25 |
+| Weight decay | 0.001 |
+| Max grad norm | 1.0 |
+| Precision | bf16 |
+| Seed | 1985 |
+| Frameworks | Unsloth 2026.6.1, Transformers 5.5.0, TRL, PEFT, PyTorch 2.10 |
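The step count and batch settings also bound the size of the training split. The sketch below is a back-of-the-envelope estimate, assuming a single device and that a final partial accumulation window still counts as one optimizer step. Neither assumption is stated in this card, and the helper name is made up for illustration.

```python
def implied_train_rows(total_steps: int, epochs: int,
                       per_device_batch: int, grad_accum: int,
                       num_devices: int = 1) -> range:
    """Range of training-set sizes consistent with the reported step count."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = total_steps // epochs
    # Solve ceil(rows / effective_batch) == steps_per_epoch for rows.
    low = (steps_per_epoch - 1) * effective_batch + 1
    high = steps_per_epoch * effective_batch
    return range(low, high + 1)


rows = implied_train_rows(total_steps=848, epochs=2, per_device_batch=1, grad_accum=8)
print(rows.start, rows[-1])  # 3385 3392
```

That is roughly 3,400 training examples, which is in line with a 5% held-out split of about 178 rows.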
+### Results
+Evaluation loss on the 5% held-out split fell from 2.234 at the base checkpoint to 1.641 at the final step over the two epochs. Most of the gain came in the first ~100 steps, with a slow grind afterward:
+| Checkpoint | Eval loss |
+|---|---|
+| Base (step 0, `eval_on_start`) | 2.234 |
+| Step 100 | 1.704 |
+| Step 400 | 1.656 |
+| Final (step 848) | **1.641** |
+Final mean training loss was ~1.57. Total wall-clock training time was ~4.6 hours.
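For intuition, eval loss can be read as perplexity. Assuming the reported values are mean per-token cross-entropy in nats (the usual convention for `SFTTrainer`, though not stated here), perplexity is simply `exp(loss)`:

```python
import math

# Eval losses copied from the results table.
eval_loss = {
    "base (step 0)": 2.234,
    "step 100": 1.704,
    "step 400": 1.656,
    "final (step 848)": 1.641,
}

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = {name: math.exp(loss) for name, loss in eval_loss.items()}

for name, ppl in perplexity.items():
    print(f"{name}: loss={eval_loss[name]:.3f} perplexity={ppl:.2f}")
```

Under that assumption the base checkpoint sits at a perplexity of about 9.3 and the final model at about 5.2.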
+## Evaluation
+Quality was judged primarily **behaviorally** rather than by a single metric — eval loss is a weak proxy for card quality on a held-out set this small (~178 rows). A fixed prompt battery probed the behaviors that matter for this task:
+- **Structure & completeness** — clean, parseable cards with all expected fields on easy archetypes.
+- **Constraint adherence** — exact name / age / occupation, and a character's voice actually showing up in `first_mes` and `mes_example` rather than drifting generic.
+- **Sparse invention** — building a full, internally consistent card from a near-empty prompt.
+- **First-message craft** — second-person address to `{{user}}`, scene-setting, action formatting, in-voice dialogue, and a natural hand-off.
+- **Register** — antagonist/villain cards produced in-character, with no disclaimers, moralizing, or assistant-voice leakage. This is the main reason the model was trained from a base rather than an instruct checkpoint.
+## Bias, Risks, and Limitations
+- **Mature content.** This model was trained on a mix of safe-for-work (SFW) and not-safe-for-work (NSFW) cards, and it may generate objectionable content. Please use discretion when generating new cards.
+- **Structural validity is not guaranteed.** Output is generated text, not schema-validated card JSON. Run it through a parser/validator before importing into SillyTavern.
+- **Card conventions.** Output uses `{{user}}` / `{{char}}` macros and assumes a SillyTavern runtime.
+- **Single-turn only.** This generates a card, not a conversation; it is not itself a roleplay partner.
+- **Inherited bias.** The model carries the biases of both the base model and the curated card sources, including their genre, aesthetic, and demographic skew. "High quality" reflects a subjective curation judgment.
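The structural check recommended above could look something like the following. This is a minimal sketch, not the validator used during development. It assumes the model emits a single JSON object, with the card fields either at the top level or nested under a `data` key as in the Character Card V2 layout. The `REQUIRED_FIELDS` list and the `validate_card` name are illustrative choices.

```python
import json

# Assumption: the string fields a usable card should carry.
REQUIRED_FIELDS = (
    "name", "description", "personality", "scenario", "first_mes", "mes_example",
)


def validate_card(raw: str) -> list[str]:
    """Return a list of problems found in generated card text (empty if none)."""
    try:
        card = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg} at position {exc.pos}"]
    if not isinstance(card, dict):
        return ["top-level value is not a JSON object"]

    # V2-style cards nest the fields under "data"; flat cards keep them top-level.
    fields = card["data"] if isinstance(card.get("data"), dict) else card

    problems = []
    for key in REQUIRED_FIELDS:
        if not isinstance(fields.get(key), str):
            problems.append(f"missing or non-string field: {key}")

    # A lorebook is optional, but if present it should hold a list of entries.
    book = fields.get("character_book")
    if book is not None:
        if not isinstance(book, dict) or not isinstance(book.get("entries"), list):
            problems.append("character_book present but has no 'entries' list")
    return problems
```

An empty list means the card passed these checks; anything else is worth fixing or regenerating before import into SillyTavern.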
+## Citation
+If you use this model, please reference this repository and the [base model](https://huggingface.co/ibm-granite/granite-4.1-8b-base).