---
language:
- en
license: apache-2.0
base_model: unsloth/Mistral-Small-3.2-24B-Instruct-2506
library_name: transformers
tags:
- roleplay
- creative-writing
- chat
- mistral3
- vllm
- transformers
- lora
- trl
- peft
---
# Umbra
Umbra is a **roleplay-first** chat model fine-tuned from **unsloth/Mistral-Small-3.2-24B-Instruct-2506**. It is optimized for **immersive narration**, strong **character voice**, and **scene momentum**.
> TL;DR: This is a creative RP model. If you want a general assistant, consider the base model instead.
## What’s in this repo
This repository contains a **merged checkpoint** where LoRA weights were merged into the base model weights. The repository also includes the tokenizer snapshot and configuration files used during training.
Key artifacts included:
* Model weight shards (`model-00001-of-00010.safetensors` through `model-00010-of-00010.safetensors`)
* `model.safetensors.index.json`
* Tokenizer snapshot (`tokenizer.json`, `special_tokens_map.json`, `tokenizer_config.json`)
* Generation config (`generation_config.json`)
* Model configuration snapshot from training (`config.json`)
The weights are provided in **safetensors format** and are compatible with **Transformers and vLLM**.
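To confirm that a local download is complete, one option is to compare the shard files on disk with the ones named in `model.safetensors.index.json`. The sketch below assumes the standard sharded-safetensors index layout (a top-level `weight_map` from tensor name to shard filename); the helper names `shards_from_index` and `missing_shards` are illustrative and not part of this repository.

```python
import json
from pathlib import Path


def shards_from_index(index: dict) -> list[str]:
    """Return the sorted, de-duplicated shard filenames referenced by the index."""
    return sorted(set(index["weight_map"].values()))


def missing_shards(repo_dir: str) -> list[str]:
    """List shard files named in the index that are absent from repo_dir."""
    root = Path(repo_dir)
    index = json.loads((root / "model.safetensors.index.json").read_text())
    return [name for name in shards_from_index(index) if not (root / name).exists()]
```

An empty list from `missing_shards("path/to/umbra")` suggests that every shard referenced by the index is present.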
---
# Intended use
Umbra is designed for:
* Immersive roleplay
* Creative writing / character dialogue
* Narrative scene continuation
---
# Not recommended for
Umbra is **not intended** for:
* High‑stakes domains (medical, legal, financial)
* Factual Q&A requiring citations or browsing
* Safety‑critical use cases
---
# Content warning
Umbra is trained on roleplay‑style conversational data and may produce **mature or intense themes** depending on prompts. Use appropriate moderation and filtering if deploying publicly.
---
# Prompting
Umbra follows a **Mistral‑style instruction format** and works well with short system prompts. It can be served via **vLLM’s OpenAI‑compatible API** or used directly with **Transformers**.
### Roleplay system prompt (starter)
Use a short system prompt and put character/world constraints in the user message or in your UI’s lorebook system.
Example:
**System**
“You are Umbra. Stay in‑character. Do not write the user’s dialogue or actions. Keep responses vivid and scene‑grounded.”
**User**
Provide scene description, character context, and formatting rules.
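The same layout can be assembled programmatically. This is a minimal sketch that builds an OpenAI-style `messages` list from the starter system prompt above; the function name `build_messages` and the section labels inside the user message are assumptions for illustration, not a required format.

```python
SYSTEM_PROMPT = (
    "You are Umbra. Stay in-character. Do not write the user's dialogue or "
    "actions. Keep responses vivid and scene-grounded."
)


def build_messages(scene: str, character_context: str, formatting_rules: str) -> list[dict]:
    """Keep the system prompt short and put all scene details in the user turn."""
    user_content = "\n\n".join(
        [
            f"Character context:\n{character_context.strip()}",  # hypothetical label
            f"Scene:\n{scene.strip()}",  # hypothetical label
            f"Formatting rules:\n{formatting_rules.strip()}",  # hypothetical label
        ]
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]
```

The resulting list can be passed as `messages` to any OpenAI-compatible chat endpoint.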
### Avoid common RP failure modes
**Repetition / copy‑paste loops**
* reduce `temperature`
* reduce `max_tokens`
* add an explicit constraint such as:
"Do not repeat phrases or paraphrase the previous paragraph."
**Writing for the user**
Add a hard constraint:
"Never write my character’s dialogue or actions."
---
# Recommended generation settings
These are stable defaults for roleplay workloads:
* `temperature`: 0.65–0.9
* `top_p`: 0.85–0.95
* `repetition_penalty`: 1.03–1.10
* `max_tokens`: tuned to your UI’s desired reply length
If your stack supports `top_k`, keep it moderate (up to about 100) or leave it disabled. Very aggressive repetition penalties can destabilize sampling.
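One way to keep user-adjustable settings inside these ranges is to clamp them before sending a request. The sketch below is an assumption about how a frontend might do this, not part of Umbra itself; `clamp_sampling` is a hypothetical helper, and the bounds are copied from the list above.

```python
# Recommended ranges from this README, as (low, high) pairs.
RECOMMENDED = {
    "temperature": (0.65, 0.9),
    "top_p": (0.85, 0.95),
    "repetition_penalty": (1.03, 1.10),
}


def clamp_sampling(params: dict) -> dict:
    """Return a copy of params with known keys clamped to the recommended range."""
    clamped = dict(params)
    for key, (low, high) in RECOMMENDED.items():
        if key in clamped:
            clamped[key] = min(max(clamped[key], low), high)
    return clamped
```

Keys without a recommended range, such as `max_tokens`, pass through unchanged. Note that `repetition_penalty` is not part of the standard OpenAI request schema; vLLM accepts it as an extra sampling parameter, and other backends may ignore or reject it.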
---
# Context length
The underlying model family supports **long‑context inference**, but practical limits depend on KV‑cache memory and serving infrastructure.
Recommended starting ranges:
**8k–16k tokens**
Increase context length gradually depending on GPU memory availability and KV‑cache limits in your serving stack.
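As a rough illustration of how KV-cache memory grows with context length, the sketch below estimates the cache size for a single sequence. The layer count, KV-head count, and head dimension used here are assumed values for illustration and are not taken from this repository; read the real ones from `config.json` before relying on the numbers.

```python
def kv_cache_bytes(
    context_tokens: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,  # 2 bytes per value for bfloat16
) -> int:
    """Estimate KV-cache size for one sequence: keys + values across all layers."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return context_tokens * per_token


# Assumed architecture values (verify against config.json).
ASSUMED = {"num_layers": 40, "num_kv_heads": 8, "head_dim": 128}

for ctx in (8192, 16384):
    gib = kv_cache_bytes(ctx, **ASSUMED) / 2**30
    print(f"{ctx} tokens -> {gib:.2f} GiB per sequence")
```

Under these assumed values the estimate is 1.25 GiB at 8k tokens and 2.5 GiB at 16k tokens per concurrent sequence, on top of the model weights. Serving stacks add their own overhead, so treat this as a lower bound.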
---
# Training details
## Base model
* **unsloth/Mistral-Small-3.2-24B-Instruct-2506**
The Unsloth variant provides optimized loading and training compatibility with the **Transformers / TRL / PEFT** stack.
## Fine‑tuning method
Umbra was trained using **LoRA supervised fine‑tuning (SFT)**, and the LoRA weights were then **merged into the base model** for distribution.
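For readers unfamiliar with the merge step, the following is a minimal sketch of how a LoRA adapter is commonly folded into base weights with PEFT. It is not the exact script used for Umbra; `merge_lora`, `adapter_dir`, and `out_dir` are hypothetical names, and loading the base model through `Mistral3ForConditionalGeneration` is an assumption.

```python
def merge_lora(base_id: str, adapter_dir: str, out_dir: str) -> None:
    """Load a base model, apply a LoRA adapter, merge it, and save safetensors."""
    # Heavy imports are kept inside the function so the module imports cheaply.
    import torch
    from peft import PeftModel
    from transformers.models.mistral3.modeling_mistral3 import (
        Mistral3ForConditionalGeneration,
    )

    base = Mistral3ForConditionalGeneration.from_pretrained(
        base_id, torch_dtype=torch.bfloat16
    )
    # Wrap the base model with the trained adapter weights.
    model = PeftModel.from_pretrained(base, adapter_dir)
    # Fold the low-rank updates into the base weights and drop the adapter layers.
    merged = model.merge_and_unload()
    merged.save_pretrained(out_dir, safe_serialization=True)
```

After merging, the checkpoint loads like any regular model and no longer needs PEFT at inference time.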
Typical LoRA configuration:
```
r = 16
alpha = 32
dropout = 0.05
```
Target modules:
```
q_proj
k_proj
v_proj
o_proj
gate_proj
up_proj
down_proj
```
These modules correspond to the primary attention and MLP projection layers of the Mistral architecture.
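Expressed as a PEFT configuration, the values above might look like the sketch below. The `task_type` value is an assumption (it is not stated in this README), and `make_lora_config` is a hypothetical helper that imports PEFT lazily.

```python
# Values copied from the configuration and target-module lists above.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    "task_type": "CAUSAL_LM",  # assumption, not stated in this README
}

# With standard LoRA scaling, adapter updates are multiplied by alpha / r.
scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]


def make_lora_config():
    """Build a peft.LoraConfig from the values above (requires peft)."""
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)
```

With `r = 16` and `alpha = 32`, the scaling factor is 2.0.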
---
# SFT training run (observed)
```
epochs: 6
max_seq_len: 4096
per_device_batch_size: 1
grad_accumulation: 4
total_steps: 13374
```
Approximate training tokens processed:
```
~166M tokens
```
Training was performed using the **Transformers + TRL + PEFT** stack.
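The run figures above can be cross-checked with some quick arithmetic. The sketch below assumes a single training device and that `total_steps` counts optimizer steps; neither is stated in this README, so treat the derived numbers as estimates.

```python
# Figures reported above.
epochs = 6
max_seq_len = 4096
per_device_batch_size = 1
grad_accumulation = 4
total_steps = 13374
reported_tokens = 166_000_000  # "~166M tokens"

num_devices = 1  # assumption: not stated in this README

# Sequences consumed per optimizer step and over the whole run.
sequences_per_step = per_device_batch_size * grad_accumulation * num_devices
total_sequences = total_steps * sequences_per_step
sequences_per_epoch = total_sequences // epochs

# Upper bound if every sequence were filled to max_seq_len.
token_upper_bound = total_sequences * max_seq_len
avg_tokens_per_sequence = reported_tokens / total_sequences

print(f"sequences per epoch: {sequences_per_epoch}")
print(f"token upper bound:   {token_upper_bound:,}")
print(f"avg tokens/sequence: {avg_tokens_per_sequence:.0f}")
```

Under these assumptions the run covers about 8,916 sequences per epoch, and the reported ~166M tokens correspond to roughly 3,100 tokens per sequence, about three quarters of the 4,096-token maximum.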
---
# DPO (planned / optional)
A preference dataset has been prepared in **{prompt, chosen, rejected}** format for future **Direct Preference Optimization (DPO)** training.
Goals of the DPO stage:
* reduce repetition
* improve instruction adherence
* reduce user‑character hijacking
Future releases may include DPO‑refined checkpoints.
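A record in **{prompt, chosen, rejected}** format can be checked with a few lines of code before training. The sketch below assumes one JSON object per line with plain-string fields; the actual dataset layout is not published here, and `validate_pair` and `load_pairs` are hypothetical helpers.

```python
import json

REQUIRED_KEYS = ("prompt", "chosen", "rejected")


def validate_pair(record: dict) -> None:
    """Raise ValueError if a preference record is malformed."""
    for key in REQUIRED_KEYS:
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {key}")
    if record["chosen"] == record["rejected"]:
        raise ValueError("chosen and rejected responses are identical")


def load_pairs(jsonl_text: str) -> list[dict]:
    """Parse JSONL text into validated preference records, skipping blank lines."""
    pairs = []
    for line in jsonl_text.splitlines():
        if line.strip():
            record = json.loads(line)
            validate_pair(record)
            pairs.append(record)
    return pairs
```

Rejecting pairs whose `chosen` and `rejected` texts are identical is a design choice of this sketch: such pairs carry no preference signal.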
---
# Data
The data prepared for Umbra falls into three groups:
1. **Roleplay SFT data** in multi‑turn conversation format (character cards + scene turns)
2. **Instruction‑style SFT data** mixed in at roughly **10–30% of tokens** to preserve instruction‑following behavior
3. **Preference pairs** prepared for the planned DPO stage
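To hit a target instruction share measured in tokens, the amount of instruction data follows from the roleplay token count. This is a small worked sketch; `instruct_tokens_for_fraction` is a hypothetical helper, and it assumes the fraction is measured over roleplay plus instruction SFT tokens only.

```python
def instruct_tokens_for_fraction(roleplay_tokens: int, instruct_fraction: float) -> int:
    """Instruction tokens needed so they make up instruct_fraction of the SFT mix.

    Solves instruct / (instruct + roleplay) = instruct_fraction for instruct.
    """
    if not 0 <= instruct_fraction < 1:
        raise ValueError("instruct_fraction must be in [0, 1)")
    return round(roleplay_tokens * instruct_fraction / (1 - instruct_fraction))
```

For example, 90M roleplay tokens need about 10M instruction tokens for a 10% share, and 70M roleplay tokens need about 30M for a 30% share.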
### Synthetic teacher generation
Preference pairs and instruct samples may be generated using a **teacher model** (for example via OpenRouter).
Teacher models may run with internal reasoning enabled, but **only final responses are stored** in the dataset. No chain‑of‑thought traces are retained.
---
# Evaluation
This release is evaluated primarily through **qualitative roleplay testing**.
Evaluation criteria:
* character consistency
* scene grounding
* multi‑turn narrative coherence
* adherence to out‑of‑character constraints
Known failure modes:
* repetition during very long generations
* occasional attempts to control the user character
* weaker formatting for strict multi‑character dialogue unless explicitly prompted
These issues are typical targets for **DPO refinement**.
---
# Usage
## vLLM (recommended)
Serve locally:
```bash
vllm serve voidai-research/umbra \
  --tokenizer_mode mistral \
  --config_format mistral \
  --load_format mistral \
  --dtype bfloat16 \
  --max-model-len 8192 \
  --host 0.0.0.0 --port 8000 \
  --served-model-name umbra
```
Example request:
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "umbra",
    "messages": [
      {"role": "system", "content": "You are Umbra. Stay in-character. Do not write the user’s dialogue or actions."},
      {"role": "user", "content": "Write a vivid RP response to this scene: ..."}
    ],
    "temperature": 0.8,
    "top_p": 0.92,
    "max_tokens": 500
  }'
```
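The same request can be sent from Python without extra dependencies. The sketch below uses only the standard library and assumes the server started above is reachable at `http://localhost:8000`; `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, model: str, messages: list[dict], **sampling
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {"model": model, "messages": messages, **sampling}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

A call might look like `chat(build_chat_request("http://localhost:8000", "umbra", messages, temperature=0.8, top_p=0.92, max_tokens=500))`, where `messages` is a list of role/content dictionaries as in the curl example.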
---
## Transformers (Python)
> Depending on your Transformers version, `AutoModelForCausalLM` may not recognize the Mistral3 configuration. In that case, import the Mistral3 model class directly.
```python
import torch
from transformers import AutoTokenizer
from transformers.models.mistral3.modeling_mistral3 import Mistral3ForConditionalGeneration

model_id = "voidai-research/umbra"

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The prompt already starts with the BOS token (<s>), so special tokens are
# not added a second time during tokenization.
prompt = "<s>[INST]You are Umbra.\n\nWrite a vivid RP reply: ...[/INST]"
inputs = tok(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=400,
    temperature=0.8,
    top_p=0.92,
    do_sample=True,
)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = out[0][inputs["input_ids"].shape[-1]:]
print(tok.decode(new_tokens, skip_special_tokens=True))
```
---
# License
Umbra is released under **Apache‑2.0**, consistent with the base model license.
---
# Acknowledgements
* Base model: **unsloth/Mistral-Small-3.2-24B-Instruct-2506**
* Training stack: **Transformers / TRL / PEFT**
* Serving stack: **vLLM + mistral_common tokenizer stack**
---
# Citation
If you reference this model in a project, please cite the repository and the base model.
---
# API Access
Umbra can also be integrated through external API gateways.
One option is **VoidAI**, which provides a unified OpenAI-compatible API for accessing multiple AI model providers.
https://voidai.app
Example:
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-voidai-your_key_here",
    base_url="https://api.voidai.app/v1",
)

response = client.chat.completions.create(
    model="umbra",
    messages=[
        {"role": "user", "content": "Write a fantasy RP scene."}
    ],
)

print(response.choices[0].message.content)
```
Documentation:
[https://docs.voidai.app](https://docs.voidai.app)