How to use limloop/whiff-mamba2-20M-v2 with libraries and local apps.

Libraries

How to use limloop/whiff-mamba2-20M-v2 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="limloop/whiff-mamba2-20M-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("limloop/whiff-mamba2-20M-v2")
model = AutoModelForCausalLM.from_pretrained("limloop/whiff-mamba2-20M-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the tokens generated after the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use limloop/whiff-mamba2-20M-v2 with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "limloop/whiff-mamba2-20M-v2"
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "limloop/whiff-mamba2-20M-v2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
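The same request can be sent from Python using only the standard library. This is a sketch, assuming the server started above is reachable on localhost; the helper names `build_chat_request` and `chat_completion` are hypothetical, and `max_tokens` is a standard OpenAI-style field added here that the curl call does not use.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_content, max_tokens=64):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        "max_tokens": max_tokens,  # standard OpenAI-style field; 64 is an arbitrary cap
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat_completion(base_url, model, user_content):
    """Send the request and return the assistant's reply (needs a running server)."""
    request = build_chat_request(base_url, model, user_content)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.loads(response.read().decode("utf-8"))
    # OpenAI-style responses put the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

For example, `chat_completion("http://localhost:8000", "limloop/whiff-mamba2-20M-v2", "What is the capital of France?")` queries the vLLM server; passing `http://localhost:30000` instead targets the SGLang server described below.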

Use Docker

```shell
docker model run hf.co/limloop/whiff-mamba2-20M-v2
```

SGLang

How to use limloop/whiff-mamba2-20M-v2 with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "limloop/whiff-mamba2-20M-v2" \
    --host 0.0.0.0 \
    --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "limloop/whiff-mamba2-20M-v2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "limloop/whiff-mamba2-20M-v2" \
        --host 0.0.0.0 \
        --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "limloop/whiff-mamba2-20M-v2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use limloop/whiff-mamba2-20M-v2 with Docker Model Runner:
```
docker model run hf.co/limloop/whiff-mamba2-20M-v2
```

WHIFF 20M

A serpentine whisper in the bushes, carried by a gentle gust of wind

whiff-20M is a small experimental language model based on the Mamba2 architecture with 20.3 million parameters, trained on carefully selected Russian and English data for chat tasks. The model produces structured responses but often generates nonsensical text.

Technical Details

Architecture: Mamba2ForCausalLM from 🤗 Transformers
Parameters: 20.3M
Languages: Russian/English (bilingual)
Tokenizer: custom mini-BPE tokenizer (8,192-token vocabulary)
License: Apache 2.0

Model Configuration

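```python
from transformers import Mamba2Config

config = Mamba2Config(
    vocab_size=8192,
    hidden_size=512,
    state_size=64,
    num_heads=12,  # num_heads * head_dim (default 64) = 768 = expand * hidden_size
    num_hidden_layers=9,
    conv_kernel=4,
    expand=1.5,
    n_groups=2,
)
```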
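As a sanity check on the 20.3M figure, the parameter count can be derived by hand from the configuration above. This sketch assumes the module layout of the Transformers Mamba2 implementation and its defaults for fields the config leaves out (`head_dim=64`, `use_conv_bias=True`, `use_bias=False`, and an `lm_head` that is not tied to the input embeddings); the function name `mamba2_param_count` is illustrative.

```python
def mamba2_param_count(vocab_size, hidden_size, state_size, num_heads,
                       num_hidden_layers, conv_kernel, expand, n_groups,
                       tie_word_embeddings=False):
    """Count parameters of a Mamba2ForCausalLM-style model.

    Assumes Transformers defaults: biases only on the depthwise conv,
    and a separate (untied) lm_head unless tie_word_embeddings=True.
    """
    intermediate = int(expand * hidden_size)              # inner width
    conv_dim = intermediate + 2 * n_groups * state_size   # x, B and C channels
    # in_proj emits the gate z, the conv input (x, B, C) and one dt per head
    in_proj = hidden_size * (intermediate + conv_dim + num_heads)
    conv1d = conv_dim * conv_kernel + conv_dim             # depthwise weights + bias
    per_head = 3 * num_heads                               # dt_bias, A_log, D
    gated_norm = intermediate                              # gated RMSNorm weight
    out_proj = intermediate * hidden_size
    block_norm = hidden_size                               # pre-mixer RMSNorm
    per_layer = in_proj + conv1d + per_head + gated_norm + out_proj + block_norm

    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return embeddings + num_hidden_layers * per_layer + final_norm + lm_head


total = mamba2_param_count(
    vocab_size=8192, hidden_size=512, state_size=64, num_heads=12,
    num_hidden_layers=9, conv_kernel=4, expand=1.5, n_groups=2,
)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")  # 20,298,820 parameters (~20.3M)
```

Under these assumptions the total matches the card. With tied embeddings it would drop to about 16.1M, which suggests the checkpoint keeps a separate output head.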

Usage

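```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the v2 checkpoint this card describes
tokenizer = AutoTokenizer.from_pretrained("limloop/whiff-mamba2-20M-v2")
model = AutoModelForCausalLM.from_pretrained("limloop/whiff-mamba2-20M-v2")

def chat(messages, temp=0.5):
    # Apply the chat template and append the assistant prompt so the
    # model answers instead of continuing the last user turn
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(
        **inputs,
        max_length=512,  # prompt + generated tokens
        top_k=40,
        top_p=0.9,
        repetition_penalty=1.1,
        num_return_sequences=1,
        temperature=temp,
        do_sample=True,
        eos_token_id=1,  # stop at token id 1, as in the original card
    )

    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example
dialog = [
    {"role": "system", "content": "You are a wise elf."},
    {"role": "user", "content": "Explain quantum physics."}
]

response = chat(dialog, temp=0.4)
print(response)
```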
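For intuition about the sampling arguments that `chat()` passes to `generate()` (`repetition_penalty`, `temperature`, `top_k`, `top_p`), here is a simplified pure-Python sketch of a single decoding step. It applies the filters in roughly the order Transformers' logits processors do, but it is not the library's code; `sample_next_token` is a hypothetical name.

```python
import math
import random


def sample_next_token(logits, prev_ids, temperature=0.5, top_k=40,
                      top_p=0.9, repetition_penalty=1.1, rng=random):
    """Pick one token id from raw logits (simplified; not the Transformers code).

    logits: list of floats, one per vocabulary id.
    prev_ids: token ids already in the sequence (prompt + generated so far).
    """
    scores = list(logits)

    # Repetition penalty: push down tokens that already appeared.
    # Positive logits are divided, negative ones multiplied, so both shrink.
    for tok in set(prev_ids):
        s = scores[tok]
        scores[tok] = s * repetition_penalty if s < 0 else s / repetition_penalty

    # Temperature: dividing by a value below 1 sharpens the distribution.
    scores = [s / temperature for s in scores]

    # Top-k: keep only the k highest-scoring ids, best first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]

    # Softmax over the survivors (subtract the max for numerical stability).
    best = scores[ranked[0]]
    exps = [math.exp(scores[i] - best) for i in ranked]
    norm = sum(exps)
    probs = [e / norm for e in exps]

    # Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    kept_ids, kept_probs, cumulative = [], [], 0.0
    for i, p in zip(ranked, probs):
        kept_ids.append(i)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    # random.choices accepts unnormalised weights, so no explicit renormalisation is needed.
    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]
```

Lower temperatures concentrate probability on fewer tokens, so fewer of them survive the top-p cut.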

Training Data

19,927 carefully filtered rows of dialogue:

English: 154,306 (39.5%)
Russian: 187,204 (48.0%)
Mixed: 48,528 (12.5%)
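The language shares can be recomputed from the listed counts. This quick check uses only the numbers above; it shows that the three counts sum to 390,038 rather than 19,927, so they are presumably measured in a different unit than the dialogue rows (for example, individual messages; the card does not say).

```python
# Language breakdown as listed in the card
counts = {"English": 154_306, "Russian": 187_204, "Mixed": 48_528}

total = sum(counts.values())
shares = {lang: 100 * n / total for lang, n in counts.items()}

print(f"total: {total:,}")  # total: 390,038
for lang, pct in shares.items():
    print(f"{lang}: {pct:.2f}%")
```

The recomputed shares round to 39.6%, 48.0% and 12.4%, close to the listed 39.5%, 48.0% and 12.5%.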

Sources:

limloop/characters_dialogs
IlyaGusev/gpt_roleplay_realm
tamohannes/llm-roleplay
radce/communication_dataset
databricks/databricks-dolly-15k
ch1eph/RuGeoBench
nyuuzyou/ruschatgpt-qa
0x22almostEvil/ru-riddles-377
0x22almostEvil/tatoeba-mt-qna-oa
Den4ikAI/ru_sberquad_long_answers
Vikhrmodels/GrandMaster-PRO-MAX
HuggingFaceH4/ultrachat_200k
OpenAssistant/oasst1
OpenAssistant/oasst2
PJMixers/hieunguyenminh_roleplay-deduped-ShareGPT
Arketov/hieunguyenminh_roleplay-deduped-ShareGPT_ru
limloop/logic_duo
limloop/ru_en_linguistic_exchange
limloop/multi_engagement_roleplay_corpus

All datasets were additionally cleaned and filtered to improve chat interaction quality.

Limitations and Warnings

🎭 The model generates structured but often meaningless responses
🔥 Recommended generation temperature: 0.1-0.6
⚠️ May exhibit training artifacts (repetitions, contradictions)
⚠️ Not intended for production use

This model is like a forest stream: it seems to flow somewhere, but only the squirrels know where.


Model Tree

Base model: limloop/whiff-mamba2-20M (this model is a fine-tune of it)
Weights: Safetensors, 20.3M params, tensor type F32
Collection: Mamba2 (3 items)