Instructions to use ToastyPigeon/funny-nemo-embed-testing-3 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use ToastyPigeon/funny-nemo-embed-testing-3 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ToastyPigeon/funny-nemo-embed-testing-3")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ToastyPigeon/funny-nemo-embed-testing-3")
model = AutoModelForCausalLM.from_pretrained("ToastyPigeon/funny-nemo-embed-testing-3")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use ToastyPigeon/funny-nemo-embed-testing-3 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ToastyPigeon/funny-nemo-embed-testing-3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ToastyPigeon/funny-nemo-embed-testing-3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/ToastyPigeon/funny-nemo-embed-testing-3

SGLang

How to use ToastyPigeon/funny-nemo-embed-testing-3 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ToastyPigeon/funny-nemo-embed-testing-3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ToastyPigeon/funny-nemo-embed-testing-3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ToastyPigeon/funny-nemo-embed-testing-3" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ToastyPigeon/funny-nemo-embed-testing-3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ToastyPigeon/funny-nemo-embed-testing-3 with Docker Model Runner:
```
docker model run hf.co/ToastyPigeon/funny-nemo-embed-testing-3
```

Lyrebird

A creative writing model based on Mistral Nemo 12B to support co-writing and other related longform writing tasks.

Creator's Comments

This is pretty good, actually. Smarter than some other nemos I've tried and with decent samplers it's not very sloppy.

Working samplers: temp 1.25-1.5, min-p 0.02-0.05, rep pen 1.01, temp first. Feels like some prompts need higher or lower temp than others. Lower temps result in sloppy mistral-isms, higher temps tap into the lora training a bit more.

Chat template is theoretically ChatML because of the base models used in the merge. However the ChatML-Names preset in SillyTavern often gives better results, YMMV.

With ChatML-Names in particular this is good at copying the style of what's already in the chat history. So if your chat history is sloppy, this likely will be too (use XTC for a bit to break it up). If your chat history isn't sloppy, this is less likely to introduce any extra. Start a conversation off with text from a good model (or better yet, human-written text), and this should follow along easily.

Has the same pacing issues any Nemo model does when asked to compose a longform story from scratch via instruct, though better than some others. Seems like it's good at dialogue (though it has a bias towards country and/or british style English accents if unspecified), and is good at 'reading between the lines' for its size as well.

I did not include any erotica or other NSFW data in the LoRA training parts of this; however, Mag-Mell contains Magnum (and Chronos, which is trained on top of a rejected Magnum) so the capability is there if you need it (it just might be a bit Claude-slop-y as I haven't optimized this part for style).

Training

The two LoRAs on this were trained at 8k (nemo-kimi-lora) and 32k (nemo-books-lora) context. As you might guess, nemo-kimi-lora is trained on outputs from kimi-k2 (dataset is public on my profile), and nemo-books-lora is trained on a bunch of books.

merged

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the Linear merge method.

Models Merged

The following models were included in the merge:

Configuration

The following YAML configuration was used to produce this model:

models:
  - model: inflatebot/MN-12B-Mag-Mell-R1+ToastyPigeon/nemo-kimi-lora
    parameters:
      weight: 0.5
  - model: migtissera/Tess-3-Mistral-Nemo-12B+ToastyPigeon/nemo-books-lora
    parameters:
      weight: 0.5
merge_method: linear 
dtype: bfloat16
tokenizer_source: migtissera/Tess-3-Mistral-Nemo-12B