# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("ToastyPigeon/funny-nemo-embed-testing-3")
model = AutoModelForCausalLM.from_pretrained("ToastyPigeon/funny-nemo-embed-testing-3")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Lyrebird
A creative writing model based on Mistral Nemo 12B to support co-writing and other related longform writing tasks.
Creator's Comments
This is pretty good, actually. Smarter than some other Nemos I've tried, and with decent samplers it's not very sloppy.
Working samplers: temp 1.25-1.5, min-p 0.02-0.05, rep pen 1.01, temp first. Some prompts seem to need a higher or lower temp than others. Lower temps result in sloppy Mistral-isms; higher temps tap into the LoRA training a bit more.
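To make "temp first" concrete, here is a minimal sketch of that sampler order in plain Python: temperature is applied to the logits before the min-p cutoff is computed. The function names and the toy logits are hypothetical, repetition penalty is left out, and real backends implement this on tensors, so treat it as an illustration rather than any particular frontend's code.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Softmax over logits divided by the temperature (applied first)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def min_p_keep(probs, min_p):
    """Indices of tokens whose probability is at least min_p * top probability."""
    cutoff = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= cutoff]


def sample_token(logits, temperature=1.25, min_p=0.05, rng=None):
    """Temperature first, then min-p filtering, then a weighted draw."""
    rng = rng or random.Random()
    probs = apply_temperature(logits, temperature)
    kept = min_p_keep(probs, min_p)
    weights = [probs[i] for i in kept]  # random.choices renormalises weights
    return rng.choices(kept, weights=weights, k=1)[0]


toy_logits = [5.0, 4.0, 0.0, -5.0]  # hypothetical values for illustration
print(min_p_keep(apply_temperature(toy_logits, 1.25), 0.05))
print(min_p_keep(apply_temperature(toy_logits, 1.5), 0.02))
```

Because the temperature flattens the distribution before the cutoff is taken, the higher-temp, lower-min-p end of the range lets more candidate tokens through than the lower-temp, higher-min-p end.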
Chat template is theoretically ChatML because of the base models used in the merge. However, the ChatML-Names preset in SillyTavern often gives better results; YMMV.
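For reference, a rough sketch of what a plain ChatML prompt looks like when built by hand. The helper name `to_chatml` is hypothetical, and the template bundled with this model's tokenizer may differ in details such as BOS handling, so `tokenizer.apply_chat_template` remains the safer route. The ChatML-Names variant is not reproduced here.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|> markers.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn for the model to continue.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(to_chatml([{"role": "user", "content": "Who are you?"}]))
```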
With ChatML-Names in particular this is good at copying the style of what's already in the chat history. So if your chat history is sloppy, this likely will be too (use XTC for a bit to break it up). If your chat history isn't sloppy, this is less likely to introduce any extra. Start a conversation off with text from a good model (or better yet, human-written text), and this should follow along easily.
Has the same pacing issues any Nemo model does when asked to compose a longform story from scratch via instruct, though it's better than some others. Seems good at dialogue (though it has a bias towards country and/or British-style English accents if unspecified), and is good at 'reading between the lines' for its size as well.
I did not include any erotica or other NSFW data in the LoRA training parts of this; however, Mag-Mell contains Magnum (and Chronos, which is trained on top of a rejected Magnum) so the capability is there if you need it (it just might be a bit Claude-slop-y as I haven't optimized this part for style).
Training
The two LoRAs on this were trained at 8k (nemo-kimi-lora) and 32k (nemo-books-lora) context. As you might guess, nemo-kimi-lora is trained on outputs from kimi-k2 (dataset is public on my profile), and nemo-books-lora is trained on a bunch of books.
merged
This is a merge of pre-trained language models created using mergekit.
Merge Details
Merge Method
This model was merged using the Linear merge method.
Models Merged
The following models were included in the merge:
- inflatebot/MN-12B-Mag-Mell-R1 + ToastyPigeon/nemo-kimi-lora
- migtissera/Tess-3-Mistral-Nemo-12B + ToastyPigeon/nemo-books-lora
Configuration
The following YAML configuration was used to produce this model:
models:
  - model: inflatebot/MN-12B-Mag-Mell-R1+ToastyPigeon/nemo-kimi-lora
    parameters:
      weight: 0.5
  - model: migtissera/Tess-3-Mistral-Nemo-12B+ToastyPigeon/nemo-books-lora
    parameters:
      weight: 0.5
merge_method: linear
dtype: bfloat16
tokenizer_source: migtissera/Tess-3-Mistral-Nemo-12B
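Conceptually, a linear merge is a weighted average of matching parameters. The sketch below shows the idea on toy "state dicts" of plain Python lists. It is not mergekit's implementation, the `linear_merge` helper and the toy values are hypothetical, and the `normalize` flag is an assumption about dividing by the weight sum. With two weights of 0.5 the sum is already 1, so normalizing changes nothing in this case.

```python
def linear_merge(state_dicts, weights, normalize=True):
    """Weighted average of same-shaped parameter lists, keyed by name."""
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights) if normalize else 1.0
    merged = {}
    for name, reference in state_dicts[0].items():
        merged[name] = [
            sum(w * sd[name][i] for sd, w in zip(state_dicts, weights)) / total
            for i in range(len(reference))
        ]
    return merged


# Toy stand-ins for the two LoRA-applied models (hypothetical values).
model_a = {"layer.weight": [1.0, 2.0]}
model_b = {"layer.weight": [3.0, 6.0]}
print(linear_merge([model_a, model_b], [0.5, 0.5]))
```

Each entry of the result sits halfway between the two inputs, which is what equal weights of 0.5 in the configuration above amount to.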