Instructions for using v000000/NM-12B-Lyris-dev-3 with libraries and local apps.

Libraries

How to use v000000/NM-12B-Lyris-dev-3 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="v000000/NM-12B-Lyris-dev-3")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("v000000/NM-12B-Lyris-dev-3")
model = AutoModelForCausalLM.from_pretrained("v000000/NM-12B-Lyris-dev-3")
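
The snippets above load the model but stop before producing any text. A minimal sketch of a complete generation call follows, assuming PyTorch and the `accelerate` package are installed (the latter is needed for `device_map="auto"`) and that loading in half precision suits the published F16 weights. The function name `generate_text` and the `max_new_tokens` default are illustrative choices, not taken from the model card; the temperature mirrors the curl examples on this page.

```python
def generate_text(prompt, model_id="v000000/NM-12B-Lyris-dev-3", max_new_tokens=128):
    """Load the model and return only the newly generated continuation."""
    # Imports are local so the module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # assumption: matches the F16 checkpoint
        device_map="auto",          # assumption: accelerate is installed
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.5,
        )

    # Drop the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example call (downloads the full checkpoint on first use):
# print(generate_text("[INST] Name: Do you have mayonnaise recipes? [/INST]\n"))
```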

Local Apps

vLLM

How to use v000000/NM-12B-Lyris-dev-3 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "v000000/NM-12B-Lyris-dev-3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "v000000/NM-12B-Lyris-dev-3",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
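
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server is reachable at `http://localhost:8000` as in the curl call; the helper names `build_completion_request` and `complete` are hypothetical, and the response is assumed to follow the OpenAI completions shape (`choices[0].text`).

```python
import json
import urllib.request

def build_completion_request(
    prompt,
    base_url="http://localhost:8000",
    model="v000000/NM-12B-Lyris-dev-3",
    max_tokens=512,
    temperature=0.5,
):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]

# Example call (requires a running server):
# print(complete("Once upon a time,"))
```

Passing a different `base_url` lets the same helper target any other server that exposes the same endpoint.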

Use Docker

docker model run hf.co/v000000/NM-12B-Lyris-dev-3

SGLang

How to use v000000/NM-12B-Lyris-dev-3 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "v000000/NM-12B-Lyris-dev-3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "v000000/NM-12B-Lyris-dev-3",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "v000000/NM-12B-Lyris-dev-3" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "v000000/NM-12B-Lyris-dev-3",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner

How to use v000000/NM-12B-Lyris-dev-3 with Docker Model Runner:

```
docker model run hf.co/v000000/NM-12B-Lyris-dev-3
```

To use this model in llama.cpp, Ollama, LM Studio, or another compatible app, browse the available quantizations.

Lyris-dev3-Mistral-Nemo-12B-2407

EXPERIMENTAL

An attempt to fix the prompt format and stop token of Sao10K's Lyra-v3 and to boost its smarts, using strategic LATCOS vector similarity merging.

prototype, unfinished, dev3

Sao10K/MN-12B-Lyra-v1: base
Sao10K/MN-12B-Lyra-v3: x2 sequential pass, order: 1, 3
unsloth/Mistral-Nemo-Instruct-2407: x2 sequential pass, order: 2, 4
with z0.0001 value

Prompt format:

Mistral Instruct

[INST] System Message [/INST]

[INST] Name: Let's get started. Please respond based on the information and instructions provided above. [/INST]

<s>[INST] Name: What is your favourite condiment? [/INST]
AssistantName: Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> 
[INST] Name: Do you have mayonnaise recipes? [/INST]
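
The conversation layout above can be assembled programmatically. The following is a minimal sketch, assuming turns are separated by single newlines as in the example and that the system message, when present, sits in its own `[INST]` block before `<s>`; the function name `build_prompt` and its parameters are hypothetical. Many tokenizers add the `<s>` token themselves, so it may be necessary to drop it from the string or disable automatic special tokens when encoding.

```python
from typing import List, Optional, Tuple

def build_prompt(
    turns: List[Tuple[str, Optional[str]]],
    user_name: str = "Name",
    assistant_name: str = "AssistantName",
    system_message: Optional[str] = None,
) -> str:
    """Format (user_text, assistant_reply) pairs in the Mistral Instruct layout.

    A reply of None marks the open turn the model should answer.
    """
    parts = []
    if system_message:
        # Assumption: the system message is wrapped in its own [INST] block.
        parts.append(f"[INST] {system_message} [/INST]\n\n")
    parts.append("<s>")
    for user_text, reply in turns:
        parts.append(f"[INST] {user_name}: {user_text} [/INST]\n")
        if reply is not None:
            # Each finished assistant turn ends with the </s> stop token.
            parts.append(f"{assistant_name}: {reply}</s>\n")
    return "".join(parts)

prompt = build_prompt(
    [
        ("What is your favourite condiment?", "Lemon juice."),
        ("Do you have mayonnaise recipes?", None),
    ]
)
print(prompt)
```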


Safetensors
Model size: 12B params
Tensor type: F16
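
As a rough guide to hardware requirements, the weight footprint follows from the two figures above: F16 stores each parameter in 2 bytes. The sketch below treats "12B" as exactly 12e9 parameters, which is a rounded figure, and covers the weights only, not the KV cache or activations; the function name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3

# 12e9 parameters at 2 bytes each (F16)
print(f"{weight_memory_gib(12e9):.1f} GiB")  # prints "22.4 GiB"
```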

Model tree for v000000/NM-12B-Lyris-dev-3

Sao10K/MN-12B-Lyra-v1

Sao10K/MN-12B-Lyra-v3

unsloth/Mistral-Nemo-Instruct-2407

Merge model: this model
Merges: 3 models
Quantizations: 3 models