How to use WasamiKirua/L3-Odyssey-70B with libraries and local apps.

Libraries

How to use WasamiKirua/L3-Odyssey-70B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="WasamiKirua/L3-Odyssey-70B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("WasamiKirua/L3-Odyssey-70B")
model = AutoModelForCausalLM.from_pretrained("WasamiKirua/L3-Odyssey-70B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use WasamiKirua/L3-Odyssey-70B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "WasamiKirua/L3-Odyssey-70B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WasamiKirua/L3-Odyssey-70B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
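
The same request can also be built from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server from the previous step is listening on `localhost:8000`; the helper names `build_chat_request` and `chat` are hypothetical and chosen for illustration only.

```python
import json
import urllib.request

MODEL_ID = "WasamiKirua/L3-Odyssey-70B"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

Once the server is up, `print(chat("What is the capital of France?"))` should print the model's reply. Passing `base_url="http://localhost:30000"` would target a server on a different port.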

Use Docker

docker model run hf.co/WasamiKirua/L3-Odyssey-70B

SGLang

How to use WasamiKirua/L3-Odyssey-70B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "WasamiKirua/L3-Odyssey-70B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WasamiKirua/L3-Odyssey-70B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "WasamiKirua/L3-Odyssey-70B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WasamiKirua/L3-Odyssey-70B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use WasamiKirua/L3-Odyssey-70B with Docker Model Runner:
```
docker model run hf.co/WasamiKirua/L3-Odyssey-70B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

🌌 L3-Odyssey-70B (The Versatile Storyteller)

L3-Odyssey-70B is a 70-billion-parameter merged model designed for ultimate flexibility in high-quality Roleplay (RP) and immersive storytelling. It is a fusion of two Llama 3 legends: Sao10K/L3-70B-Euryale-v2.1 and Steelskull/L3.3-MS-Nevoria-70b.

The goal of this merge is to create a true narrative chameleon. By blending the unparalleled emotional intelligence and creative prose of Euryale with the superior logic, context handling, and instruction-following of Nevoria (based on Llama 3.3), L3-Odyssey adapts to any genre, tone, or writing style you request.

🤝 The "Odyssey" Symbiosis

We used the DARE-TIES merging method with a high density setting to ensure both models contribute their unique strengths without interfering with each other's core capabilities.

models:
  - model: Sao10K/L3-70B-Euryale-v2.1
    parameters:
      weight: 0.55 # The Heart: Storytelling engine, emotional depth, and rich prose.
  - model: Steelskull/L3.3-MS-Nevoria-70b
    parameters:
      weight: 0.45 # The Brain: Logic stabilizer, context handling, and strict coherence.

merge_method: dare_ties
base_model: Steelskull/L3.3-MS-Nevoria-70b
parameters:
  density: 0.75 # High density to retain vast general knowledge and vocabulary.
  weight: 1.0
dtype: bfloat16
tokenizer_source: base
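
For readers unfamiliar with the method, the idea behind the `density` and `weight` parameters can be illustrated with a toy example. This is a simplified sketch of the DARE-TIES idea on flat Python lists, not the merge tool's actual implementation: it omits normalization and works on plain floats instead of tensors, and the function names `dare_sparsify` and `dare_ties_merge` are hypothetical.

```python
import random


def dare_sparsify(delta, density, rng):
    """DARE step: keep each entry with probability `density`, rescale survivors by 1/density."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def dare_ties_merge(base, models, weights, density, seed=0):
    """Toy DARE-TIES merge over flat lists of floats (one list per model)."""
    rng = random.Random(seed)
    # Weighted, sparsified task vectors (fine-tuned weights minus base weights).
    deltas = [
        [w * d for d in dare_sparsify([m - b for m, b in zip(model, base)], density, rng)]
        for model, w in zip(models, weights)
    ]
    merged = []
    for i, b in enumerate(base):
        column = [delta[i] for delta in deltas]
        total = sum(column)
        # TIES-style sign election: keep only the contributions whose sign
        # agrees with the sign of the weighted sum for this parameter.
        agreeing = [c for c in column if c * total > 0]
        merged.append(b + sum(agreeing))
    return merged
```

With `density=0.75`, roughly a quarter of each task vector is dropped at random and the rest is scaled up by `1/0.75`, so the expected size of the update is preserved. The sign election then discards contributions that pull a parameter in the minority direction.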

💪 Key Strengths: The Generalist Powerhouse

Ultimate Flexibility: Excels at a wide range of tasks, from humorous, casual chats to dark, complex, novel-length storytelling. It adapts its vocabulary and tone to match your input.
Narrative Coherence: Thanks to the Llama 3.3 foundation (Nevoria), the model maintains incredibly stable plotlines and character consistency over very long contexts.
Emotional Intelligence (EQ): Inherits Euryale's famous ability to parse subtext, handle complex relationship dynamics, and deliver emotionally resonant dialogue.
Instruction Adherence: Highly responsive to System Prompts, World Info, and Character Cards. If you define a specific rule for your world, it will follow it.
Uncensored Freedom: Completely unrestricted. It will engage with any creative theme—dark, violent, or explicit—without refusal, focusing purely on your narrative goals.

🚀 Best Use Cases

Long-Form Collaborative Writing: Writing novels or extensive story arcs where the AI acts as a true writing partner.
Dynamic Roleplay: Engaging in scenarios where the tone can shift from action to drama to slice-of-life.
Text-Adventure Game Mastering: Acts as a capable GM that manages the world, items, and NPCs consistently while providing descriptive prose.

📈 Recommended Inference Settings

As a Llama 3-based model, standard Llama 3 settings apply. However, for storytelling, we recommend balancing creativity with focus:

Temperature: 1.0 - 1.15 (depending on how creative you want it).
Min-P: 0.05 - 0.1 (essential for stability at higher temperatures).
Repetition Penalty: 1.05 - 1.1 (standard Llama 3 range).
Context: Native context is 8k, but Llama 3.3 can often push further depending on the quantization method (EXL2/GGUF).
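
To show why Min-P helps at higher temperatures, here is a minimal sketch of the filtering step in plain Python. It assumes the common definition of min-p, where tokens whose probability falls below `min_p` times the top token's probability are discarded. The logits are made-up illustration values, and real samplers may apply temperature and min-p in a different order.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p):
    """Zero out tokens below min_p * (top probability), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


# Hypothetical logits for four candidate tokens.
logits = [5.0, 4.0, 1.0, -2.0]
probs = softmax(logits, temperature=1.1)
filtered = min_p_filter(probs, min_p=0.05)
print(filtered)
```

In this example the two unlikely tokens are removed and the remaining probability is shared between the two plausible ones. A higher temperature flattens the distribution, and the min-p cutoff keeps the low-probability tail from being sampled.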

Disclaimer

L3-Odyssey-70B is an experimental generalist model. It is completely uncensored and is intended for mature users engaged in creative writing and roleplay. It has no safety filters and may generate content unsuitable for some audiences. User discretion is advised.