How to use R136a1/MythoMax-L2-13B-exl2 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="R136a1/MythoMax-L2-13B-exl2")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("R136a1/MythoMax-L2-13B-exl2")
model = AutoModelForCausalLM.from_pretrained("R136a1/MythoMax-L2-13B-exl2")
```
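Once the pipeline is loaded, generation is a single call. A minimal usage sketch (the prompt and sampling parameters here are illustrative, not part of the model card):

```python
# Generate a continuation; max_new_tokens and temperature are example values
output = pipe("Once upon a time,", max_new_tokens=128, do_sample=True, temperature=0.7)
print(output[0]["generated_text"])
```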
How to use R136a1/MythoMax-L2-13B-exl2 with vLLM:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "R136a1/MythoMax-L2-13B-exl2"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "R136a1/MythoMax-L2-13B-exl2",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
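Because the server speaks the OpenAI API, it can also be called from Python. A minimal sketch using the `openai` client package (the base URL matches the default vLLM port; the API key value is a placeholder, since a local server does not check it):

```python
from openai import OpenAI  # pip install openai

# Point the client at the local vLLM server; the key is unused but required
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="R136a1/MythoMax-L2-13B-exl2",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)
```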
How to use R136a1/MythoMax-L2-13B-exl2 with SGLang:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "R136a1/MythoMax-L2-13B-exl2" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "R136a1/MythoMax-L2-13B-exl2",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Or run the server in Docker instead:

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "R136a1/MythoMax-L2-13B-exl2" \
        --host 0.0.0.0 \
        --port 30000
```
Then call the server with the same curl command shown above.
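The OpenAI-compatible endpoint can also be called from Python. A minimal sketch with `requests`, mirroring the JSON payload of the curl example above:

```python
import requests

# POST the same JSON payload as the curl example to the SGLang server
response = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "R136a1/MythoMax-L2-13B-exl2",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5,
    },
)
print(response.json()["choices"][0]["text"])
```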
How to use R136a1/MythoMax-L2-13B-exl2 with Docker Model Runner:

```shell
docker model run hf.co/R136a1/MythoMax-L2-13B-exl2
```
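Run without arguments, this drops into an interactive chat. A one-shot prompt can also be passed as a trailing argument (assuming the current Docker Model Runner CLI syntax; the quoted prompt is illustrative):

```shell
# One-shot prompt instead of an interactive session
docker model run hf.co/R136a1/MythoMax-L2-13B-exl2 "Once upon a time,"
```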
EXL2 Quantization of Gryphe's MythoMax L2 13B.
Other quantized models are available from TheBloke: GGML - GPTQ - GGUF - AWQ
Base perplexity (unquantized model): 5.7447
| Branch | Bits per weight | Perplexity | Description |
|---|---|---|---|
| 3bit | 3.73 | 5.8251 | Low-bit quant that still holds up well |
| 4bit | 4.33 | 5.7784 | Fits 6K context on a T4 GPU |
| main | 5.33 | 5.7427 | Fits 4K context on a T4 GPU (recommended if you use Google Colab) |
| 6bit | 6.13 | 5.7347 | Best quality, for those with the hardware to run it |
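Each quant lives on its own branch of the repository. A minimal sketch for fetching a specific branch with `huggingface_hub` (the local directory name is an illustrative choice):

```python
from huggingface_hub import snapshot_download

# Fetch the 4.33 bpw quant by pointing `revision` at its branch
snapshot_download(
    repo_id="R136a1/MythoMax-L2-13B-exl2",
    revision="4bit",
    local_dir="MythoMax-L2-13B-exl2-4bit",
)
```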
Alpaca format:

```
### Instruction:

### Response:
```
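To apply the template in code, a minimal sketch (the helper name and example instruction are hypothetical, for illustration only):

```python
# Hypothetical helper: wrap a user instruction in the Alpaca format above
def make_alpaca_prompt(instruction: str) -> str:
    return f"### Instruction:\n{instruction}\n\n### Response:\n"

print(make_alpaca_prompt("Write a short story about a dragon."))
```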