Toppy M 7B - ExLlama V2
Original model: Toppy M 7B
Description
This is an EXL2 quantization of Undi95's Toppy M 7B model.
Prompt template: Alpaca
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{prompt}
### Response:
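As a minimal sketch of how the Alpaca template above could be filled in from code: the helper name `build_alpaca_prompt` is hypothetical, and the blank lines between the sections are an assumption based on the common Alpaca layout, since the template as shown here does not preserve them.

```python
# Alpaca-style template. The blank lines between sections are an assumption;
# adjust them if your frontend expects a different layout.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "{prompt}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str) -> str:
    """Wrap a user instruction in the Alpaca template (hypothetical helper)."""
    return ALPACA_TEMPLATE.format(prompt=instruction.strip())


if __name__ == "__main__":
    print(build_alpaca_prompt("Write a haiku about autumn."))
```

The returned string ends right after `### Response:` so that the model continues from there.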
Quantizations
| Bits Per Weight | Size |
|---|---|
| main (2.4bpw) | 2.29 GB |
| 3bpw | 2.78 GB |
| 3.5bpw | 3.19 GB |
| 4bpw | 3.59 GB |
| 4.5bpw | 4.00 GB |
| 5bpw | 4.41 GB |
| 6bpw | 5.22 GB |
| 8bpw | 6.84 GB |
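One way to read the table is as a lookup for the largest quantization that fits a given memory budget. The sketch below is only illustrative: `pick_quant` and its `headroom_gb` parameter are hypothetical, and the 1 GB default headroom for cache and activations is an assumption rather than a measured figure.

```python
from typing import Optional

# (label, size in GB) copied from the table above.
QUANTS = [
    ("main (2.4bpw)", 2.29),
    ("3bpw", 2.78),
    ("3.5bpw", 3.19),
    ("4bpw", 3.59),
    ("4.5bpw", 4.00),
    ("5bpw", 4.41),
    ("6bpw", 5.22),
    ("8bpw", 6.84),
]


def pick_quant(budget_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest quant whose weights fit in budget_gb - headroom_gb.

    headroom_gb is an assumed allowance for cache and activations.
    Returns None when nothing fits.
    """
    usable = budget_gb - headroom_gb
    fitting = [(label, size) for label, size in QUANTS if size <= usable]
    if not fitting:
        return None
    # Larger size means more bits per weight, so prefer the biggest that fits.
    return max(fitting, key=lambda item: item[1])[0]


if __name__ == "__main__":
    print(pick_quant(8.0))
    print(pick_quant(6.0))
```

Actual memory use depends on context length and the loader, so treat the result as a starting point.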
Original model card: Undi95's Toppy M 7B
Description
This repo contains fp16 files of Toppy-M-7B, a merge I have done with the new task_arithmetic merge method from mergekit.
This project was a request from BlueNipples: link
Models and loras used
- openchat/openchat_3.5
- NousResearch/Nous-Capybara-7B-V1.9
- HuggingFaceH4/zephyr-7b-beta
- lemonilia/AshhLimaRP-Mistral-7B
- Vulkane/120-Days-of-Sodom-LoRA-Mistral-7b
- Undi95/Mistral-pippa-sharegpt-7b-qlora
The sauce
openchat/openchat_3.5
lemonilia/AshhLimaRP-Mistral-7B (LoRA) x 0.38

NousResearch/Nous-Capybara-7B-V1.9
Vulkane/120-Days-of-Sodom-LoRA-Mistral-7b x 0.27

HuggingFaceH4/zephyr-7b-beta
Undi95/Mistral-pippa-sharegpt-7b-qlora x 0.38
merge_method: task_arithmetic
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: mistralai/Mistral-7B-v0.1
  - model: Undi95/zephyr-7b-beta-pippa-sharegpt
    parameters:
      weight: 0.42
  - model: Undi95/Nous-Capybara-7B-V1.9-120-Days
    parameters:
      weight: 0.29
  - model: Undi95/openchat_3.5-LimaRP-13B
    parameters:
      weight: 0.48
dtype: bfloat16
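To give a rough idea of what a task-arithmetic merge with these weights does, here is a toy sketch in plain Python. It is not mergekit's implementation: the function `task_arithmetic_merge` is hypothetical, parameters are modelled as short lists of floats, and any normalization or dtype handling is ignored. It only illustrates the general rule that each fine-tuned model contributes its difference from the base model, scaled by its weight.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flattened values


def task_arithmetic_merge(
    base: Weights, models: List[Weights], coeffs: List[float]
) -> Weights:
    """Compute merged = base + sum_i coeff_i * (model_i - base) per parameter."""
    if len(models) != len(coeffs):
        raise ValueError("need exactly one coefficient per model")
    merged: Weights = {}
    for name, base_vals in base.items():
        out = list(base_vals)
        for model, coeff in zip(models, coeffs):
            for i, (m, b) in enumerate(zip(model[name], base_vals)):
                # (m - b) is this model's "task vector" entry for the parameter.
                out[i] += coeff * (m - b)
        merged[name] = out
    return merged


if __name__ == "__main__":
    # Toy stand-ins for the base model and the three weighted models.
    base = {"w": [1.0, 2.0]}
    model_a = {"w": [2.0, 2.0]}
    model_b = {"w": [1.0, 4.0]}
    model_c = {"w": [1.0, 2.0]}  # identical to base, so it contributes nothing
    merged = task_arithmetic_merge(
        base, [model_a, model_b, model_c], [0.42, 0.29, 0.48]
    )
    print(merged)
```

The coefficients 0.42, 0.29 and 0.48 are taken from the config above; the toy tensors are made up for illustration.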
Prompt template: Alpaca
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{prompt}
### Response:
If you want to support me, you can here.
Model tree for LogicismTV/Toppy-M-7B-exl2
Base model
Undi95/Toppy-M-7B